Incident Management
An incident management process is a set of procedures and actions taken to respond to and resolve critical issues. An issue qualifies as an Incident only if it satisfies all three conditions below:
It impacts the normal operation of a critical service.
It is noticeable by external or internal users of Knoema.
It has a significant negative impact on the user experience.
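The qualification rule above can be sketched as a simple check. This is only an illustration: the `Issue` class and its field names are hypothetical and do not correspond to any existing Knoema tooling.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    # Hypothetical field names, one per condition listed above.
    impacts_critical_service: bool
    noticeable_by_users: bool
    significantly_degrades_ux: bool


def is_incident(issue: Issue) -> bool:
    """An issue is an Incident only when all three conditions hold."""
    return (
        issue.impacts_critical_service
        and issue.noticeable_by_users
        and issue.significantly_degrades_ux
    )
```

For example, an issue that breaks a critical service but is not visible to any user would not be handled through this process.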
Reference documents on working with incidents:
How we respond to an incident | Atlassian
5 Steps of Incident Management
Roles
Incident Manager
@Konstantin Trukhin
Responsibilities of the Incident Manager: collect information about open incidents and communicate their status to the CSM team in the #platform-support Slack channel.
Incident Team
The incident team is a group of engineers who help the incident manager with the technical side of the investigation. Team membership changes from sprint to sprint.
Data OASIS - @Niyaz Batyrov
Enterprise Core and Data Hub - @Alexey Matyukhin
Expert Tools - @Vitaly Popov
Incident SLAs
Severity | General description of severity | Status update frequency |
---|---|---|
1 - Critical | | Every hour |
2 - High | | Every 2 hours |
3 - Medium | | Twice a day |
4 - Low | | Daily |
NOTE: Status update frequency applies only on business days during working hours.
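As a rough illustration, the update schedule from the table can be expressed as intervals of working time. This sketch assumes an 8-hour working day, which this page does not define, so "twice a day" becomes every 4 working hours and "daily" becomes every 8. The names `UPDATE_INTERVAL` and `is_update_overdue` are hypothetical.

```python
from datetime import timedelta

# Assumption: an 8-hour working day (not defined on this page).
WORKING_DAY = timedelta(hours=8)

# Maximum working time allowed between status updates, per severity.
UPDATE_INTERVAL = {
    1: timedelta(hours=1),  # 1 - Critical: every hour
    2: timedelta(hours=2),  # 2 - High: every 2 hours
    3: WORKING_DAY / 2,     # 3 - Medium: twice a day
    4: WORKING_DAY,         # 4 - Low: daily
}


def is_update_overdue(severity: int, working_time_since_update: timedelta) -> bool:
    """Return True if more working time has passed than the severity allows.

    The caller is expected to pass elapsed *working* time only, so nights,
    weekends, and holidays are already excluded.
    """
    return working_time_since_update > UPDATE_INTERVAL[severity]
```

Passing elapsed working time, rather than wall-clock timestamps, keeps the sketch independent of any particular business calendar.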
Severity Matrix explained
Incident logging
Each incident should be added to the Jira project: https://knoema.jira.com/jira/software/c/projects/IN/boards/57
Incident priority should be set based on the severity of the incident, as described above.
Postmortems should be written for incidents with Severity 1 and 2.
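The two rules above (priority follows severity, postmortems for Severity 1 and 2) could be encoded as below. This is a sketch under assumptions: the priority names are Jira's default labels and may not match the priority scheme configured for the IN project, and the `triage` helper is hypothetical.

```python
# Severities that require a postmortem, as stated above.
POSTMORTEM_SEVERITIES = {1, 2}

# Assumption: Jira default priority names; the IN project may use others.
SEVERITY_TO_PRIORITY = {
    1: "Highest",
    2: "High",
    3: "Medium",
    4: "Low",
}


def triage(severity: int) -> dict:
    """Return the priority to set and whether a postmortem is required."""
    if severity not in SEVERITY_TO_PRIORITY:
        raise ValueError(f"Unknown severity: {severity}")
    return {
        "priority": SEVERITY_TO_PRIORITY[severity],
        "postmortem_required": severity in POSTMORTEM_SEVERITIES,
    }
```

For instance, `triage(2)` yields a "High" priority with a postmortem required, while `triage(3)` yields "Medium" with none.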