
An incident management process is a set of procedures and actions taken to respond to and resolve critical issues. An issue qualifies as an Incident only if it satisfies all three conditions below:

  • It impacts normal operation of the platform.

  • It is noticeable by external or internal users of Knoema.

  • It has a significant negative impact on the user experience.
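
The three-condition rule above can be sketched as a simple predicate. This is only an illustration: the `Issue` class and its field names are hypothetical and are not part of any Knoema tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """Hypothetical record of a reported issue (field names are assumptions)."""
    impacts_normal_operation: bool
    noticeable_by_users: bool          # external or internal users
    significantly_degrades_ux: bool


def is_incident(issue: Issue) -> bool:
    """Return True only when all three qualifying conditions hold."""
    return (
        issue.impacts_normal_operation
        and issue.noticeable_by_users
        and issue.significantly_degrades_ux
    )
```

For example, an issue that impacts operation and is noticeable, but does not significantly hurt the user experience, would not qualify as an Incident under this sketch.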

Documentation on how to work with incidents:

https://www.atlassian.com/ru/incident-management/handbook/incident-response#assess

5 Steps of Incident Management

Roles:

Incident Manager: Konstantin Trukhin

Responsibilities of the Incident Manager: collect information about open incidents and communicate their status to the CSM team in the #platform-support Slack channel.

Incident Team: Konstantin Trukhin, Alexey Matyukhin, Vyacheslav Lopaev, Pavel Starkov

Incident SLAs:

| Severity | General description of severity | Status update frequency |
|---|---|---|
| 1 | The whole platform or its critical components are not responding, and users cannot complete routine tasks | Every 30 min |
| 2 | Platform response is heavily delayed (load time increased 200%), and users cannot complete most typical tasks normally | Every hour |
| 3 | Either a high-value customer reported the issue as critical, or the issue impacts a very common capability of the platform and is noticeable by many customers | Twice a day |
| 4 | Certain non-critical components of the platform are not functioning normally, and the issue has already been reported by some customers | Daily |
| 5 | The issue has not been reported by users and doesn't impact any important components or the performance of the platform | Daily |

  • Postmortems should be written for incidents with Severity 1 and 2.
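
The status update frequencies and the postmortem rule above can be sketched as a small lookup. This is a minimal illustration under stated assumptions: "twice a day" is modelled as a 12-hour interval and "daily" as 24 hours, and the names `UPDATE_INTERVAL`, `next_status_update`, and `needs_postmortem` are hypothetical, not part of an existing tool.

```python
from datetime import datetime, timedelta

# Severity -> maximum time between status updates.
# Assumption: "twice a day" = every 12 hours, "daily" = every 24 hours.
UPDATE_INTERVAL = {
    1: timedelta(minutes=30),
    2: timedelta(hours=1),
    3: timedelta(hours=12),
    4: timedelta(days=1),
    5: timedelta(days=1),
}


def next_status_update(severity: int, last_update: datetime) -> datetime:
    """Return when the next status update is due for an open incident."""
    if severity not in UPDATE_INTERVAL:
        raise ValueError(f"unknown severity: {severity}")
    return last_update + UPDATE_INTERVAL[severity]


def needs_postmortem(severity: int) -> bool:
    """Postmortems are written for Severity 1 and 2 incidents."""
    if severity not in UPDATE_INTERVAL:
        raise ValueError(f"unknown severity: {severity}")
    return severity in (1, 2)
```

With this sketch, a Severity 1 incident last updated at 12:00 would be due for its next update at 12:30, and would also require a postmortem once resolved.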

Severity Matrix explained
