veniture Blog

Enterprise crisis management with Jira Service Management

Written by Alma Dizdaric | Dec 1, 2021 11:02:00 AM

An organizational crisis is a single event or series of events that cause significant disturbances to an organization’s workflow, information flow, or processes. Organizational crises are usually without warning and pose a threat to an organization's reputation.

Crises can be in the form of IT system failures, downtime, faulty hardware, cyber attacks, data leaks, or any other incident that can impede employees' ability to work effectively or hinder customers from interacting or using a company's services or products.

All organizations have inherent fault lines that leave them vulnerable to a crisis. eCommerce stores can experience server downtime or latency due to high traffic volumes. Systems may fail or get hacked. A manufacturing plant may have an industrial accident. While PR crises range from workplace violence to natural disasters, a technological crisis is far more common and can be likened to a tsunami. The fault lines trigger an earthquake, unleashing a wave of bad publicity—backlash on social media platforms, poor public relations, and ultimately, lost business.

While it is not always possible to fully prevent crises, there are always a few tremors or early warning signals before the actual crisis. Customers log faults, complaints, or notice bugs and report them to the IT service desk. Recognizing and responding to early indicators of a more significant crisis can mitigate its impact on business operations and aid recovery efforts dramatically when the crisis is in full scale. Crisis management methods should form an important part of business continuity planning.

 

How can organizations manage crises?

Organizations have to respond and resolve incidents and crises appropriately to minimize their impact on day to day operations. Here are several steps they must take as part of a successful crisis management process:

 

Identification, logging, and categorization

When in a crisis, it may take some time for teams to be informed. Crisis incidents and other unexpected events are typically identified through users, community members logged reports, system analyses, or are manually identified. Once an incident has been identified, it must be logged and categorized according to its urgency and impact. Categorization is a key component of risk assessment in crisis situations. P1 incidents are incidents that have a critical business impact and are prioritized for immediate resolution. Once these events are logged, investigation and categorization can commence.

 

Notification and escalation

Escalation determines the categorization of incidents. It determines resource allocation and subsequent procedures. For minor incidents, details are logged, and teams are notified without an official alert.

 

Investigation and diagnosis

Once an incident is logged and assigned, employees investigate the cause and solutions to the incident. This is sometimes referred to as the response phase by crisis experts. The crisis management team or the crisis manager will notify stakeholders, including staff, human resources, the communication crisis team, customers, or authorities, about the incident and inform them of the crisis management plan in due time.

 

Recovery

Once a crisis has been resolved and systems restored, steps are taken to avoid recurrences and sustain the recovery process, all of which are part of post-crisis operations. During recovery, the damage caused by the crisis is determined. After recovery, incidents are closed by finalizing documentation and evaluating the actions taken in response to the crisis to help teams identify areas for improvement.

Documentation may be used to support crisis communications, inform employees about an incident or communicate with customers. Learnings are added to the crisis response plan if relevant to help teams treat a similar crisis when it arises.

 

How does an escalation process work?

Escalation processes are formal processes for addressing issues and problems as they arise, typically built on IT monitoring. If properly configured, this process will provide companies with alarms and warnings whenever an incident occurs. All companies must have a written escalation process, and teams must be trained to use them effectively.

An escalation process provides calm during a crisis. All emergencies become a matter of implementing decisions and best practice procedures based on previous learnings.

A good escalation process assigns priority levels to issues to eliminate the need for judgment calls. It delegates responsibilities to specific personnel and defines how much time staff at different support levels should spend attempting to fix the issue before escalating it to the next support tier. There should also be a plan to inform and alert those impacted by the problem itself, e.g., customers who may not be able to access their accounts.

 

Introducing JSM's Service Desk as a portal for requests

IT teams receive a variety of requests on any given day. A service request is typically a user request for something new (e.g., a new laptop), while a change request informs IT that something is being added, modified, or removed that may affect IT services (e.g., a system upgrade). At the same time, your service desk may encounter serious incidents or problems that require escalation.

 

If tickets bog down a service desk, it is much harder to prioritize and distinguish routine fixes from crises that could result in reputational harm. Low-risk service requests must be handled as a distinct workstream and can be automated and pre-approved, allowing the IT team to focus on more critical incidents.

However, while requests should be handled in different workstreams, ITIL processes recommend non-hierarchical collaboration for teams to manage requests more effectively. In the traditional model, IT support is split into the service desk, tech or app management teams, and developer/vendor support. Support is tiered and based on escalation.

The service desk solves most tickets, and what they cannot resolve is passed to second-tier support and so forth. When a major crisis occurs, crisis communication may take some time to get to the right tier.

ITIL recommends swarming rather than a tiered approach. One support personnel handles a ticket from start to finish, and the employee most likely to resolve the ticket responds to the crisis. If an employee cannot resolve it, rather than pass it on, the problem is solved collaboratively with support from the wider team and other key stakeholders using tools like Slack. That way, everyone learns by resolving the request. This knowledge is best retained in a central repository so that all team members (and future hires or other stakeholders) can learn from practical crisis incidents. Swarming also helps maintain communication with the customer and avoids bouncing them from support agent to support agent without resolution.

Automation can be deployed through self-service tools detailing problems that can be quickly resolved to avoid bogging teams down with tickets. Automation helps eliminate everyday repetitive tasks from the queue and increases capacity. Automation can also route service requests, act as a triage tool, or as an emergency event notification to the most capable team member available for fast resolution.

Jira Service Management empowers every team to set up a service desk quickly and at scale. Work is tracked through an open, collaborative platform, linking issues across Jira and development tools so that all swarming teams have access to rich, contextual data. This allows them to respond quickly and efficiently to service requests, incidents, and crises. Jira Service Management efficiently manages risks and customer service with automation support that eliminates unnecessary work. JSM also provides a complete audit trail and knowledgebase to foster a culture of learning within the organization.

 

Integration with OpsGenie

Opsgenie is an incident management platform designed to ensure that critical incidents are never ignored and directed to the right staff. Opsgenie receives alerts from your organization's monitoring systems and apps and categorizes alerts based on importance. It then notifies the right on-call employees through multiple channels (voice calls, email, SMS, push messages). If the alert is not acknowledged, it is automatically escalated.

Opsgenie is now fully integratable with Jira, allowing teams to manage all Jira issues and Opsgenie alerts from a single platform. Issues can be assigned, acknowledged, and closed from OpsGenie. OpsGenie also supports rich notifications for all Jira issues on preferred collaboration tools such as Slack, Teams, or HipChat.

 

The key stages of the crisis management process

The first step in effective crisis management is crisis preparedness. Ensuring that you have the right tools to identify, manage and resolve crises is your best defense against mishaps in your organization.