PagerDuty seeks to ease incident response with generative AI

PagerDuty on Wednesday announced enhancements to its Operations Cloud incident response (IR) platform aimed at accelerating response and remediation when IT issues arise.

The new capabilities include features for analyzing cyber events and interacting with IR teams in conversational English in real time.

“We typically see time to acknowledge [an incident] at about 45 minutes to an hour. We can now bring that down to just two or three minutes,” said Jonathan Rende, senior VP for products at PagerDuty, in an interview with CIO.com. “And it’s not just about responding, but it’s about fixing it. That can be multiple hours and even days. We’re taking it down to sub one-hour.”

Some of the new capabilities are part of the company’s PagerDuty Copilot offering, such as an automated function to summarize post-incident reviews. The copilot, which is invoked via Slack, was first announced in December 2023.

“By interpreting the results of automated diagnostics, providing responders with helpful incident context, drafting status updates, and generating drafts of post-incident reviews with the click of a button, PagerDuty Copilot allows teams to eliminate time-consuming and repetitive tasks so they can focus on high-priority needs,” the company wrote in a press release. “If the user asks PagerDuty Copilot to generate a post-incident review, it can generate a draft in seconds — reduced from the hours it typically takes.”

In addition to post-incident review, PagerDuty Copilot includes a low-code feature for generating workflow definitions and prompts, an automation digest for summarizing highlights from automated tasks, and the ability to build a narrative timeline of an incident. The timeline capability requires PagerDuty’s Enterprise Incident Management or Jeli products and is in early beta.

Rende said that one of the better additions is PagerDuty Copilot’s support for natural conversational interactions. He gave an example of a large retailer experiencing an unexpected shutdown of all shopping carts.

IR managers “could create a virtual war room via a Slack channel. Someone could ask, ‘What just changed?’” and PagerDuty Copilot would list all system changes, with a focus on the elements that it thinks are most likely the cause, Rende said.

One IT challenge today is a lack of full visibility into global environments. Large cloud environments provide an example of this, as do SaaS platforms and third-party partner systems in IT supply chains. Although Rende said that PagerDuty does not have visibility into systems IT cannot access, its generative AI systems could extrapolate from what they can see to analyze and potentially suggest the third-party platform that is likely responsible, by ruling out everything else.

Stephen Elliot, an IDC group vice president who tracks cloud and DevOps, saw much to like in the announcement. Asked how an enterprise would fare using these new capabilities compared with building the same thing itself on its existing AI-based automation, Elliot pointed to consolidation.

“The difference is that this is from a single platform/technology and not a set of separate best-of-breed tools that do AIOps, automation, root causes, etc., separately. Automation is part of the platform,” Elliot said. “The data is collected onto the platform and the genAI applied to it. They have a large customer base so, for many tech shops with generalists, this might make sense.”

The PagerDuty Operations Console, Runbook Automation’s project-based runner management, and PagerDuty Copilot are currently in Early Access and will be generally available in Q3 2024, the company said. Workflow Automation and the enterprise plan for PagerDuty Incident Management, including Jeli Post-Incident Reviews, are generally available.

© Foundry