Note: Meeting Room 7 will be available as an On-Call Room for attendees.

Thursday, August 31 • 16:30 - 17:00
Incident Management and Chatops at Shopify

SREs are expected to be incident management experts. Yet incident handling is hard, often messy, and exhausting. We encounter new incidents, look everywhere for possible explanations, sometimes get tunnel vision on symptoms, and, under pressure, forget some good practices.

At Shopify, we care not only about handling incidents quickly and efficiently, but also about SRE well-being. We have a dedicated IMOC (Incident Manager On Call) rotation and an incident chatbot that assists IMOCs. In this talk, I’ll first explain the IMOC role and why training SREs for this duty is essential to handling incidents well.

Our chatbot assists the IMOC by reducing manual effort and context switching. We integrated the bot with our conversation tool and several third-party tools (PagerDuty, StatusPage, GitHub) so that it can send timely reminders. It also binds each incident to a discussion channel where all communication happens, allows status page updates directly from the chat room, keeps notes and records event times, and generates service disruption content. To avoid burnout during long-running incidents, the chatbot also reaches out to other IMOCs.
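As a rough illustration of two of the behaviours described above (keeping timestamped notes for an incident bound to one channel, and prompting a handoff during a long-running incident), here is a minimal sketch. It is not Shopify's implementation: the names `Incident`, `add_note`, `timeline`, `handoff_reminder`, and the two-hour `max_shift` threshold are all assumptions made for this example, and no real chat or paging API is called.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


@dataclass
class Incident:
    """Hypothetical incident record bound to a single chat channel."""
    title: str
    channel: str          # the one channel where all communication happens
    opened_at: datetime
    imoc: str             # current Incident Manager On Call
    notes: List[Tuple[datetime, str, str]] = field(default_factory=list)

    def add_note(self, when: datetime, author: str, text: str) -> None:
        """Record a timestamped note so event times are not lost."""
        self.notes.append((when, author, text))

    def timeline(self) -> str:
        """Render the notes in chronological order as draft report content."""
        lines = [f"Incident: {self.title} ({self.channel})"]
        for when, author, text in sorted(self.notes, key=lambda n: n[0]):
            lines.append(f"{when:%H:%M} UTC {author}: {text}")
        return "\n".join(lines)


def handoff_reminder(
    incident: Incident,
    now: datetime,
    max_shift: timedelta = timedelta(hours=2),  # assumed threshold
) -> Optional[str]:
    """Return a reminder message once the incident has run past max_shift."""
    elapsed = now - incident.opened_at
    if elapsed >= max_shift:
        return (
            f"{incident.imoc} has led {incident.channel} for {elapsed}; "
            "consider handing off to another IMOC."
        )
    return None


if __name__ == "__main__":
    opened = datetime(2017, 8, 31, 16, 30, tzinfo=timezone.utc)
    incident = Incident("Checkout errors", "#incident-42", opened, "alice")
    incident.add_note(opened + timedelta(minutes=5), "alice", "Rolled back deploy")
    print(incident.timeline())
    print(handoff_reminder(incident, opened + timedelta(hours=3)))
```

In a real bot, the reminder string would be posted to the incident channel and the timeline would feed the service disruption report; both are left as plain strings here so the sketch stays independent of any particular chat platform.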

Our chatbot supports best practices and streamlines incident response. Attendees will leave with strategies for incorporating chatbots into their incident management and with considerations for automating precisely and smartly.


Daniella Niyonkuru

Daniella Niyonkuru is a Production Engineer at Shopify, where she helps build a better, faster, and more resilient platform. Previously, Daniella worked as an Aircraft System Software Specialist and researched Formal Model Driven Development for Embedded Systems.

Thursday August 31, 2017 16:30 - 17:00 IST
Pembroke Room