Anatomy of an Incident
Google's Approach to Incident Management for Production Services


Book Details
Authors | Ayelet Sachto, Adrienne Walcer
Publisher | O'Reilly Media
Published | 2022
Edition | 1st
Paperback | 70 pages
Language | English
ISBN-13 | 9781098113759, 9781098113742
ISBN-10 | 1098113756, 1098113748
License | Compliments of Google Cloud
Book Description
When it comes to system design, failure is inevitable. Scientists and engineers implement solutions based on the available information, without a complete knowledge of the future. You can't always anticipate the next zero-day event, viral media trend, weather disaster, or shift in technology. But you can be prepared to respond when incidents like these affect your systems.
With this book, SRE and DevOps practitioners, IT managers, and engineering leaders will explore methods to help their organizations prepare for, respond to, and recover from incidents. With advice from Ayelet Sachto, Adrienne Walcer, and Jessie Yang, you'll learn how to prepare for and handle failure if and when it happens.
- Learn the stages of the incident management lifecycle: preparedness, response, recovery, and mitigation
- Deal proactively with incidents: issues that require a coordinated and timely response
- Be prepared: practice disaster role-playing and incident response exercises
- Learn the characteristics of the incident response organizational structure
- Formulate steps to recovery and mitigation after an incident has occurred
- Conduct postmortems to analyze what went wrong
- Explore a real-world example from Google: The Mayan Apocalypse
- Learn how to measure and reduce an incident's impact
- Use postmortems as a tool for prevention and psychological safety
This book is published as open access, which means it is freely available to read, download, and share without restrictions.
If you enjoyed the book and would like to support the authors, you can purchase a printed copy (hardcover or paperback) from official retailers.