Explore failure management across diverse fields in this 46-minute conference talk from linux.conf.au. Examine real-world examples from open hardware, aviation, and Google's production environment to see how failures can be anticipated, prevented, and learned from. Discover practical strategies for developing a sense for potential issues, implementing effective procedures, and conducting thorough root cause analyses. Learn from critical aviation incidents such as Air France Flight 447 (AF447) and Qantas Flight 32 (QF32), and understand the implications of automation gone wrong. Gain knowledge on avoiding hardware mishaps, improving software development practices, and running proper postmortems. The talk offers approaches to risk management and failure prevention that apply across technological domains.
Planning for and Handling Failures - From Open Hardware and Aviation to Production at Google