1. There is no single "root cause". IMO this term is harmful because, while it makes for an easily graspable concept, the metaphor encourages identifying a _single_ cause. There is _never_ a single reason behind an incident. Instead there are always several "contributing factors".
Scott Leggett :fedi: :golang: (smlx@fosstodon.org)'s status on Wednesday, 05-Feb-2025 15:54:07 JST Scott Leggett :fedi: :golang:
2. "Human error" is _never_ a contributing factor (or "root cause" 🤬). The problem is that, until Human 2.0 comes out, human error is completely unfixable. Humans don't make decisions or take actions in a vacuum. There is _always_ an outdated procedure, bad policy, false belief, missing documentation, poor tooling, or lack of training behind a mistake made by a human. That is something you can fix!
Scott Leggett :fedi: :golang: (smlx@fosstodon.org)'s status on Wednesday, 05-Feb-2025 15:54:08 JST Scott Leggett :fedi: :golang:
I see some common ingrained misunderstandings around "Incident Reviews" / "Post Mortems" in technical orgs.
🧵