Summarizing ‘The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win’
I first heard of The Phoenix Project in 2016, when a former customer told me he expected all of his employees to read the book. At the time, I skimmed the book, both out of curiosity and to be a good partner.
As I continue my growth in Cloud & DevOps, I thought it’d make sense to finally read the book completely. This article is the result of my notes & key takeaways from The Phoenix Project.
As for a book review, I give it 5/5 stars because the book:
- caused me to reflect on how I get work done
- made me laugh out loud a handful of times
- now serves as a useful compass in my career
- has a good plot with many relatable characters
- is a unique way to learn about IT through fiction
Part 1: Chapters 1–16
Bill’s Leadership
Bill, the protagonist, is regularly focussed on root cause analysis and situational awareness. He is a former Marine, so he is inherently grounded in leadership principles.
Also, the concept of leaders creating other leaders slowly emerges, especially when Bill empowers Patty and Wes to own and solve their problems, despite Bill being freshly promoted above them. At this point in the book, I had a sense that this aspect would become important as the story continued, and as Parts Unlimited’s problems became increasingly intertwined.
Deadline Performance
In Chapter 4, it becomes apparent that holding several people across multiple departments to one deadline can be incredibly challenging and sometimes unfeasible. Deadlines can be especially tough if the date isn’t well-chosen, or if parties aren’t communicating consistently. In the book, one character suggests the Dev team frequently finishes their work late, leaving other teams holding the “hot potato.” This foreshadows how Development & Operations need to work together closely to continuously achieve more with less time and fewer resources.
Change Management
One obvious dysfunction in Parts Unlimited’s IT Department is its lack of adherence to, and disrespect for, change management processes. Oftentimes characters resort to “just getting it done,” without considering how the timing of work can affect other teammates, other areas of the business, or even customers themselves.
It appears at this point that Parts Unlimited has several silos of work that need completing, but the labor is not well organized, and culturally the isolated departments are sometimes more fixated on starting new work than finishing ongoing work. This goes back to the earlier points: they do not plan towards deadlines in a unified way, and they manage change through improvisation rather than premeditation.
Work in Progress (WIP)
A potential board member, Erik, is introduced to the reader as a scruffy but wise advisor. He tests Bill’s understanding of Parts Unlimited’s problems by bringing him to a manufacturing plant and showing how the production processes of creating physical products are so similar to processes in Tech that they are arguably the same. Erik then goes on to suggest that WIP is the “silent killer,” and in Chapters 7–11 it becomes increasingly apparent that the sheer volume of work bogging down Parts Unlimited is massive, while the importance of some of that work is called into question.
The 4th Type of Work
Erik hints that there are four types of work going on, three of which are business projects, internal IT projects, and projects to manage change. For a handful of chapters, Bill ponders what the fourth type of work is. It becomes increasingly obvious that the IT organization’s focus is scattered, and that much of that focus is not on work that was originally planned. Instead, they get caught “firefighting,” and oftentimes their efforts to win these firefights cause more fires. As Part 1 ends, I am expecting the reallocation of labor, and how it relates to unplanned work, to be major pillars of Part 2.
Part 2: Chapters 17–29
Bill’s Growing Leadership
In Part 1, Bill is introduced by the writer as a strong leader, but as Part 2 develops, the reader watches Bill improve in several leadership areas, despite having resigned out of frustration at the end of Part 1. At this point, he has become more sensitive to empowering other leaders, he is becoming more aware of the lean ways in which the IT environment under his direction can operate, and overall he feels more experienced and “managerial,” so to speak, especially as his relationship with the CEO Steve Masters becomes more constructive.
Prioritizing Constraints
Over the course of Part 1, it becomes increasingly clear that Brent, as top engineer at Parts Unlimited, is regularly a bottleneck. When it comes to getting work done, Brent is Bill’s most vital resource, but Brent is constantly stuck troubleshooting, juggling tasks, helping others, etc. Therefore Bill needs to understand the Theory of Constraints, free up Brent, and have Brent focus on whatever will increase the throughput of critical work.
Five Antidotes to Bottlenecking:
- Identifying constraints: Figure out where the constraints exist.
- Exploiting constraints: Seek to make full use of the right resource, i.e., Brent.
- Subordinating to constraints: Align every other resource and decision with the constraint; even if it isn’t intuitive, sometimes you have to do less to do more.
- Elevating constraints: Offload tasks, integrate tasks, simplify tasks, or as ends up being a common choice in the book, eliminate tasks.
- Preventing inertia from becoming a constraint: The solution mustn’t become the problem.
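The first and fourth steps can be made concrete with a small sketch. The stage names and capacities below are made up for illustration (they are not figures from the book), and the model assumes a simple serial flow where overall throughput equals the capacity of the slowest stage.

```python
# Hypothetical stage capacities, in work items per day (illustrative numbers only).
stage_capacity = {
    "development": 12,
    "qa": 9,
    "brent": 3,  # the constrained resource in the story
    "deployment": 8,
}


def find_constraint(capacities):
    """Return the stage with the lowest capacity, i.e. the bottleneck."""
    return min(capacities, key=capacities.get)


def system_throughput(capacities):
    """A serial flow can finish no more work than its slowest stage allows."""
    return min(capacities.values())


print(find_constraint(stage_capacity), system_throughput(stage_capacity))

# Speeding up a non-constraint leaves overall throughput unchanged...
faster_dev = {**stage_capacity, "development": 20}
# ...while elevating the constraint raises it.
elevated = {**stage_capacity, "brent": 6}
print(system_throughput(faster_dev), system_throughput(elevated))
```

In this toy model, making Development faster changes nothing, while doubling what flows through Brent doubles what the whole system delivers. That is the intuition behind focusing improvement effort on the constraint.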
Technical Debt
I want to come back to the concept of there being 4 types of work, with the 4th type of work being unplanned work. In a large meeting with the CEO, Erik, Bill & more IT leadership, Erik explains just how detrimental unplanned work really is. He explains that when workers are busy with unplanned work, the opportunity cost of the planned work they could be doing compounds. While workers are stuck on unplanned work, more unplanned work piles up for them to address once they clear what is in front of them, and this spirals out of control, as is evident from Chapter 1 of the book.
Wait Time vs. Utilization
As the percentage of time a resource is busy increases, wait time grows steeply rather than linearly. In the book, Erik frames wait time as the percentage of time a resource is busy divided by the percentage of time it is idle. For example, a resource that spends 15% of their time idle (slacking, scrolling social media, etc.) can be seen as a positive, because they have some working capacity remaining to use as needed. A deeper mathematical dive into the relationship between Wait Time & Utilization can be found in this Medium article by Chris Choy, or this article at AHA Moments.
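To see how quickly wait time climbs, here is a minimal sketch of the busy-over-idle rule of thumb that Erik describes in the book. The function name and the sample percentages are my own, and the result is a relative "units of wait" figure, not a prediction in hours.

```python
def wait_time_factor(busy_pct):
    """Relative wait time for a resource that is busy `busy_pct` percent of the time.

    Uses the rule of thumb: wait time = % busy / % idle.
    """
    if not 0 <= busy_pct < 100:
        raise ValueError("busy_pct must be in the range [0, 100)")
    idle_pct = 100 - busy_pct
    return busy_pct / idle_pct


for pct in (50, 85, 90, 95, 99):
    print(f"{pct}% busy -> {wait_time_factor(pct):.1f} units of wait")
```

At 50% busy the factor is 1, at 90% it is 9, and at 99% it is 99. The last few percentage points of utilization are by far the most expensive, which is why a little slack in a key resource pays off.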
Implementing Kanban
Chapter 24 sees the story focus on Bill’s personal life. His world seems to be increasingly manageable, and for the first time in a long time, he spends a relaxing weekend with the family. Bill’s relief is in part due to his team (Patty especially) implementing Kanban.
In the absence of improvements, processes don’t stay the same. This is due to the chaos and entropy inherent to any system. In order to be agile, maximize visibility of work in progress, and allocate resources accordingly, the team’s Kanban board becomes a central tool in identifying bottlenecks & maximizing throughput of meaningful work. Having worked with Kanban boards on my own teams before, I must say they make it amazingly easy to see who is doing the work, where it is in the process, what work is getting done, when it will be done, and why it is at risk of incompletion if there is indeed a logjam.
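As a rough illustration of how a board exposes a logjam, here is a minimal sketch. The card titles, column names, and WIP limits are all invented for the example (the book does not specify them), and per-column WIP limits are a common Kanban practice that I am assuming here, not something spelled out above.

```python
from collections import defaultdict

# Hypothetical WIP limits per column; real teams tune these over time.
WIP_LIMITS = {"ready": 5, "doing": 3, "review": 2}

# Hypothetical cards: (title, owner, column).
cards = [
    ("Patch database server", "Brent", "doing"),
    ("Rotate firewall rules", "Wes", "doing"),
    ("Upgrade build agent", "Brent", "doing"),
    ("Fix payroll report", "Brent", "doing"),
    ("Review change request", "Patty", "review"),
    ("Order replacement disks", "Wes", "ready"),
]


def board_by_column(cards):
    """Group cards by the column they currently sit in."""
    board = defaultdict(list)
    for title, owner, column in cards:
        board[column].append(f"{title} ({owner})")
    return dict(board)


def over_limit(board, limits):
    """Return the columns holding more cards than their WIP limit."""
    return [
        column
        for column, items in board.items()
        if len(items) > limits.get(column, float("inf"))
    ]


board = board_by_column(cards)
for column, items in board.items():
    print(column, len(items), items)
print("Over WIP limit:", over_limit(board, WIP_LIMITS))
```

With this sample data the "doing" column holds four cards against a limit of three, and three of them belong to one person, so both the logjam and its likely cause are visible at a glance.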
Categories of Waste
As the story continues, Bill gains a more mature understanding of how he can prioritize work. He is also becoming more and more aware of redundancies and waste. For example, he has an update scheduled for a system, but the system is going to be de-commissioned next year, so why update it? The author, Gene Kim, runs us through different types of waste so that the flow of work can improve.
The types of waste are:
- Partially done work: Semi-complete work that sits unfinished.
- Extra processes which don’t have value: Over-engineering systems without actually improving the system.
- Extra features that are not needed: Adding unnecessary pieces to systems.
- Switching from task to task: Juggling, not specializing in one task, etc.
- Waiting & delays: The result of bottlenecking, jammed up WIP.
- Unneeded motion: Movement of people or equipment that is unnecessary.
- Defective work: Work with results that don’t actually work.
- Non-standardized work: Work that isn’t uniform or easily repeated.
- Heroics: Individuals taking it into their own hands, going off the path, etc.
Part 3: Chapters 30–35
Erik the Guru
Part 3 opens with Bill going back to the manufacturing plant with Erik. A recurring challenge for Bill is that he cannot keep up with customer demand, and Erik suggests that Bill needs a feedback loop to go back to the early parts of product definition, design & development. In order to do so, Erik starts hinting that Bill needs to start reducing his batch size (dramatically) and increasing his deployment frequency (dramatically).
Erik, who throughout the story has been key in nudging along Bill’s growth, has now become the sensei, the master, the guru of IT. The next step according to Erik is to create a deployment pipeline, which covers the path from code check-in to production, gets everything into version control, and has the necessary environments built for Dev, QA & Prod. Erik also calls for deploying code into test and production environments on-demand, saying that this is how Bill can keep up with customer demand.
DevOps Underpinnings
Throughout the book, Bill is consistently trying to fit his new outlook into what Erik describes in Part 1 as The Three Ways, which are:
- The 1st Way — Flow/Systems Thinking: Consider the process of work from Development to Operations, and seek to improve that process holistically.
- The 2nd Way — Amplify Feedback Loops: In the wake of an increasingly harmonious system from The 1st Way, seek to improve the feedback process from Operations to Development.
- The 3rd Way — Culture of Continual Experimentation & Learning: Engrain several, sometimes spontaneous feedback loops within the entire cycle, so that problems can be solved and new information can be formed across the entire cycle.
Safer Systems
I gathered four environmental conditions which will lead to more fault-proof systems, which are:
- Problem swarming: Problems are swarmed & solved which results in the quick construction of new knowledge.
- Globalizing information: New, local knowledge is spread and exploited globally across the organization.
- Leaders creating leaders: Leaders create leaders who are continuously growing the capabilities and themes outlined in this Medium article.
- Managing complex work: By managing complex work, you have the opportunity to reveal the root problems in design & operations.
Big Breakthrough
In Chapter 31, Patty draws out a Value-Stream Map of Parts Unlimited’s deployment pipelines & environments, which reveals that two issues keep coming up.
- 1) At every stage of the deployment process, environments are rarely ready when needed. When they are ready, they need rework.
- 2) The code packaging process is subpar and sometimes non-existent. VP of Dev Chris and his team don’t document the code well enough, making it hard for the team to identify root problems when bugs occur in Ops.
Finally, the team is on the cusp of their big breakthrough: Automation. To get work flowing through the pipeline automatically and without hiccups, the team creates a common build procedure for Dev, QA & Production. They then standardize the environments, eliminating the majority of the variants that cause so much trouble. Brent is invited to Operation sprint meetings, and at the same time everything is more clearly checked into version control. The team starts having each small batch of code committed and tested automatically, and the computing resources that host the code environments are created through infrastructure as code & Cloud services. Suddenly, simultaneity & synchronization are inherent to the increasingly automated IT ecosystem.
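The shape of such a pipeline can be sketched in a few lines. This is a toy model under my own assumptions, not the tooling used in the book: the stage names, the commit fields, and the `deployments` log are hypothetical, and real stages would call a build tool, a test runner, and an infrastructure-as-code tool.

```python
deployments = []  # records (environment, commit id) for each simulated deploy


def build(commit):
    """Hypothetical build check; a real one would invoke the build tool."""
    return commit["compiles"]


def automated_tests(commit):
    """Hypothetical test check; a real one would run the test suite."""
    return commit["tests_pass"]


def deploy(environment):
    """Return a deploy step; the same routine is reused for every environment."""
    def step(commit):
        deployments.append((environment, commit["id"]))
        return True
    return step


STAGES = [
    ("build", build),
    ("test", automated_tests),
    ("deploy-dev", deploy("dev")),
    ("deploy-qa", deploy("qa")),
    ("deploy-prod", deploy("prod")),
]


def run_pipeline(commit, stages):
    """Run each stage in order and stop at the first failure."""
    passed = []
    for name, step in stages:
        if not step(commit):
            return {"commit": commit["id"], "passed": passed, "failed": name}
        passed.append(name)
    return {"commit": commit["id"], "passed": passed, "failed": None}


good = {"id": "abc123", "compiles": True, "tests_pass": True}
bad = {"id": "def456", "compiles": True, "tests_pass": False}
good_result = run_pipeline(good, STAGES)
bad_result = run_pipeline(bad, STAGES)
print(good_result)
print(bad_result)
```

Two design points mirror the story: every environment is produced by the same `deploy` routine, so Dev, QA & Production cannot drift apart, and a failing small batch stops early, before it reaches any environment.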
Project Unicorn
Project Unicorn becomes a separate “SWAT” team independent of any one department, with Brent being the key resource on the team. In an almost guerrilla-warfare fashion, the team acts with speed & agility to solve problems and provide mini-feedback loops within the system (The Third Way). As we see in Chapters 32 & 33, much of the newly created information from this team is being utilized globally across the organization, such as when Marketing uses new information for what end up being highly successful e-mail campaigns.
Note: Project Unicorn ends up being central to another novel written by Gene Kim, called The Unicorn Project: A Novel about Developers, Digital Disruption, and Thriving in the Age of Data.
Helping Business Win
While much of the middle of the book focussed on Information Technology, the conclusion of the book highlights the business impact of Parts Unlimited’s IT revolution. DevOps has served as an antidote to Parts Unlimited’s most painful problem: Customers were unhappy, and prior to implementing DevOps, Parts Unlimited could not make their customers happy because the business, through IT, could not quickly adapt to customer demand.
Chapter 35 takes place at CEO Steve Masters’ house, where there is a party with the company’s leadership to celebrate an extraordinarily successful financial Q4, as well as to recognize Bill’s impact on that success (Spoiler: He is offered a fast-tracked opportunity to eventually become COO). However, the book does not end on a euphoric, ‘kumbaya’ moment. Rather, there is word of more business challenges, which will likely be addressed in the aforementioned follow-up novel, The Unicorn Project.
Concluding Thought
While I learned a lot of key details, my bottom-line takeaway is that DevOps is not a framework or model. Instead, DevOps is a cultural paradigm for how various teams need to collaborate through streamlining, automation, and agility to deliver measurable business value.
Thanks for reading.