The Pragmatic Cost of a Nine
In 25 years, it will probably be Amazon that will go down as the Toyota of the 21st century. However, for astute observers, Google will be the Toyota Production System of the 21st century. Google's beginning starts at the turn of the 21st century. Google was created by two Stanford University students, Larry Page and Sergey Brin, with the help of Sun co-founder Andy Bechtolsheim who wrote the pair a check for $100,000. Around ten years later, Andy Bechtolsheim also created a significant disruption in the networking world when he founded Arista Networks. Jeff Bezos, Amazon founder, was also an early angel investor.
Before Google, information technology (IT) operations were considered administrative. Before 2010, most IT jobs were either programmers or system administrators. Google was founded by two Ph.D. students who saw opportunities from a first-principles perspective. They also recruited a high percentage of PhDs. In his book Principles of Network and System Administration, Mark Burgess claimed that system administration was a form of human-computer engineering. Critics strongly rejected this and said the role should not be called engineers. Google eventually changed the industry mindset by defining the system administration role to Site Reliability Engineering (SRE). Google decided early on that scale would be an issue for their existence.
When Yahoo launched in 1994, there were only around three thousand websites on the internet. By the time Google launched its service in 1998, almost three million websites were on the internet. When Instagram launched in 2010, around three hundred million web servers were on the internet. There are two billion websites on the internet as of 2022 (1). Google realized early on that traditional IT practices and processes would not keep up with the predicted growth of the internet. In Google's view, Google had an engineering problem, not an administration issue. As students in nearly all of their growth areas, they took a pragmatic approach. They redefined how to scale distributed computing related to storing data. They designed databases differently. They even created and developed computer and network devices themselves.
One of the crucial areas is how Google looked at the reliability of its computer infrastructure. In the early 2000's many large banks, retail, and insurance companies saw reliability in absolutes or, some might say, deterministic. You would hear executives say that the system can never go down. They would talk about how many nine's of availability a service could maintain with the primary objective of increasing availability by another nine. A system with four nines (i.e., 99.99 percent availability) would be considered better than one with three nines (i.e., 99.9 percent availability) —five-nines better than four nines, and so on. Service providers would advertise their competitive advantage based on how many nines of availability they could provide.
Google published a book of essays in 2016 called Site Reliability Engineering (SRE), describing how they managed their infrastructure. The book was the first to shed light on a lot of the science and methodology behind how Google grew at a mass scale. In one of the chapters called Embracing Risk, Google explained that it was not their intention to create perfect reliability, systems that never fail. They realized that absolute reliability was not a goal. Instead, they took a pragmatic approach. Absolute or deterministic reliability conflicted with how quickly an organization could grow. At Google, growth was vital. Banks, retail stores, and insurance companies used to have code freezes, where no new features or changes to the system would happen for a specific period. Retail organizations, for example, freeze all changes from November to January. Under those conditions, Google could not grow and remain competitive. Google describes what I would call a pragmatic cost of a nine in the SRE book. The reliability engineer can ask two questions:
How much more money do we make by increasing the nines?
Does adding the additional nine cost more than the money we make?
It's as simple as that.
Charles Sanders Peirce, the founder of Pragmatism, came to the same conclusion in searching for the perfect measurement using pendulums. At some point, the accuracy of the pendulum crossed the pragmatic cost of building it. There was an economic point at which the accuracy of a measurement device reached a point where additional accuracy would yield diminishing returns.