Interesting anecdotes from an ex-SpaceX engineer who started out thinking “Elon’s algorithm” was obviously correct and gradually grew cynical as SpaceX scaled:
Questioning the requirements was an extremely literal thing that you were supposed to do multiple times every single day. I’d make a claim about my system (“hey, if the stuff in this tube gets too hot, my part will explode, so please don’t put anything too hot near it”) and that very afternoon three or four people would stop by my desk, ready to debate.
“Hello,” they would say. “I’m the Responsible Engineer for the Hot Things Near Tubes system,” and then the floodgates would open. What did I mean by near? What did I mean by hot? How hot was too hot? Was it really going to explode? If it exploded, was that really so terrible?
The first time, the debate would be interesting. The second, it would be a bit tiresome. By the end of the first week after a new claim, it was exhausting and a little rote. But you had to win, every time, because if you didn’t, nobody would follow your requirement.
It also worked in the other direction. I learned to pay attention to everything that was happening in the whole program, absorbing dozens of update emails a day, because people would announce Requirements, and I’d need to go Question Them. If I didn’t do this, I’d find my system forced to jump through too many hoops to work, and, of course, I would be Responsible. If I was Responsible for too many things, I wouldn’t be able to support all of them—unless, of course, I managed to Delete the Part and free myself from one of those burdens.
And so when there were requirements, they were strong, because they had to survive an endless barrage of attack. When there were parts, they were well-justified, because every person involved in the process of making them had tried to delete them first. And there were no requirements matrices, no engineering standards, practically no documentation at all.
This was the key point, the reason it was capitalized. It wasn’t philosophy, it wasn’t advice: it was an Algorithm. A set of process steps that you followed to be a good engineer. And all of us good engineers were being forced by unstoppable cultural forces to follow it maniacally.
There was one question slowly building in my mind. The point of SpaceX was to hire good engineers, do first-principles analysis, let them iterate, and avoid documentation. This whole process was clearly succeeding at the last three. But if we were already such great engineers, why did this process need to be enforced so aggressively?
As time went on and the Algorithm grew, screaming ever louder about what we should specifically do, the question grew ever more urgent.
Tell people to ritualize Questioning Requirements and they will do so ritually. You’ll deliver the same explanation for how hot your tube can get a hundred times, and each time you deliver it you’ll think about it less. You’ll realize that the best way to get work done is to build a persona as someone so knowledgeable that questioning you is pointless, and then nobody ever questions your work.
Tell people to Delete the Part, and they’ll have the system perform ridiculous gymnastics in software to avoid making a $30 bracket, or sacrifice performance to avoid adding a process.
Tell people to Optimize the Part and they’ll push it beyond margins unnecessarily, leaving it exquisite at one thing and hopeless at others.
Tell them to Accelerate, and they’ll do a great job of questioning, but when push comes to shove they will always Accelerate at the cost of quality or rework, and so you find yourself building vehicles and then scrapping them, over and over again.
There is no step for Test in the Algorithm, no step for “prove it works.” And so years went by where we Questioned, and Deleted, and Optimized, and Accelerated, and Automated, and rockets piled up outside the factory and between mid-2021 and mid-2023 they never flew.
Every engineer was Responsible for their own part. But every engineer had perverse incentives. With all that Accelerating and Automating, if my parts got on the rocket on time, I succeeded. In fact, if the rocket never flew, I succeeded more, because my parts never got tested.
And so we made mistakes, and we did silly things. The rocket exploded a lot, and sometimes we learned something useful, but sometimes we didn’t. We spent billions of dollars. And throughout it all, the program schedule slid inexorably to the right.
And I got cynical.
There were enormous opportunities for upside improvement in the rocket industry of the 2000s and 2010s. The company was small and scrappy and working hard. The rules applied.
But by the 2020s, even SpaceX was growing large. The company had passed 10,000 people, with programs across the country, tendrils in every major space effort and endlessly escalating ambition.
And the larger it became, the greater the costs of its architecture. As my program grew from dozens of people to hundreds to thousands, every RE needed to read more emails, track more issues, debate more requirements. And beyond that, every RE needed to be held to a common culture to ensure good execution, and that culture wasn’t spreading fast enough to keep pace with the churn of new engineers.
This makes me wonder whether SpaceX could actually be substantially faster if it took systems engineering as seriously as the author hoped (as, say, the Apollo program did), overwhelmingly dominant as it currently is in fraction of global mass launched to orbit and similar metrics. To quote the author:
The first recorded use of the term “Systems Engineering” came from a 1950 presentation by Mervin J. Kelly, Vice President of Bell Telephone. It appeared as a new business segment, coequal with mainstays like Research and Development. Like much of the writing on systems engineering, the anodyne tone hid huge ambition.
“‘Systems engineering’ controls and guides the use of the new knowledge obtained from the research and fundamental development programs … and the improvement and lowering of cost of services …”
In other words, this was meta-engineering.
The problems were too complex, so the process had to be a designed thing, a product of its own, which would intake the project goals and output good decision making.
It began with small things. There should be clear requirements for what the system is supposed to do. They should be boxed out and boiled down so that each engineer knows exactly what problem to solve and how it impacts the other ones. Changes would flow through the process and their impacts would be automatically assessed. Surrounding it grew a structure of reviews, process milestones, and organizational culture, to capture mistakes, record them, and make sure nobody else made them again.
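The change-propagation idea described above can be sketched as a toy program: requirements form a dependency graph, and a change to one is traced automatically to everything derived from it. This is a minimal illustration, not any real systems-engineering tool; all requirement names and identifiers here are invented for the example.

```python
# Toy sketch of requirements traceability: a change flows through the
# dependency graph and its downstream impact is assessed automatically.
from collections import defaultdict


class RequirementGraph:
    def __init__(self):
        # maps a requirement id -> ids of requirements derived from it
        self.dependents = defaultdict(set)

    def add_dependency(self, upstream: str, downstream: str) -> None:
        """Record that `downstream` is derived from `upstream`."""
        self.dependents[upstream].add(downstream)

    def impact_of_change(self, changed: str) -> set:
        """Return every requirement transitively affected by a change."""
        impacted, stack = set(), [changed]
        while stack:
            for dep in self.dependents[stack.pop()]:
                if dep not in impacted:
                    impacted.add(dep)
                    stack.append(dep)
        return impacted


# Hypothetical requirements, echoing the hot-tube example earlier.
g = RequirementGraph()
g.add_dependency("SYS-001 max tube temperature", "THERM-010 insulation thickness")
g.add_dependency("THERM-010 insulation thickness", "MASS-200 dry mass budget")
g.add_dependency("SYS-001 max tube temperature", "PROX-030 hot-component keep-out zone")

for req in sorted(g.impact_of_change("SYS-001 max tube temperature")):
    print(req)
```

The point of the sketch is only that impact assessment becomes a mechanical graph traversal rather than a human remembering to send the right emails, which is exactly the trade the essay describes between process and individual responsibility.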
And it worked! All of those transcendental results from Apollo were in fact supported on the foundations of exquisitely handled systems engineering and program management. The tools developed here helped catapult commercial aviation and sent probes off beyond the Solar System and much more besides.
At SpaceX, there was no such thing as a “Systems Engineer.” The whole idea was anathema. After all, you could describe the point of systems engineering, and process culture more generally, as the process of removing human responsibility and agency. The point of building a system to control human behavior is that humans are fallible. You write them an endless list of rules to follow and procedures to read, and they follow them correctly, and then it works out.
At SpaceX, it wasn’t going to be like that. First-principles thinking, Requirements Questioning, and the centrality of the responsible engineer all centered on raising the agency of each individual engineer. Raising individual responsibility was always better.
My guess based on reading anecdotes like these and Berger’s books is that the algorithm is a vast improvement over anyone else’s engineering practices, but it alone doesn’t tell you what else you need to run a company. Maybe systems engineering is the missing piece, maybe some other management philosophy.
If you look at the major SpaceX programs, they are: Falcon development, Falcon operations, Starlink, and Starship. The first three were wildly successful, and Starship is late but technically and operationally ahead of other companies (e.g. Raptor runs at roughly double the chamber pressure of BE-4, and Starship has flown roughly ten times as many test flights), with successes directly traceable to each step of the algorithm, and with energy wasted where some other approach would have been appropriate. Raptor 3 engines are only possible to make as cheaply as Elon wants because a vast number of parts were deleted; yet SpaceX also “Accelerated” into building hundreds of Raptor 2s that are now obsolete.