Q: I write software for stuff that isn’t life or death. Because of this, I feel comfortable guessing & checking, copying & pasting, not having full test coverage, etc. and consequently bugs get through every so often. How different is it to work on safety critical software?
A: Having worked on both safety critical and non-safety critical software, you absolutely need to have a different mentality. The most important thing is making sure you know how your software will behave in all different scenarios. This affects the entire development process including design, implementation and test. Design and implementation will tend towards smaller components with clear boundaries. This enables those components to be fully tested before they are integrated into a wider system. However, the full system still needs to be tested, which makes end to end testing and observability an important part of the process as well. By exposing information about the decisions the software is making in telemetry, we are able to automate monitoring of the software. This automation can be used in development, regression testing, as well against software running on the real vehicles during missions. This helps us to be confident the software is working as expected throughout its entire life cycle, especially when we have crew onboard.
Came across this in a SpaceX AMA:
I am reminded of the Rocket Alignment Problem and the posts on Security Mindset.