I have a tentative model for this category of phenomenon that goes something like:
Reality has a surprising amount of detail. Everyday things that you use all the time and appear simple to you are actually composed of many sub-parts and sub-sub-parts all working together.
The default state of any sub-sub-part is to not be in alignment with your purpose. There are many more ways for a part to be badly-aligned than for it to be well-aligned, so in order for it to be aligned, there has to be (at some point) some powerful process that selectively makes it be aligned.
Even if a part was aligned, the general nature of entropy means there are many petty, trivial reasons that it could stop being aligned with little fanfare. (Though the mean-time-to-misalignment can vary dramatically depending on which part we’re talking about.)
So, it shouldn’t be surprising when you find that a complex system is broken in seven different ways for trivial and banal reasons. That’s the default outcome if you just put a system in a box and leave it there for a while.
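A toy way to make that quantitative (all numbers here are invented for illustration): if each of N sub-parts independently stays aligned with probability p, the whole system is fully working with probability p^N, which collapses quickly as N grows.

```python
# Toy illustration (numbers invented): if each part stays aligned with
# probability p, the chance that *all* N parts are aligned is p**N.
for p in (0.99, 0.999):
    for n in (10, 100, 1000):
        print(f"p={p}, parts={n}: P(all aligned) = {p**n:.3f}, "
              f"expected misaligned parts = {n * (1 - p):.1f}")
```

Even with very reliable parts, “at least one thing is slightly wrong” becomes the norm once there are enough parts.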
OK, but if that’s the default state, then how do I explain the systems that aren’t like that?
Suppose we have a system that is initially working perfectly until, one day, one tiny thing goes wrong with the system.
If people use the system frequently and care about the results, then someone will promptly notice that there is one tiny thing wrong.
If the person who discovers this expects to continue using the system in the future, they have an incentive to fix the problem.
If there is only one problem, and it is tiny, then the cost to diagnose and fix the problem is probably small.
So, very often, the person will just go ahead and fix it, immediately and at their own expense, just to make the problem go away.
No one keeps careful track of this—not even the person performing the fix. So this low-level ongoing maintenance fades into the background and gets forgotten, creating the illusion of a system that just continues working on its own.
This is especially true for multi-user systems where no individual user does a large percentage of the maintenance.
I don’t think this invisible-maintenance situation describes the majority of systems, but I think it does describe the majority of user-system interactions, because the systems that get this sort of maintenance tend to be the ones that are heavily used. This creates the illusion that this is normal.
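To sanity-check this story, here’s a toy simulation (every parameter is invented): small faults appear at random, and whenever someone uses the system they notice whatever has accumulated and cheaply fix it. A heavily-used system looks like it “just works”; a rarely-used one greets each user with a pile of problems.

```python
import random

def faults_seen_per_use(days=365, fault_rate=0.05, use_every_n_days=1, seed=0):
    """Toy model: faults appear at random; each use notices and fixes them all.
    Returns the average number of outstanding faults a user walks into."""
    rng = random.Random(seed)
    faults, seen = 0, []
    for day in range(days):
        if rng.random() < fault_rate:      # entropy: a small problem appears
            faults += 1
        if day % use_every_n_days == 0:    # a user shows up...
            seen.append(faults)
            faults = 0                     # ...and quietly fixes everything
    return sum(seen) / len(seen)

print("used daily:         ", faults_seen_per_use(use_every_n_days=1))
print("used every 60 days: ", faults_seen_per_use(use_every_n_days=60))
```

The only thing that differs between the two runs is how often anyone shows up; the maintenance itself never gets recorded anywhere, which is exactly the invisibility described above.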
Some of the ways this can fail include:
Users cannot tell that the system has developed a small problem
Maybe the system’s performance is too inconsistent for a small problem to be apparent
Maybe the operator is not qualified to judge the quality of the output
Maybe the system is used so infrequently that there’s time for several problems to develop between uses
For an individual user, the cost (to that user) of fixing a problem is higher than the selfish benefits to that particular user of the problem being fixed
Maybe no single user expects to use the system very many times in the future
Maybe users lack the expertise or the authority to perform the fix (and there is no standard channel for maintenance requests that is sufficiently cheap and reliable)
Maybe the system is just inherently expensive to repair or to debug (relative to the value the system provides to a single user)
Hmmm… I like the continuous maintenance model here, but I don’t buy it as the main explanation for all the systems which aren’t printer-like. As a counterexample: consider a table. Or a chair, or a whiteboard marker or toothbrush (which need to be replaced sometimes but don’t have the deep-recursive-problems nature), or the Chrome web browser (which works pretty consistently, and when it doesn’t work, a restart or reinstall basically always works without any further recursive problem solving, even on my Linux box). These don’t require much continuous maintenance from the user(s).
It feels like there’s something here about simplicity—e.g. tables, chairs, and whiteboard markers may have a surprising amount of detail if you look close, but they’re still substantively simple in a way that printers are not. My current best articulation is roughly that printers have a more complex/narrower set of preconditions on their environment in order to work, whereas tables and chairs and the Chrome web browser have a much simpler/more robust set of preconditions on their environment. The surprising detail of a table is mostly “internal” in some sense; it’s in the design itself, not in the preconditions/assumptions made by the designer.
… but I’m not at all sure that that’s the right way to think about it.
I guess it’s about entropy. Some items have more entropy than others. They have more parts and more possible states, and most of those states are misconfigured.
A table or a chair has most of its parts glued together in a fixed, low-entropy position, and has few parts to begin with.
A printer has many parts, both at the hardware and software level, and they can be in many states. It has paper, which can be in a state of having run out or not, and if it has run out, the supply can be almost literally anywhere. It has many intricate mechanical parts which can wear out at different rates, it has an ink cartridge that can dry out even when unused, and many intricate little parts that I don’t even know about.
On top of that, the software also has high complexity: lots of software components, including the low-level code that manages the actual printing, whatever needs to connect to the wifi, whatever is needed in case it can print from email, and so on. All of these processes can run concurrently, which may introduce race conditions, and any of these components may have a bug.
And then there are the drivers installed on your laptop, which may be out of date; or your laptop’s drivers may be fully up to date but the printer software is out of date, so your driver is no longer compatible with it.
Many components, many possible states, high entropy.
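A back-of-the-envelope version of that comparison (the component and state counts are made up for illustration): count configurations. Roughly one configuration is “everything works”, so the size of the state space is essentially a count of the ways to be misconfigured.

```python
# Back-of-the-envelope state counting; all counts are invented for illustration.
chair_states   = 2 ** 6    # ~6 glued parts, each roughly "fine" or "broken"
printer_states = 4 ** 40   # ~40 hardware/software components, ~4 states each
print(f"chair configurations:   {chair_states}")        # 64
print(f"printer configurations: {printer_states:.2e}")  # ~1.2e+24
```

The absolute numbers mean nothing; the point is the exponential gap.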
EDIT: also, I think a big part of it is that printers aren’t really tested for reliability. I used to work at a car manufacturer. Cars are hybrid software-hardware systems, and both sides of that equation are highly complex. Car engineering is a huge discipline that focuses aggressively on reliability. Even though car makers appear to come out with a new model every year, in reality they only change the cosmetics. The underlying platform usually only changes once every six years, and every version of the platform is aggressively stress tested in every possible way. The company I worked for had a test track in the coldest possible part of Canada, as well as in Death Valley, because reliability testing has to be aggressive and thorough.
Also the software side used special, hardened compilers that only supported a subset of the programming language, and were rarely changed. Getting a contract to supply a car company with a critical tool like a compiler is a hugely profitable multi-year deal, because they are happy to pay for reliability and reluctant to change something that is working well.
I bet nobody really cares about the reliability of printers to the same extent, and they have high entropy, so they generally suck.
I would agree that, while reality-in-general has a surprising amount of detail, some systems still have substantially more detail than others, and this model applies more strongly to systems with more detail. I think of computer-based systems as being in a relatively-high-detail class.
I also think there are things you can choose to do when building a system to make it more durable, and so another way that systems vary is in how much up-front cost the creator paid to insulate the system against entropy. I think furniture has traditionally fallen into a high-durability category, as an item that consumers expect to be very long-lived...although I think modernity has eroded this tradition somewhat.