Boeing 737 MAX MCAS as an agent corrigibility failure

The Boeing Maneuvering Characteristics Augmentation System (MCAS) can be thought of, if reaching a bit, as a specialized AI: it performs a function normally reserved for a human pilot: pitching the nose down when it deems the angle of attack to be dangerously high. This is not, by itself, a problem. There are pilots in the cockpit who can take control when needed.

Only in this case they couldn’t. Simply manually pitching the nose up when the MCAS pitching it down too much would not disengage the system, it would activate again, and again. One has to manually disengage the autopilot (this information was not in the pilot training). For comparison, think of the cruise control system in a car: the moment you press the brake pedal, it disengages; if you push the gas pedal, then release, it return to the preset speed. At no time it tries to override your actions. Unlike MCAS.

MCAS disregards critical human input and even fights the human for control in order to reach its goal of “nominal flight parameters”. From the Corrigibility paper:

We say that an agent is “corrigible” if it tolerates or assists many forms of outside correction, including atleast the following: (1) A corrigible reasoner must at least tolerate and preferably assist the programmers in their attempts to alter or turn off the system...

In this case the “agent” actively fought its human handlers instead of assisting them. Granted, the definition above is about programmers, not pilots, and the existing MCAS probably would not fight a software update, being a dumb specialized agent.

But we are not that far off: a lot of systems include built-in security checks for the remote updates. If one of those checks were to examine the algorithm the updated code uses and reject it when it deems it unacceptable because it fails its internal checks, the corrigibility failure would be complete! In a life-critical always-on system this would produce a mini-Skynet. I don’t know whether something like that has happened yet, but I would not be surprised if it has, and resulted in catastrophic consequences.