Misalignment Harms Can Be Caused by Low Intelligence Systems

Most clinical trials rely on very simple statistical models, not complex neural network architectures with trillions of parameters. Yet they are still the most effective tools we have for validating and optimising health and recovery in human beings. Very simple statistical tools can produce large effects and, if misaligned, very large harms.

When private companies run A/B tests on human beings, they are usually explicitly optimising for objectives that are not aligned with human wellbeing: engagement, click-through rate, company profits. The harm this can cause depends on the nature of the interaction surface between the technology and the human, and on the amount of exposure.
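To give a sense of how simple the statistics behind a typical A/B test can be, here is a minimal sketch of a two-proportion z-test. The function name `two_proportion_z` and the click counts are hypothetical, chosen only for illustration; real experimentation platforms may well use different tests.

```python
import math


def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    Hypothetical helper for illustration: arm A is the control,
    arm B is the variant being tested.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the null hypothesis that both arms convert equally.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: 10,000 users per arm, 1,000 vs 1,100 clicks.
z, p_value = two_proportion_z(1000, 10_000, 1100, 10_000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

Nothing here has more than a handful of parameters, yet a result like this is enough to decide which variant is shipped to every user.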

In configuration space we can consider the total effect of a company on a human as $E = \int_0^T B(t)\,dt$, where $B(t)$ is the ‘interaction bandwidth’ between human and company across the human’s Markov blanket at time $t$ and $T$ is the cumulative interaction time. As an approximation, we can say the magnitude of the overall interaction is going to be roughly proportional to the magnitude of the average interaction bandwidth multiplied by the cumulative interaction time:

$$|E| \propto |\bar{B}| \cdot T$$

In modern social media, $T$ is obviously high relative to many other interventions, but what about $\bar{B}$? What makes the interaction bandwidth high or low? Is it determined by the complexity of the recommendation algorithm you use? Is it determined by the potency of the content available to recommend?
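As a rough numerical sketch of the approximation above, the total effect can be treated as a sum of bandwidth samples over time and compared with average bandwidth multiplied by exposure time. The function names, the units and the two exposure profiles are assumptions made up for illustration, not measurements.

```python
def total_effect(bandwidth_samples, dt):
    """Riemann-sum approximation of the integral of bandwidth over time."""
    return sum(b * dt for b in bandwidth_samples)


def average_times_duration(bandwidth_samples, dt):
    """Average interaction bandwidth multiplied by cumulative interaction time."""
    cumulative_time = len(bandwidth_samples) * dt
    average_bandwidth = sum(bandwidth_samples) / len(bandwidth_samples)
    return average_bandwidth * cumulative_time


# Hypothetical profiles, sampled once per hour (dt = 1.0), arbitrary units.
brief_intense = [10.0] * 6   # high bandwidth, short exposure
long_mild = [0.5] * 120      # low bandwidth, long exposure

for name, profile in [("brief_intense", brief_intense), ("long_mild", long_mild)]:
    print(name, total_effect(profile, 1.0), average_times_duration(profile, 1.0))
```

Under these made-up numbers the two profiles give the same total, which is the point of the approximation: low bandwidth can be offset by long exposure.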

If there were a piece of content so convincing that anyone who read it instantly decided to sell all of their possessions, give the proceeds to MegaCorp and spend the rest of their life giving any available money to MegaCorp, then all the MegaCorp Recommendation Algorithm™ would have to do to maximise profits would be to show this content. It would not require a complex statistical model. This content may not exist, but participants on the platform are strongly incentivised to try to create it, because content that conforms to the platform’s utility gets broad reach, again through a simple recommendation system.
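The thought experiment can be sketched with about the simplest recommender imaginable: try each item once, then always show whichever has earned the most per impression. The class `GreedyRecommender`, the item names and the payoffs are all hypothetical, and the payoffs are deterministic to keep the example short.

```python
from collections import defaultdict


class GreedyRecommender:
    """Hypothetical recommender: show whatever has paid best so far."""

    def __init__(self, items):
        self.items = list(items)
        self.shows = defaultdict(int)
        self.revenue = defaultdict(float)

    def recommend(self):
        # Show every item once before comparing averages.
        for item in self.items:
            if self.shows[item] == 0:
                return item
        # Then always pick the highest average revenue per impression.
        return max(self.items, key=lambda i: self.revenue[i] / self.shows[i])

    def record(self, item, revenue):
        self.shows[item] += 1
        self.revenue[item] += revenue


# Made-up revenue per impression; one item is extremely "potent".
payoffs = {"cat_video": 0.01, "news": 0.02, "hyper_persuasive": 5.0}

recommender = GreedyRecommender(payoffs)
for _ in range(100):
    item = recommender.recommend()
    recommender.record(item, payoffs[item])

print(dict(recommender.shows))
```

After one exploratory impression per item, every remaining impression goes to the most potent content. The optimisation pressure comes from the content, not from any sophistication in the algorithm.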

Now, once you do have a highly intelligent recommendation system optimising on humans with knowledge of each human’s present and past states, the interaction between recommendation system and human is no longer ergodic, and more powerful interactions can be constructed over time with a tight feedback loop on the human. My point is that this is not necessary for misaligned systems to cause severe effects and severe harms.