Call for cognitive science in AI safety

Epistemic status: High expected utility, but also very high variance

The more I realise that AI takeoff is something that actually might happen, the more I am pulled towards this problem:

  • What are human preferences really?

  • What is the generator of human preferences?

  • What are our preferences made of?

  • What is the structure behind it all?

Before we tell our brand new AI overlord to figure out our values and do whatever we want it to do, we really ought to have a clearer idea of what “values” and “want” mean.

I have a good idea of what my preferences are within the limited reach of my lived experience, and even a little bit beyond that. But to extrapolate from that into the vast distance of possible futures seems extremely dangerous.

My values are inconsistent and conflicting and definitely not constant over time. On top of that, there is a big heap of unknown unknowns with respect to how the brain works.

I am convinced that to solve AI safety we need to have a good understanding of human values, and I know I don’t have this understanding. I am just a physics and math nerd. I don’t know this stuff. I don’t know if the questions I have are open research questions, or if this stuff is already well known and understood in some separate community somewhere. That is why we need psychology nerds to join the cause.

Another topic that I would want AI-safety-oriented psychology research to take on is something like a case study of friendliness in existing agents (humans, subsystems in the brain, organisations). What are the mechanisms in the human brain that make us care about others, and can they be replicated?

* * *

A problem I see is that only math and computer nerds are called upon to work on AI safety, and all the psychology nerds out there do not even know that they are needed. Or maybe the psychology research that I am looking for is already out there and we just need to find each other to collaborate more.

I think that it is important that technical AI safety research does not try to set the agenda for psychology AI safety research. Information and inspiration need to flow both ways. Both fields need to be free to follow their own curiosity, but we also need to collaborate to ground our work in each other’s knowledge.

* * *

Linda Linsefors

Cosigning:

Alexander Appel

Holden Lee

somnulence logencia