ISO: Name of Problem

I’m looking for a name for a problem. I expect it already has one, but I don’t know what it is.

The problem: suppose we have an AI trying to learn what people want, e.g. some variant of IRL (inverse reinforcement learning). Intuitively speaking, we point at a bunch of humans and say “figure out what they want, then do that”. A few possible ways the AI could respond:

  • “Hmm, to the extent that those things have utility functions, it looks like they want friendship, challenge, status, etc…”

  • “Hmm, it looks like they want to maximize the number of copies of the information-carrying molecules in their cells.”

  • “Hmm, it looks like they’re trying to maximize entropy in the universe.”

  • “Hmm, it looks like they’re trying to minimize physical action.”

Why would the AI think these things? Well, you’re pointing at a bunch of atoms, and the microscopic laws of motion which govern those atoms can be interpreted as minimizing a quantity called action. Or you’re pointing at a bunch of organisms subject to a selection process which (locally) maximizes the number of copies of some information-carrying molecules. How is the AI supposed to know which optimization process you’re pointing to? How can it know which level of abstraction you’re talking about?
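For reference, the physics fact being leaned on here is the principle of stationary action (commonly, if loosely, called “least action”): given a Lagrangian $L$, the realized path $q(t)$ makes the action functional stationary, which is equivalent to the Euler–Lagrange equations:

$$S[q] \;=\; \int_{t_1}^{t_2} L\big(q(t),\dot q(t),t\big)\,dt, \qquad \delta S = 0 \;\Longleftrightarrow\; \frac{d}{dt}\,\frac{\partial L}{\partial \dot q} \;-\; \frac{\partial L}{\partial q} \;=\; 0.$$

So “the atoms minimize action” is a perfectly valid description of the same system, just at a very different level of abstraction than “the humans want friendship”.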

What data could tell the AI that you’re pointing at humans, not the atoms they’re made of?
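To make the underdetermination concrete, here is a minimal toy sketch (not any established IRL algorithm; all names and numbers are invented for illustration): several candidate objectives, each defined at a different level of abstraction, rank the same observed trajectories identically, so this kind of behavioral data alone can’t tell the learner which optimization process is being pointed at.

```python
# Toy illustration only: candidate objectives at different levels of abstraction
# all agree with the same observed behavior. All feature names and numbers are made up.

from typing import Callable, Dict, List

# Each "trajectory" is just a bundle of summary statistics the learner observed.
Trajectory = Dict[str, float]

observed: List[Trajectory] = [
    {"friendship": 3.0, "gene_copies": 2.0e9, "entropy_produced": 5.0e4},
    {"friendship": 5.0, "gene_copies": 3.0e9, "entropy_produced": 8.0e4},
    {"friendship": 7.0, "gene_copies": 4.0e9, "entropy_produced": 1.1e5},
]

# Candidate objectives, each defined at a different level of abstraction.
candidates: Dict[str, Callable[[Trajectory], float]] = {
    "human-level preferences": lambda t: t["friendship"],
    "gene-copy maximization": lambda t: t["gene_copies"],
    "entropy maximization": lambda t: t["entropy_produced"],
}

# Because the features move together in the observed data, every candidate
# induces the same preference ordering over trajectories: the data alone
# underdetermines which level of abstraction the "wants" live at.
for name, objective in candidates.items():
    ranking = sorted(range(len(observed)), key=lambda i: objective(observed[i]), reverse=True)
    print(f"{name:25s} prefers trajectories in order {ranking}")
```

The point is only that the data-induced ranking is identical across levels of abstraction; whatever breaks the tie has to come from something other than this sort of behavioral record.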

This sounds like a question which would already have a name, so if anybody could point me to that name, I’d appreciate it.