[Question] Are there substantial research efforts towards aligning narrow AIs?

I am certainly not an expert, but what I have read in the field of AI alignment is almost solely focused on the question of how to make sure that a general AI does not start optimizing for outcomes that humans do not want.

This is, to me, an obviously important problem, in that it seems like we only get one chance at it and if we fail we are all Doomed.

However, it also seems like a problem that is very hard to get good feedback on, since there are no AGIs to test different strategies on, which makes it difficult to estimate how promising different avenues of research are.

A related problem where real-world feedback is more readily available might be the alignment of narrow AIs. Existing AIs already seem to demonstrate behavior misaligned with the intentions of their programmers. For example, websites such as Facebook and YouTube use algorithms to personalize the content recommended to each user in order to keep them engaged and thus generate more ad revenue for the companies. On the face of it, there seems to be nothing wrong with optimizing to show people what they want to see. Unfortunately, a very effective strategy for accomplishing this goal often appears to be showing people extremely misleading or fabricated stories that both confirm what they already believe and provoke a strong emotional response. In this way these platforms support the spread of misinformation and political polarization, which was almost certainly not the intent of the programmers, who probably imagined themselves to be creating a service uniquely tailored to each user's enjoyment.
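To make the proxy-objective point concrete, here is a minimal toy sketch (entirely hypothetical names and numbers, not a description of how any real platform works): a recommender that ranks items purely by a predicted-engagement score. If misleading items happen to score higher on that proxy, they fill the top slots even though nobody ever programmed “spread misinformation” as a goal.

```python
# Toy sketch, purely illustrative: a recommender that optimizes a proxy
# (predicted engagement) rather than the designers' actual intent.
from dataclasses import dataclass

@dataclass
class Item:
    title: str
    predicted_engagement: float  # the proxy the system actually optimizes
    is_misleading: bool          # a property the designers care about but never measure

def recommend(items: list[Item], k: int = 3) -> list[Item]:
    # Rank strictly by the engagement proxy; intent never enters the picture.
    return sorted(items, key=lambda it: it.predicted_engagement, reverse=True)[:k]

catalog = [
    Item("Balanced explainer", 0.31, False),
    Item("Outrage-bait headline", 0.72, True),
    Item("Fabricated but flattering story", 0.68, True),
    Item("Dry factual report", 0.22, False),
]

feed = recommend(catalog)
print([it.title for it in feed])
# The misleading items take the top slots because the proxy rewards them,
# not because anyone asked for misinformation.
```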

This problem does not seem like something that can simply be patched, much in the way that deficient friendly AI designs do not seem patchable. When I try to think of a possible solution it seems impossible, though not as impossible an impossibility as aligned AGI. The key immediate problems, to my mind, are “how do you decide what counts as misinformation in a relatively unbiased manner?” and “if you don’t fulfill people’s desire for confirmation of their poorly supported beliefs, how do you make your platform as attractive as a competitor who will scratch that confirmation-bias itch?”

Tackling the challenge of “build an AI system that recommends content to users without spreading misinformation,” or something similar, seems as though it could provide a reasonably relevant test bed for ideas about the outer alignment of AI systems. There are clearly a number of relevant differences between this kind of test and actual AGI, but the fact that fields without concrete ways to test their ideas have historically not done well seems like a very important consideration.
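As a rough illustration of what such a testbed objective might look like (a sketch under my own assumptions, not a proposal I have seen anywhere), one could imagine ranking items by predicted engagement minus a weighted misinformation penalty. Of course, the hard questions above reappear immediately as “who supplies the misinformation estimate?” and “who chooses the weight?”

```python
# Hypothetical scoring rule for the proposed testbed: engagement minus a
# weighted misinformation penalty. Both the estimator behind misinfo_estimate
# and the weight lam are exactly where the hard questions reappear; this
# sketch does not resolve them.
def penalized_score(predicted_engagement: float,
                    misinfo_estimate: float,
                    lam: float = 1.0) -> float:
    """Return a ranking score; higher means recommend more readily."""
    return predicted_engagement - lam * misinfo_estimate

# Example: a high-engagement but likely-misleading item can rank below a
# moderately engaging, low-misinformation one, depending on lam.
print(round(penalized_score(0.72, 0.9), 2))   # -0.18
print(round(penalized_score(0.31, 0.05), 2))  # 0.26
```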

So, are there groups doing this sort of research, and if not, is there a good reason why such research would not be particularly useful for the AI alignment field?
