What Does It Mean to Align AI With Human Values?

Link post

The author has some weird misunderstandings about what AI-will-kill-everyone-ism advocates believe, but seems to have a weirdly[1] decent grasp of the problem given those misunderstandings. They argue IRL won’t be enough[2]. Here’s the interesting quote IMO:

It should be clear that an essential first step toward teaching machines ethical concepts is to enable machines to grasp humanlike concepts in the first place, which I have argued is still AI’s most important open problem.

An example of a weird misunderstanding:

Moreover, I see an even more fundamental problem with the science underlying notions of AI alignment. Most discussions imagine a superintelligent AI as a machine that, while surpassing humans in all cognitive tasks, still lacks humanlike common sense and remains oddly mechanical in nature. And importantly, in keeping with Bostrom’s orthogonality thesis, the machine has achieved superintelligence without having any of its own goals or values, instead waiting for goals to be inserted by humans.

  1. ^

    By “weird” I mean “odd for the class of people writing pop sci articles on AI alignment whilst not being within the field”.

  2. ^

    For some reason they think “Many in the alignment community think the most promising path forward is a machine learning technique known as inverse reinforcement learning.” Perhaps they’re making a bucket error and lumping CIRL in with IRL?
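
    (Loosely, and just to make the distinction concrete rather than to attribute anything to the article: standard IRL takes demonstrations from a human and tries to recover the reward function that explains them, whereas CIRL (Hadfield-Menell et al.) is a cooperative game in which only the human knows the reward parameters, and both the human and the machine act to maximize that shared reward. A rough sketch of the standard formulations:

    $$\text{IRL: given demonstrations } \mathcal{D} = \{\tau_1, \dots, \tau_n\} \text{ from a human policy } \pi_H,\ \text{infer } R \text{ such that } \pi_H \approx \arg\max_{\pi} \mathbb{E}_{\pi}\Big[\textstyle\sum_t \gamma^t R(s_t, a_t)\Big]$$

    $$\text{CIRL: a two-player game } \big\langle S, \{A^H, A^R\}, T, \Theta, R(s, a^H, a^R; \theta), P_0(\theta), \gamma \big\rangle \text{ where } \theta \text{ is observed only by } H \text{, and both } H \text{ and } R \text{ maximize } R(\cdot\,; \theta)$$

    The article’s framing of IRL as an alignment proposal reads more naturally if it is actually about the latter.)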