What Does It Mean to Align AI With Human Values?

Link post

The author has some weird misunderstandings about what AI-will-kill-everyone-ism advocates believe, but seems to have a weirdly[1] decent grasp of the problem given those misunderstandings. They argue IRL won’t be enough[2]. Here’s the interesting quote IMO:

It should be clear that an essential first step toward teaching machines ethical concepts is to enable machines to grasp humanlike concepts in the first place, which I have argued is still AI’s most important open problem.

An example of a weird misunderstanding:

Moreover, I see an even more fundamental problem with the science underlying notions of AI alignment. Most discussions imagine a superintelligent AI as a machine that, while surpassing humans in all cognitive tasks, still lacks humanlike common sense and remains oddly mechanical in nature. And importantly, in keeping with Bostrom’s orthogonality thesis, the machine has achieved superintelligence without having any of its own goals or values, instead waiting for goals to be inserted by humans.

  1. ^

    By “weird” I mean “odd for the class of people writing pop sci articles on AI alignment whilst not being within the field”.

  2. ^

    For some reason they think “Many in the alignment community think the most promising path forward is a machine learning technique known as inverse reinforcement learning.” Perhaps they’re making a bucket error and lumping CIRL in with IRL?
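
    (Loosely, and just to make the distinction concrete rather than to attribute anything to the article: standard IRL takes demonstrations from a human and tries to recover the reward function that explains them, whereas CIRL (Hadfield-Menell et al.) is a cooperative game in which only the human knows the reward parameters, and both the human and the machine act to maximize that shared reward. A rough sketch of the standard formulations:

    $$\text{IRL: given demonstrations } \mathcal{D} = \{\tau_1, \dots, \tau_n\} \text{ from a human policy } \pi_H,\ \text{infer } R \text{ such that } \pi_H \approx \arg\max_{\pi} \mathbb{E}_{\pi}\Big[\textstyle\sum_t \gamma^t R(s_t, a_t)\Big]$$

    $$\text{CIRL: a two-player game } \big\langle S, \{A^H, A^R\}, T, \Theta, R(s, a^H, a^R; \theta), P_0(\theta), \gamma \big\rangle \text{ where } \theta \text{ is observed only by } H \text{, and both } H \text{ and } R \text{ maximize } R(\cdot\,; \theta)$$

    The article’s framing of IRL as an alignment proposal reads more naturally if it is actually about the latter.)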