[Notes mostly to myself, not important, feel free to skip]
My hot take overall is that Yudkowsky is basically right but doing a poor job of arguing for the position. Ngo is very patient and understanding.
“it doesn’t seem implausible to me that we build AIs that are significantly more intelligent (in the sense of being able to understand the world) than humans, but significantly less agentic.”—Ngo
“It is likely that, before the point where AGIs are strongly superhuman at seeking power, they will already be strongly superhuman at understanding the world, and at performing narrower pivotal acts like alignment research which don’t require as much agency (by which I roughly mean: large-scale motivations and the ability to pursue them over long timeframes).”—Ngo
“So it is legit harder to point out “the consequentialist parts of the cat” by looking for which sections of neurology are doing searches right there. That said, to the extent that the visual cortex does not get tweaked on failure to catch a mouse, it’s not part of that consequentialist loop either.”—Yudkowsky
“But the answer is that some problems are difficult in that they require solving lots of subproblems, and an easy way to solve all those subproblems is to use patterns which collectively have some coherence and overlap, and the coherence within them generalizes across all the subproblems. Lots of search orderings will stumble across something like that before they stumble across separate solutions for lots of different problems.”—Yudkowsky
This is really making me want to keep working on my+Ramana’s sequence on agency! :)
[Ngo][14:12]
Great
Okay, so one claim is that something like deontology is a fairly natural way for minds to operate.
[Yudkowsky][14:14]
(“If that were true,” he thought at once, “bureaucracies and books of regulations would be a lot more efficient than they are in real life.”)
I think I disagree with Yudkowsky here? I almost want to say “the opposite is true; if people were all innately consequentialist then we wouldn’t have so many blankfaces, and bureaucracies would be a lot better because the rules would just be helpful guidelines.” Or “Sure, but books of regulations work surprisingly well, well enough that there’s gotta be some innate deontology in humans.” Or “Have you conversed with normal humans about ethics recently? If they are consequentialists, they are terrible at it.”
As such, on the Eliezer view as I understand it, we can see ourselves as asking for a very unnatural sort of object: a path-through-the-future that is robust enough to funnel history into a narrow band in a very wide array of circumstances, but somehow insensitive to specific breeds of human-initiated attempts to switch which narrow band it’s pointed towards.
I think this is a great paragraph. It’s a concise and reasonably accurate description of (an important part of) the problem.
I do think it (and this whole discussion) focuses too much on plans and not enough on agents. It’s good for illustrating how the problem arises even in a context where we have some sort of oracle that gives us a plan and then we carry it out… but realistically our situation will be more dire than that, because we’ll be delegating to autonomous AGI agents. :(
The idea is not that humans are perfect consequentialists, but that they are able to work at all to produce future-steering outputs, insofar as humans actually do work at all, by an inner overlap of the shape of inner parts which has a shape resembling consequentialism, and the resemblance is what does the work. That is, your objection has the same flavor as “But humans aren’t Bayesian! So how can you say that updating on evidence is what’s doing their work of mapmaking?”
To be clear, I think I agree with your overall position. I just don’t think the argument you gave for it (about bureaucracies etc.) was compelling.
Perhaps… too patient and understanding. Richard! Blink twice if you’re being held against your will!
(I too would like you to write more about agency :P)