I think it is about making the models more consequentialist, in the sense of making them smarter and more agentic.
I don’t see evidence that he’s ignoring the hard parts of alignment. And I’m not even sure how optimistic he is, beyond presumably thinking success is possible.
You could be right in assuming those are his reasons for optimism. That does seem ungenerous, but it could be true. Those definitely aren’t my reasons for my limited optimism. See my linked pieces for those. Language model agents are modestly consequentialist and not automatically corrigible, but they have unique and large advantages for alignment. I’m a bit puzzled at the lack of enthusiasm for that direction; I’ve only gotten vague criticisms along the lines of “that can’t possibly work because there are ways for it to possibly fail”. The lines of thought those critiques reference just argue that alignment is hard, not that it’s impossible or that a natural language alignment approach couldn’t work. So I’m really hoping to get some more direct engagement with these ideas.
He should be rather optimistic because otherwise he probably wouldn’t stay at DeepMind.
I also don’t remember him saying much about the problems of misuse, AI proliferation, and Moloch, or about the issue of choosing the particular ethics for the AGI, so I take this as weak indirect evidence that DeepMind has a plan similar to OpenAI’s “superalignment”, i.e., “we will create a cognitively aligned agent and will task it with solving the rest of societal and civilisational alignment and coordination issues”.
You could be right, but I didn’t hear any hints that he intends to kick those problems down the road to an aligned agent. That’s Conjecture’s CoEm plan, but I read OpenAI’s Superalignment plan as even more vague: make AI better so it can help with alignment, prior to being AGI. Theirs was sort of a plan to create a plan. I like Shane’s better, in part because it’s closer to being an actual plan.
He did explicitly note that choosing the particular ethics for the AGI is an outstanding problem, but I don’t think he proposed solutions, either AI or human. I think corrigibility as the central value gives as much time to solve the outer alignment problem as you want (a “long contemplation”), after the inner alignment problem is solved, but I have no idea if his thinking is similar.
I also don’t think he addressed misuse, proliferation, or competition. I can think of multiple reasons for keeping them offstage, but I suspect they just didn’t happen to make the top priority list for this relatively short interview.