On my inside model of how cognition works, I don’t think “able to automate all research but can’t do consequentialist reasoning” is a coherent property that a system could have. That is a strong claim, yes, but I am making it.
I agree that it is conceivable that LLMs embedded in CoT-style setups could be transformative in some manner without “taking off”. Indeed, I touch on that in the post some: that scaffolded and slightly tweaked LLMs may not be “mere LLMs” as far as capability and safety upper bounds go.
That said, inasmuch as CoT-style setups would be able to turn LLMs into agents/general intelligences, I mostly expect that to be prohibitively computationally intensive, such that we’ll get to AGI by architectural advances before we have enough compute to make a CoT’d LLM take off.
But that’s a hunch based on the obvious stuff like AutoGPT consistently failing, plus my private musings regarding how an AGI based on scaffolded LLMs would work (which I won’t share, for obvious reasons). I won’t be totally flabbergasted if some particularly clever way of doing that worked.
On my inside model of how cognition works, I don’t think “able to automate all research but can’t do consequentialist reasoning” is a coherent property that a system could have.
I actually basically agree with this quote.
Note that I said “incapable of doing non-trivial consequentialist reasoning in a forward pass”. The overall LLM agent in the hypothetical is absolutely capable of powerful consequentialist reasoning, but it can only do this by reasoning in natural language. I’ll try to clarify this in my comment.
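One way to picture the forward-pass vs. whole-agent distinction: in a scaffolded agent, each forward pass emits only one short natural-language step, so the multi-step plan exists only in the accumulating transcript, never inside a single pass. A minimal sketch, assuming a stub `llm` in place of a real model call (all names here are illustrative):

```python
# Minimal scaffolded-agent sketch. The `llm` stub stands in for a
# single forward pass of a model; it emits one short reasoning step.

def llm(transcript: str) -> str:
    """Stub for one forward pass: returns one natural-language step."""
    step = transcript.count("Thought:") + 1
    if step < 3:
        return f"Thought: work out sub-goal {step} of the plan."
    return "Action: execute the plan assembled above."

def run_agent(goal: str, max_steps: int = 10) -> str:
    transcript = f"Goal: {goal}\n"
    for _ in range(max_steps):
        move = llm(transcript)          # one forward pass, one small step
        transcript += move + "\n"
        if move.startswith("Action:"):  # plan is complete
            break
    return transcript
```

Because no single `llm` call holds more than one step, the consequentialist reasoning is fully legible in `transcript`, which is the property being discussed.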
How about “able to automate most simple tasks where it has an example of that task being done correctly”? Something like that could make researchers much more productive. Repeat “the most time-consuming part of your workflow now requires effectively none of your time or attention” a few dozen times, and that does end up being transformative compared to the state before the series of improvements.
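The simplest concrete version of “has an example of that task being done correctly” is one-shot prompting: paste a single worked example ahead of the new input and let the model imitate it. A minimal sketch (the prompt format is my illustrative choice, not any particular API):

```python
def build_prompt(example_input: str, example_output: str, new_input: str) -> str:
    """One-shot prompt: a single worked example, then the new case to imitate."""
    return (
        f"Input: {example_input}\n"
        f"Output: {example_output}\n"
        f"Input: {new_input}\n"
        f"Output:"
    )
```

A model completing this prompt only has to imitate the demonstrated transformation, which is a much weaker demand than open-ended planning.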
I think “would this technology, in isolation, be transformative” is a trap. It’s easy to imagine “if there was an AI that was better at everything than we are, that would be transformative”, then look at the trend line and notice “hey, if this trend line holds we’ll have AI that is better than us at everything”, and finally “I see lots of proposals for safe AI systems, but none of them safely give us that transformative technology”. But I think what happens between now and the arrival of AIs that are better at everything than humans-in-2023 matters.
I’m not particularly concerned about AI being “transformative” or not. I’m concerned about AGI going rogue and killing everyone. And LLMs automating workflows is great and not (by itself) omnicidal at all, so that’s… fine?
But I think what happens between now and the arrival of AIs that are better at everything than humans-in-2023 matters.
As in, AIs boosting human productivity might/should let us figure out how to make stuff safe as it comes up, so no need to be concerned about us not having a solution to the endpoint of that process before we’ve made the first steps?
The problem is that boosts to human productivity also boost the speed at which we’re getting to that endpoint, and there’s no reason to think they differentially improve our ability to make things safe. So all that would do is accelerate us harder as we’re flying towards the wall at a lethal speed.
As in, AIs boosting human productivity might/should let us figure out how to make stuff safe as it comes up, so no need to be concerned about us not having a solution to the endpoint of that process before we’ve made the first steps?
I don’t expect it to be helpful to block individually safe steps on this path, though it would probably be wise to figure out what unsafe steps down this path look like concretely (which you’re doing!).
But yeah. I don’t have any particular reason to expect “solve for the end state without dealing with any of the intermediate states” to work. It feels to me like someone starting a chat application and delaying the “obtain customers” step until they support every language, have a chat architecture that could scale up to serve everyone, and have found a moderation scheme that works without human input.
I don’t expect that team to ever ship. If they do ship, I expect their product will not work, because I think many of the problems they encounter in practice will not be the ones they expected to encounter.
Interesting. My own musings regarding how an AGI based on scaffolded LLMs would work suggest it would not be prohibitively computationally expensive. Expensive, yes, but affordable for large projects.
It seems to me like para-human-level AGI is quite achievable with language model agents, but advancing beyond the human intelligence that created the LLM training set might be much slower. That could be a really good scenario.
You’ve probably seen my Capabilities and alignment of LLM cognitive architectures. I published that because all of the ideas there seemed pretty obvious. To me, those obvious improvements (a bit of work on episodic memory and executive function) lead to AGI with just maybe 10x more LLM calls than vanilla prompting (varying with problem/plan complexity, of course). I’ve got a little more thinking beyond that which I’m not sure I should publish.
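To make the “10x more LLM calls than vanilla prompting” arithmetic concrete, here is a sketch of the kind of loop involved, where one user-level task fans out into a planning call plus a worker call and a monitoring call per step, with episodic memory read and written around each. The `llm` stub and the whole call structure are my illustrative guesses, not the architecture from the linked post:

```python
# Hypothetical executive loop: one task fans out into many model calls.
# `llm` is a stub; a real system would hit a language model each time.

calls = 0

def llm(prompt: str) -> str:
    global calls
    calls += 1
    return f"response to: {prompt[:40]}"

def run_task(task: str, n_steps: int = 4) -> int:
    """Run one task and return how many model calls it consumed."""
    global calls
    calls = 0
    episodic_memory: list[str] = []
    plan = llm(f"Break '{task}' into {n_steps} steps")      # executive call
    for i in range(n_steps):
        context = "\n".join(episodic_memory[-3:])           # memory retrieval
        result = llm(f"Do step {i} of: {plan}\n{context}")  # worker call
        episodic_memory.append(result)                      # memory storage
        llm(f"Did step {i} succeed? {result}")              # monitoring call
    return calls
```

Vanilla prompting is one call; this loop makes 1 + 2 × n_steps calls, i.e. nine for a four-step plan, in the ballpark of the 10x figure.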
The excellent On the future of language models raises that possibility as well: that advancing beyond the intelligence of the humans who created the training set might be much slower.