If we can clearly tie the argument for AGI x-risk to agency, I think it won't have the same problem.
Yeah agreed, and it's really hard to get the implications right here without a long description. In my mind, "entities" didn't trigger any association with agents, but I can see how it would for others.
I broadly agree that many people would be better off anthropomorphising future AI systems more. I sometimes push for this in arguments, because in my mind many people have massively overanchored on the particular properties of current LLMs and LLM agents. I'm less of a fan of the part of that post that involves accelerating anything.
One could say "well, LLMs are already superhuman at some stuff and they don't seem to have instrumental goals". And that objection will become more compelling as LLMs keep getting better in narrow domains.
Yeah, but the line “capable systems necessarily have instrumental goals” helps clarify what you mean by “capable systems”. It must be some definition that (at least plausibly) implies instrumental goals.
Kat Woods' tweet is an interesting case. I actually think her point is absolutely right as far as it goes.
Huh, I suspect that the disagreement about that tweet might come from dumb terminology fuzziness. I'm not really sure what she means by "the specification problem" in the context of generative models trained to imitate; it's a problem that makes sense in a different context. But the central disagreement is that she thinks current observations (of "alignment behaviour" in particular) are very surprising, which just seems wrong. My response was this:
Mostly agreed. When suggesting even differential acceleration, I should remember to put a big WE SHOULD SHUT IT ALL DOWN disclaimer just to make sure it's not taken out of context. And as I said there, I'm far from certain that even that differential acceleration would be useful.
I agree that Kat Woods is overestimating how optimistic we should be based on LLMs following directions well. I think re-litigating who said what when, and what they'd predict, is a big mistake, since it is both beside the point and tends to strengthen tribal rivalries, which are arguably the largest source of human mistakes. There is an interesting, subtle issue there which I've written about in The (partial) fallacy of dumb superintelligence and Goals selected from learned knowledge: an alternative to RL alignment. There are potential ways to leverage LLMs' relatively rich (but imperfect) understanding into AGI that follows someone's instructions. Creating a "goal slot" based on linguistic instructions is possible. But it's all pretty complex and uncertain.