I think the alienness of the minds involved is a total red herring: they could be very hominid-like and it wouldn’t matter much if they include superintelligent planners doing argmax(p(problem_solved)).
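For concreteness, the kind of planner I mean by argmax(p(problem_solved)) is roughly this (a toy sketch; the names are made up for illustration, not any real system’s API):

```python
# Toy sketch of an argmax(p(problem_solved)) planner: score each candidate plan
# by its estimated probability of solving the problem and take the best one.
# (Hypothetical names; purely illustrative.)
def argmax_planner(candidate_plans, p_problem_solved):
    # p_problem_solved: callable mapping a plan to an estimated success probability
    return max(candidate_plans, key=p_problem_solved)
```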
Yeah I agree with this. Although I think focusing on argmax confused a lot of people (including me) and I’m glad they didn’t do that in the book. When I was new to the community, I thought that implementing soft optimization would solve the main problems. I didn’t grok how large the reflective instability and pointer problems were.
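To make “soft optimization” concrete: one standard example is a quantilizer, which softens argmax by sampling from among the top-scoring plans instead of always taking the single best one. A toy sketch (names made up for illustration):

```python
import random

# Toy quantilizer-style "soft optimization": instead of taking the single
# highest-scoring plan, sample uniformly from the top q fraction of candidates.
# (Hypothetical names; just an illustration of the idea.)
def quantilize(candidate_plans, p_problem_solved, q=0.1):
    ranked = sorted(candidate_plans, key=p_problem_solved, reverse=True)
    top_k = max(1, int(len(ranked) * q))
    return random.choice(ranked[:top_k])
```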
I honestly just remember a lot of absurd posts spending their time thinking about daemons in the weights, based on a model of gradient descent as being evolution-like in ways that it is not. The absurdity of those posts absolutely contributed to the alignment winter: they gave people the impression that they were blocked on impossible-seeming problems that don’t actually exist, and so people focused their attention somewhere else.
Yeah I agree that this happened. But if there was a retcon, then it would be in RFLO (Risks from Learned Optimization), not in the book, because RFLO defined mesa-optimization in a way that doesn’t match “daemons in the weights”. I think what happened was maybe closer to “lots of wild speculation about weird ways that overpowered optimizers might go wrong”, which, as people became less confused, was consolidated into something much more reasonable and less wild (namely RFLO). But then lots of people kept the word mesa-optimizer mentally attached to the older ideas.
I think the issue is exacerbated by the fact that when people post about alignment, they often have a detailed AGI design in mind, and they are talking about alignment issues with that AGI design. But the AGI design isn’t described in much detail, or at all. And over the last two decades the AGI designs that people have had in mind have varied wildly, and many of them have been pretty silly.
I agree with this, and don’t mind saying for future reference that my current AGI model is in fact a traditional RL agent with a planner and a policy, where the policy is some LLM-like foundation model and the planner is something MCTS-like over ReAct-like blocks. The agent rewards itself by taking motor actions and then checking whether each action succeeded, using evaluation actions that return a boolean result to assess subgoal completion.
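Purely as an illustrative sketch of that shape (all class and method names here are invented, and the MCTS-like search is collapsed into a best-of-n rollout to keep it short):

```python
from dataclasses import dataclass

@dataclass
class Block:
    """A ReAct-like unit: a thought, a motor action, and an evaluation action."""
    thought: str
    motor_action: str
    eval_action: str  # when executed, returns True iff the subgoal is judged complete

class Agent:
    def __init__(self, policy_llm, world):
        self.policy = policy_llm  # LLM-like foundation model that proposes and scores blocks
        self.world = world        # environment the motor/eval actions run against

    def plan(self, goal, n_candidates=16, depth=4):
        # Stand-in for the MCTS-like search over sequences of blocks;
        # here just a best-of-n rollout for brevity.
        best_score, best_seq = float("-inf"), []
        for _ in range(n_candidates):
            seq = [self.policy.propose_block(goal) for _ in range(depth)]
            score = self.policy.estimate_success(goal, seq)  # ~ p(problem_solved)
            if score > best_score:
                best_score, best_seq = score, seq
        return best_seq

    def run(self, goal):
        total_reward = 0.0
        for block in self.plan(goal):
            self.world.execute(block.motor_action)
            subgoal_done = self.world.execute(block.eval_action)  # boolean success check
            total_reward += 1.0 if subgoal_done else 0.0          # self-assigned reward
        return total_reward
```

The point of the sketch is just that the planner is the optimization-ish part, and the boolean evaluation actions are where the self-assigned reward comes from.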
So, MuZero but with LLMs basically.