I think the issue is exacerbated by the fact that when people post about alignment, they often have a detailed AGI design in mind, and they are talking about alignment issues with that design. But the AGI design isn’t described in much detail, or at all. And over the last two decades the AGI designs people have had in mind have varied wildly, and many of them have been pretty silly.
I agree with this, and don’t mind saying for future reference that my current AGI model is in fact a traditional RL agent with a planner and a policy, where the policy is an LLM-like foundation model and the planner is something MCTS-like operating over ReAct-like blocks. The agent rewards itself by taking motor actions and then checking whether each action succeeded with evaluation actions that return a boolean result to assess subgoal completion.
So, MuZero but with LLMs basically.
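
To make that concrete, here is a minimal sketch of what I have in mind, with lots of assumptions: all the names (`LLMPolicy`, `Block`, `plan`, the stub `Environment`) are illustrative rather than anything I've actually built, the LLM is mocked out, the evaluation action is randomized, and the search is a shallow one-ply MCTS-like loop rather than a real MuZero-style planner.

```python
# Hypothetical sketch: an RL agent whose policy is an LLM-like model proposing
# ReAct-style blocks (thought, motor action, evaluation action), and whose planner
# does a shallow MCTS-like search over those blocks. The agent rewards itself when
# the boolean evaluation action reports the subgoal succeeded. All class and
# function names here are placeholders, not an actual implementation.

import math
import random
from dataclasses import dataclass, field


@dataclass
class Block:
    """One ReAct-like unit: a thought, a motor action, and an evaluation action."""
    thought: str
    motor_action: str
    eval_action: str


class LLMPolicy:
    """Stand-in for an LLM-like foundation model that proposes candidate blocks."""
    def propose(self, state: str, k: int = 3) -> list[Block]:
        # A real system would sample k candidate blocks from the LLM given `state`.
        return [Block(f"consider option {i}", f"do option {i}", f"check option {i}")
                for i in range(k)]


class Environment:
    """Stand-in environment: executes motor actions and answers evaluation actions."""
    def step(self, state: str, motor_action: str) -> str:
        return state + " | " + motor_action

    def evaluate(self, state: str, eval_action: str) -> bool:
        # The boolean subgoal check; randomized here purely for illustration.
        return random.random() < 0.5


@dataclass
class Node:
    state: str
    visits: int = 0
    value: float = 0.0
    children: dict = field(default_factory=dict)  # motor_action -> (Block, Node)


def ucb(parent: Node, child: Node, c: float = 1.4) -> float:
    """Upper-confidence-bound score used to balance exploration and exploitation."""
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(math.log(parent.visits + 1) / child.visits)


def plan(root_state: str, policy: LLMPolicy, env: Environment, n_sims: int = 20) -> Block:
    """MCTS-like search over ReAct-like blocks; self-reward comes from the boolean eval."""
    root = Node(root_state)
    for _ in range(n_sims):
        # Expansion: ask the policy for candidate blocks if the root is still a leaf.
        if not root.children:
            for block in policy.propose(root.state):
                root.children[block.motor_action] = (block, Node(env.step(root.state, block.motor_action)))
        # Selection: pick the child block with the best UCB score.
        _, (block, child) = max(root.children.items(), key=lambda kv: ucb(root, kv[1][1]))
        # Simulation / self-reward: run the evaluation action, treat success as reward 1.
        reward = 1.0 if env.evaluate(child.state, block.eval_action) else 0.0
        # Backup: propagate the reward to the child and the root.
        child.visits += 1
        child.value += reward
        root.visits += 1
        root.value += reward
    # Act with the most-visited block.
    best_block, _ = max(root.children.values(), key=lambda bc: bc[1].visits)
    return best_block


if __name__ == "__main__":
    chosen = plan("initial task state", LLMPolicy(), Environment())
    print("planner chose:", chosen)
```

In a real version the LLM would generate the thought/motor/eval triples, the environment would be whatever the agent is embedded in, and the search would go deeper than one ply, but the shape is the same: LLM as policy prior, tree search as planner, boolean subgoal checks as the reward signal.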