Seems like you can get pretty far by just having the current Opus 4.6 run in Claude Code for a week. The only problem is that this is prohibitively expensive.
My impression is that running something like DeepSeek for a week straight doesn’t really get you much?
If inference costs per model are declining somewhere between 3x and 10x+ per year, this alone will get economical quite soon. What projects do you have up your sleeve for when this is viable?
The pet project I want to try this method on is preventing all of us from dying from misaligned AGI. ;) I want to try next-gen systems for deconfusion and conceptual clarification in the relevant domains.
I think even with scaffolding for more careful reasoning, Opus 4.6 probably isn’t quite smart or truth-seeking enough to do this as well as a smart human. But I’m not sure. I think it can be made smarter by instructing Claude Code (or Codex) to use a reasoning process more like the one a human would use when doing a long-term research project to clarify concepts in a complex domain. This is one way in which Human-like metacognitive skills will reduce LLM slop and aid alignment and capabilities. I doubt this will be enough on its own, but in combination with next-generation systems with somewhat better metacognition from training, it might help.
My goal would be to have a pretty straightforward set of prompts that’s obviously truth-seeking, so that if anyone runs it, even with prompting for assumptions hostile to AGI x-risk, the system comes back with “based on the conceptual uncertainties, humans should try to slow down AI progress and work harder on alignment if at all possible”.
The other target would be conceptual clarifications on exactly how much and what sorts of alignment we’re likely to need to survive.
Of course, this path includes the risk of The Median Doom-Path: Slop, not Scheming, as Wentworth puts it: we use AI for conceptual alignment research and it helps confuse us. But using AI this way seems inevitable, so having independent researchers trying to make this go better seems like a good idea.