More speculative thoughts:
Perhaps we end up in a regime where this is possible, but expensive enough that we can only afford a bit of it.
If that’s the case, the more sample-efficient the RL is, the better off we are: the fewer RL reward signals we need in total, the more we can afford to invest in the quality of each one.
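To make that trade-off concrete, here is a minimal sketch with made-up numbers, assuming a fixed pool of human review hours (the budget figure and function name are purely illustrative): as the number of RL reward signals shrinks, the review effort available per signal grows in proportion.

```python
# Toy illustration of the oversight budget trade-off (hypothetical numbers).
# With a fixed total amount of careful human review time, fewer RL reward
# signals means more reviewer effort can be spent evaluating each one.

TOTAL_OVERSIGHT_BUDGET_HOURS = 10_000  # assumed fixed budget of review time

def review_hours_per_reward(num_rl_rewards: int) -> float:
    """Hours of careful human evaluation available per reward signal."""
    return TOTAL_OVERSIGHT_BUDGET_HOURS / num_rl_rewards

for n in (1_000_000, 10_000, 100):
    print(f"{n:>9} rewards -> {review_hours_per_reward(n):8.2f} review hours each")
```

Under these assumed numbers, a million reward signals leaves well under a minute of scrutiny per signal, while a hundred signals allows about a hundred hours each.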
Even more speculative thoughts:
As an extreme example, you could imagine an AI that is as sample-efficient as a human child. Suppose an AI company had a virtual copy of von Neumann’s brain at age 5, which they could run at 10x speed, and had to raise him to become an intent-aligned adult.
The resources at their disposal would enable a lot of oversight and thoughtful parenting and teaching, so in principle they could afford to do pretty well at this task for a while. It’s like a school with a 100:1 teacher-student ratio. They might fail in practice, of course.
Once his outputs become sufficiently hard to check, they might have a lot more difficulty. For example, it might be hard to tell whether his advice on the Manhattan Project is subtly wrong in a way designed to sabotage the project.