Ok this is going to be messy but let me try to convey my hunch for why randomization doesn’t seem very useful.
- Say I have an intervention that’s helpful, and has a baseline 1⁄4 probability. If I condition on this statement, I get 1 “unit of helpfulness”, and a 4x update towards manipulative AGI.
- Now let’s say I have four interventions like the one above, and I pick one at random. p(O | manipulative) = 1⁄4, which is the same as baseline, so I get one unit of helpfulness and no update towards manipulative AGI!
- BUT, the four interventions have to be mutually exclusive. Which means that if I’d done no simulation at all, I would’ve gotten my one unit of helpfulness anyway, since the four interventions cover all possible outcomes.
- Ok, well, what if my four interventions have 1⁄8 baseline probability each, so only 50% total. Then I pick one at random: p(O | natural) = 1⁄8, p(O | manipulative) = 1⁄4, so I get a 2x update towards manipulative AGI. This is the same as if I’d just conditioned on the statement “one of my four interventions happens”, and let the randomization happen inside the simulation instead of outside. The total probability of that is 50%, so I get my one unit of helpfulness, at the cost of a 2x update.
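A quick sanity check of the arithmetic in the bullets. This is a sketch under one assumption (implicit in the first bullet): a manipulative predictor can concentrate its probability mass on whatever we condition on, so against a random pick from four disjoint interventions the best it can do is spread 1⁄4 on each. The update is then just the likelihood ratio p(O | manipulative) / p(O | natural).

```python
from fractions import Fraction as F

def bayes_update(p_o_natural, p_o_manipulative):
    # Multiplicative update towards "manipulative" after observing O:
    # the likelihood ratio p(O | manipulative) / p(O | natural).
    return p_o_manipulative / p_o_natural

# Case 1: a single intervention with baseline probability 1/4.
# A manipulative predictor can put all its mass on it: p(O | manip) = 1.
print(bayes_update(F(1, 4), F(1, 1)))  # 4x update

# Case 2: four mutually exclusive 1/4-probability interventions, one
# picked at random. The manipulator can only put 1/4 on each, so
# whichever we pick, p(O | manip) = 1/4 = baseline.
print(bayes_update(F(1, 4), F(1, 4)))  # no update (1x)

# Case 3: four interventions at 1/8 baseline each (1/2 total). The
# manipulator again puts 1/4 on each, so p(O | manip) = 1/4.
print(bayes_update(F(1, 8), F(1, 4)))  # 2x update
```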
Maybe the core thing here is a consequence of framing our conditions as giving us bits of search to get lottery outcomes that we like. Rolling the dice to determine what to condition on isn’t doing anything different from just using a weaker search condition—it gives up bits of search, and so it has to pay less.
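To put a number on the bits-of-search framing (my phrasing of it, not anything proved here): conditioning on an event of natural probability p buys log2(1/p) bits of search and pays a worst-case 1/p update, so randomizing over k disjoint conditions of probability p each behaves exactly like conditioning on their union, probability k·p.

```python
from math import log2

def bits(p):
    # Bits of search bought by conditioning on an event of probability p.
    return log2(1 / p)

def worst_update(p):
    # Worst-case multiplicative update towards "manipulative",
    # assuming the manipulator can put all its mass on the event.
    return 1 / p

# Four disjoint 1/8-probability interventions, one picked at random,
# behave like conditioning on their union, probability 4 * 1/8 = 1/2:
print(bits(1 / 2))          # 1.0 bit of search
print(worst_update(1 / 2))  # 2.0x update, matching the bullets above
```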
Got it, that’s very clear. Thanks!
So this point reduces to “we want our X:1 update to be as mild as possible, so use the least-specific condition that accomplishes the goal”.
I think so. But I’d want to sit down and prove something more rigorously before abandoning the strategy, because there may be times we can get value for free in situations more complicated than this toy example.