The agents at the top of most theoretical infinite bureaucracies should be thought of as already superhumanly capable and aligned, not as weak language models, because IDA works by iteratively retraining models on the output of the bureaucracy, so agents at higher levels of the theoretical infinite bureaucracy are stronger (coming from later amplification/distillation epochs) than those at lower levels. It doesn’t matter if an infinite bureaucracy instantiated for a given agent fails to solve important problems, as long as the next epoch does better.
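A minimal sketch of that loop, with toy stand-ins of my own invention for the model, the amplification step, and the distillation step (the names `Agent`, `amplify`, and `distill` are illustrative, not any real API):

```python
class Agent:
    """Toy agent: answers a question; quality improves with each epoch."""
    def __init__(self, skill: int = 0):
        self.skill = skill

    def answer(self, question: str) -> str:
        return f"answer(skill={self.skill}): {question}"

def amplify(agent: Agent, question: str) -> str:
    # Stand-in for running a bureaucracy of copies of `agent`:
    # decompose the question, answer subquestions, recombine.
    sub_answers = [agent.answer(f"sub-{i} of {question}") for i in range(3)]
    return " | ".join(sub_answers)

def distill(agent: Agent, transcripts: list[str]) -> Agent:
    # Stand-in for training a new model to imitate the bureaucracy's
    # output; here we just bump a skill counter.
    return Agent(skill=agent.skill + 1)

agent = Agent()
for epoch in range(5):
    transcripts = [amplify(agent, q) for q in ["q1", "q2"]]
    agent = distill(agent, transcripts)

# After several epochs, the distilled agent placed at any node of the
# bureaucracy is stronger than the epoch-0 agent: "higher levels" of the
# theoretical infinite bureaucracy correspond to later epochs.
```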
For HCH specifically, this is normally intended to apply to the HCHs, not to the humans in them. But then the abstraction of the humans being actual humans (exact imitations) leaks, and we start expecting something other than actual humans there. If that is allowed, if something less capable or less aligned than a human can appear in HCH, then by the same token those agents should improve with IDA epochs (perhaps not epochs of HCH, but of other bureaucracies), and the “humans” at the top of an infinite HCH should be much better than the starting point, assuming the epochs improve things.
On the arbitrarily big bureaucracy: the real reason it works is that, by assumption, we can always add more agents, and thus we can simulate any Turing-complete system. Once that assumption is removed, the next question is: is distillation cheap?
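To make the Turing-completeness point concrete, here is a sketch, assuming an HCH-style node that can always delegate to fresh subagents; the concrete task (the Ackermann function) and the name `hch_node` are my illustrative choices, not from the original argument:

```python
def hch_node(m: int, n: int) -> int:
    # Each call is "one agent"; delegating a subquestion = instantiating
    # more agents. With no bound on how many agents we can add, any
    # computable function can be evaluated this way, even ones whose
    # call trees grow enormously (like Ackermann's).
    if m == 0:
        return n + 1
    if n == 0:
        return hch_node(m - 1, 1)
    return hch_node(m - 1, hch_node(m, n - 1))

print(hch_node(2, 3))  # 9 -- a small instance; the agent tree is unbounded in general
```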
If it is, such that I can distill hundreds or thousands of layers, solving the alignment problem becomes ludicrously easy, even on pessimistic views of AI bureaucracies/debate.
If I can distill 25-100 layers, the system is still likely able to solve the alignment problem, although at the lower end of that range I’ll probably disagree with John Wentworth on how optimistic to be about bureaucracies/debate for solving alignment.
Below 20-25 layers, John Wentworth’s intuition will probably disagree with mine on how useful AI bureaucracies/debate are for solving alignment. Specifically, he’d almost certainly think that such a bureaucracy couldn’t work at all compared to independent researchers. I view AI and human bureaucracies as sufficiently disanalogous that the problems of human bureaucracies aren’t likely to carry over. My take is that with just 20 distillation layers, you’d have a fair chance of solving the whole problem, and that only 10 layers are necessary to contribute usefully to AI alignment.
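A rough back-of-the-envelope for why layer counts matter so much, under my own assumption (not stated in the original) that each distillation epoch bakes one layer of a bureaucracy with branching factor b into a single model:

```python
def effective_agents(layers: int, branching: int = 10) -> int:
    # A model distilled through `layers` epochs emulates a tree of
    # branching**layers agents at the cost of one forward pass.
    return branching ** layers

for layers in (10, 20, 25, 100):
    print(layers, effective_agents(layers))

# With branching factor 10: 10 layers -> 1e10 emulated agents (enough to
# contribute usefully), 20 layers -> 1e20 (a fair chance at the whole
# problem), 100 layers -> 1e100 (the "ludicrously easy" regime).
```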