I tend to be fairly skeptical of these challenges—HCH is just a bunch of humans after all and if you can instruct them not to do things like instantiate arbitrary Turing machines, then I think a bunch of humans put together has a strong case for being aligned.
Minor nitpick: I mostly agree, but I feel like a lot of work is being done by saying that they can’t instantiate arbitrary Turing machines, and that it’s just a bunch of humans. Human society is also a bunch of humans, but it frequently does things that I can’t imagine any single intelligent person deciding to do. If your model breaks down even for plain human-human combinations like society, I think there is a significant risk that true HCH would be dangerous in quite unpredictable ways.