I suspect a good introduction to alignment should need only a few paragraphs to be understandable to almost anyone and robust against incorrect counterarguments.
I think this is empirically not the case, and I think some simple modeling of the relationship between concept complexity and the number of nearby confused interpretations would suggest that it is not reasonable to expect.
I agree that most short intros have that problem. As they say, seek simplicity, and distrust it. A short explanation that works for most humans would have to rely on concepts they already have, and I suspect that’s possible. It would need to minimize analogizing, though even then the amount of analogy wouldn’t get very low. The core structure of the argument would be literally true, and the analogical part would be in describing what kinds of things do the bad thing: “Here are a bunch of things that have a reliably bad pattern in history. We expect this to be another one of those. Here’s the pattern they have. The problem here is basically just that, for the first time, we’re in the wrong part of this pattern, the one that always loses.” The comparisons would be to competing species, wars, and power contests. Some people will still lack experience with the comparison points and so won’t understand, but having that experience is fairly common; farmers and militaries seem most likely to get it easily.