I put 60% probability on you having intentionally structured the post so that reading it feels like the pattern of how you felt reading the book.
I appreciate this. I haven’t finished the book yet, but my impression is that you liked it more than I expect to. I suspect a good introduction to alignment should take only a few paragraphs, be understandable to almost anyone, be robust against incorrect counterarguments, and be correctly vulnerable to insightful counterarguments, if any exist. But I haven’t figured out how to write that down myself. A good intro is also a good representation to think with, imo, which is most of the value I see in it.
I suspect a good introduction to alignment should take only a few paragraphs, be understandable to almost anyone, and be robust against incorrect counterarguments
I think this is empirically not the case, and I think some simple modeling of the relationship between a concept’s complexity and the number of nearby confused interpretations suggests this is not a reasonable thing to expect.
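A minimal sketch of what such modeling might look like, purely for illustration: suppose a concept has k load-bearing components, each of which a reader can independently misread. The component counts and the “confusions ruled out per paragraph” rate below are assumptions, not anything claimed in the thread.

```python
# Toy model: a concept with k load-bearing components has 2**k - 1
# nearby confused interpretations (every nonempty subset of
# misunderstood components yields a distinct wrong reading).
# All parameters here are illustrative assumptions.

def confused_interpretations(k: int) -> int:
    """Distinct wrong readings for a concept with k binary components."""
    return 2 ** k - 1

def residual_confusions(k: int, per_paragraph: int, paragraphs: int) -> int:
    """Wrong readings still live after a short intro rules some out."""
    return max(0, confused_interpretations(k) - per_paragraph * paragraphs)

for k in (3, 6, 10):
    print(f"k={k}: {confused_interpretations(k)} nearby confusions, "
          f"{residual_confusions(k, per_paragraph=4, paragraphs=3)} "
          f"left after a 3-paragraph intro")
```

The point is just the shape: under this toy model, nearby confusions grow exponentially in k while a few paragraphs rule out at most linearly many, which is why robustness to incorrect counterarguments is hard to buy in a short intro.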
I agree that most short intros have that problem. As they say: seek simplicity, and distrust it. A short explanation that works for most humans would have to rely on concepts they already have, and I suspect that’s possible. It would need to minimize analogizing, even though doing so wouldn’t get the amount of analogizing very low; the core structure of the argument would be literally true, and the analogical part would be in describing what kind of things do the bad thing, saying “here are a bunch of things that have a reliably bad pattern in history. we expect this to be another one of those. here’s the pattern they have. the problem here is basically just that, for the first time, we’re in the wrong part of this pattern, the one that always loses.” Comparison to competing species, wars, power contests. Some people will still not have experience with the comparison points and so won’t understand, but such experience is fairly common; farmers and militaries seem most likely to get it easily.
I didn’t, but I did copypasta the intro from another post I was writing because it seemed relevant.
I’ll take that bet. 1:1, $100?
I already denied it, so.
Yeah, I saw.