Additionally, I think that there are ways to misunderstand the IDA approach that leave out significant parts of the complexity (i.e., IDA based on humans thinking for a day with unrestricted input, without doing the hard work of trying to understand corrigibility and meta-philosophy beforehand).
I guess this is in part because that’s how Paul initially described his approach, before coming up with Security Amplification in October 2016. For example, in March 2016 I wrote, “First, I’m assuming it’s reasonable to think of A1 as a human upload that is limited to one day of subjective time, by the end of which it must have written down any thoughts it wants to save, and be reset. Let me know if this is wrong.” Paul didn’t object to this in his reply.
An additional issue is that even people who intellectually understand the new model might still have intuitions left over from the old one. For example, I’m just now realizing that the low-amplification agents in the new scheme must have thought processes and “deliberations” that are very alien, since they don’t have human priors, natural language understanding, values, common sense judgment, etc. I wish Paul had written a post in big letters that said, “WARNING: Throw out all your old intuitions!”