Yes, sorry — I’m aware that in the HCH procedure no single human thinks for a long time. I’m generally used to mentally abstracting HCH (or whatever scheme fills that slot) as something that could “effectively replicate the benefits you could get from having a human think for a long time,” in terms of the role it plays in an overall scheme for alignment. This isn’t guaranteed to work out, of course. My position is similar to Rohin’s above:
I just personally find it easier to think about “benefits of a human thinking for a long time” and then “does HCH get the same benefits as humans thinking for a long time” and then “does iterated amplification get the same benefits as HCH”.