Rohin Shah comments on The case for aligning narrowly superhuman models

Rohin Shah 7 Mar 2021 16:52 UTC
LW: 5 AF: 5
I agree with the other responses from Ajeya / Paul / Raemon, but to add some more info:
“Where did this idea of HCH yielding the same benefits as a human thinking for a long time come from???”
… I don’t really know. My guess is that I picked it up from reading giant comment threads between Paul and other people.
“I don’t see any reason at all to expect it to do anything remotely similar to that.”
Tbc it doesn’t need to be literally true. The argument needed for safety is something like “a large team of copies of non-expert agents could together be as capable as an expert”. I see the argument “it’s probably possible for a team of agents to mimic one agent thinking for a long time” as mostly an intuition pump for why that might be true.
- johnswentworth 7 Mar 2021 17:33 UTC
  LW: 5 AF: 4
  “As capable as an expert” makes more sense. Part of what’s confusing about “equivalent to a human thinking for a long time” is that it’s picking out one very particular way of achieving high capability, but really it’s trying to point to a more-general notion of “HCH can solve lots of problems well”. Makes it sound like there’s some structural equivalence to a human thinking for a long time, which there isn’t.
  - Rohin Shah 7 Mar 2021 18:46 UTC
    LW: 6 AF: 5
    “Makes it sound like there’s some structural equivalence to a human thinking for a long time, which there isn’t.”
    Yes, I explicitly agree with this, which is why the first thing in my previous response was “sorry, that’s right, I was speaking pretty loosely.”