Ajeya Cotra comments on The case for aligning narrowly superhuman models

Ajeya Cotra 10 Mar 2021 18:28 UTC
I don’t feel confident enough in the frame of “inaccessible information” to say that the whole agenda is about it. It feels like a fit for “advice”, but not a fit for “writing stories” or “solving programming puzzles” (at least not an intuitive fit—you could frame it as “the model has inaccessible information about [story-writing, programming]” but it feels more awkward to me). I do agree it’s about “strongly suspecting it has the potential to do better than humans” rather than about “already being better than humans.” Basically, it’s about trying to find areas where lackluster performance seems to mostly be about “misalignment” rather than “capabilities” (recognizing those are both fuzzy terms).
- abramdemski 10 Mar 2021 19:19 UTC
  
  > Basically, it’s about trying to find areas where lackluster performance seems to mostly be about “misalignment” rather than “capabilities” (recognizing those are both fuzzy terms).
  
  Right, ok, I like that framing better (it obviously fits, but I didn’t generate it as a description before).