Achieving superhuman capability on a task, achieving causal fidelity of transcripts, achieving human-readability of the transcripts: choose two.
I think we eventually want superhuman capabilities, but I don’t think it’s required in the near term and in particular it’s not required to do a huge amount of AI safety research. So if we can choose the last two, and get a safe human-level AI system that way, I think it might be a good improvement over the status quo.
(The situation where labs choose not to, or are forbidden to, pursue superhuman capabilities—even though they could—is scary, but doesn’t seem impossible.)