Achieving superhuman capability on a task, achieving causal fidelity of transcripts, achieving human-readability of the transcripts: choose two.
I think we eventually want superhuman capabilities, but I don’t think it’s required in the near term and in particular it’s not required to do a huge amount of AI safety research. So if we can choose the last two, and get a safe human-level AI system that way, I think it might be a good improvement over the status quo.
(The situation where labs choose not to, or are forbidden to, pursue superhuman capabilities—even though they could—is scary, but doesn’t seem impossible.)