Shower thought: there might be an employee at OpenAI, Anthropic, etc. who is saving humanity every day by withholding an idea they have that would create an unaligned AGI. They might literally have the technical breakthrough idea that would doom the future, and they are keeping it quiet. Probably not likely, but weird to think that is even a possibility.
Really they would be "choosing not to end the world" every day, which is probably a much more routine occurrence than actively saving it.
I don't know about a dramatic, single breakthrough idea, but it seems pretty clear to me that making differential progress on safety over capabilities requires holding one's tongue in this manner.
One can do better—become an ineffective capabilities researcher! Work at an AGI lab but proceed to write slow kernels, implement fault-prone networking algorithms, invent new architectures with suboptimal scaling laws, etc. Save the world one bug at a time.
</jk>