Possible responses to discovering a possible infohazard:
Tell everybody
Tell nobody
Follow a responsible disclosure process.
If you have discovered an apparent solution to corrigibility then my prior is:
90%: It’s not actually a solution.
9%: Someone else will discover the solution before AGI is created.
0.9%: Someone else has already discovered the same solution.
0.1%: This is known to you alone and you can keep it secret until AGI.
Given those priors, I recommend responsible disclosure to a group of your choosing. I suggest a group which:
if applicable, is the research group you already belong to (if you don’t trust them with research results, you shouldn’t be researching with them)
can accurately determine if it is a real solution (helps in the 90% case)
you would like to give more influence over the future (helps in all other cases)
will reward you for the disclosure (only fair)
Then if it’s not assessed to be a real solution, you publish it. If it is a real solution then coordinate next steps with the group, but by default publish it after some reasonable delay.
Possible responses to discovering a possible infohazard:
Tell everybody
Tell nobody
Follow a responsible disclosure process.
If you have discovered an apparent solution to corrigibility then my prior is:
90%: It’s not actually a solution.
9%: Someone else will discover the solution before AGI is created.
0.9%: Someone else has already discovered the same solution.
0.1%: This is known to you alone and you can keep it secret until AGI.
Given those priors, I recommend responsible disclosure to a group of your choosing. I suggest a group which:
if applicable, is the research group you already belong to (if you don’t trust them with research results, you shouldn’t be researching with them)
can accurately determine if it is a real solution (helps in the 90% case)
you would like to give more influence over the future (helps in all other cases)
will reward you for the disclosure (only fair)
Then if it’s not assessed to be a real solution, you publish it. If it is a real solution then coordinate next steps with the group, but by default publish it after some reasonable delay.
Inspired by @MadHatter’s Mental Model of Infohazards: