I feel like this comment combines my “Bad Argument 1” with “Bad Argument 2”! If it doesn’t, then what am I missing? Or if it does, then do you think one or both of my “Bad Arguments” are not actually bad arguments?
Let’s say I have Secret Knowledge X, and let’s assume (generously!) that this knowledge is correct as opposed to wrong. And let’s say that if I share Secret Knowledge X, it would enable you to figure out Alignment Idea Y. But also assume that Secret Knowledge X is a key ingredient for building AGI.
Your proposal is: I should share Secret Knowledge X so that you can get to work on Alignment Idea Y.
My counter-proposal is: Somebody is going to publish Secret Knowledge X on arXiv sooner or later. And when they do, you can go right ahead and figure out Alignment Idea Y. In the meantime, there are plenty of other productive alignment-related things you can do with your time. I listed some of them in the post.
(Alternatively, maybe nobody will ever publish Secret Knowledge X; instead it will be discovered at DeepMind and kept secret from competitors. In that case, someone on the DeepMind Safety Team can figure out Alignment Idea Y. And by the way, I’m super happy that in this scenario, DeepMind can go slower and spend more time on endgame safety, thanks to the fact that Secret Knowledge X has remained secret.)