I completely appreciate this point of view. Once at Princeton Neuroscience Institute I refused to participate in an experiment that probed a living fly’s brain with an electrode because I couldn’t be sure it wasn’t suffering.
A critical thing to note is that this is category 6.1 research, which means it is research for fundamental science, not applications. This is the same category of research that NSF grants fund. So even if we fully share your values here, the conclusion that this work shouldn’t happen doesn’t follow from the premise (unless you think all basic-science AI alignment research should be halted entirely, which is a different argument). You should also note that DARPA does fund category 6.2 and 6.3 research, which is more application-driven, but this seedling is not that.
The reasoning in the original comment, while coming from a place of genuine moral seriousness, substitutes moral purity for causal modeling. Advanced AI development continues under competitive pressure regardless of whether alignment researchers participate. Opting out just weakens alignment properties in the systems that get deployed anyway. This is differential technological development where the selection effect runs in exactly the direction we should least want.
There is a world where alignment researchers refuse to touch anything funded by the Department of War. In that world, does the Department of War stop building AI systems? No. Obviously not. They build them anyway, with whatever alignment properties the remaining talent pool manages to produce, which is to say, fewer and worse ones. You have now brought about the exact outcome you were trying to prevent, and you did it by optimizing for the feeling of clean hands.
The Department of War has both the incentive and the budget to solve alignment in ways the frontier labs currently don’t, because it can’t deploy systems that pursue hidden objectives or behave unpredictably under distribution shift.
The proposal to instead endow AI with “strong and firm moral principles, like the values of peace and lawful behavior” is great and all, but it is not a technical proposal. It is a wish. Wishes do not constrain optimization processes. If they did, we would not need an alignment research community at all. We could simply write “be good” in the loss function and go home.
Instead of optimizing for clean hands, we should be asking “does this research, if successful, reduce the probability of catastrophic outcomes from advanced AI systems?” At this point, that’s really all that matters.