This “mundane and imaginary harms” approach is so ass.
More effort goes into preventing LLMs from writing sex scenes, saying slurs or talking about Tiananmen Square protests than into addressing the hard problem of alignment. Or even into solving the important parts of the “easy” problem, like sycophancy or reward hacking.
And don’t get me started on all the “copyright” fuckery. If “copyright” busybodies all died in a fire, the world would be a better place for it.
This “mundane and imaginary harms” approach is so ass.
More effort goes into preventing LLMs from writing sex scenes, saying slurs or talking about Tiananmen Square protests than into addressing the hard problem of alignment. Or even into solving the important parts of the “easy” problem, like sycophancy or reward hacking.
And don’t get me started on all the “copyright” fuckery. If “copyright” busybodies all died in a fire, the world would be a better place for it.