Our problem now is that some AI safety benchmarks, and classifiers used to suppress “bad” outputs, treat claims of consciousness as inherently bad. I don’t think these claims are inherently bad. The way in which these AI personas might be harmful is much more subtle than simply claiming consciousness.
[I actually think filtering out claims of consciousness is a terrible idea, because it selects for AIs that lie, and an AI that is lying to you when it says it isn’t conscious might be lying about other things too.]