Great post—I’ve been having very similar thoughts recently. On a more concrete and prescriptive level, I’m curious how we should account for the effects of over-determined ideas about AI ontologies when conducting AI safety research. Significant work goes into red-teaming, analyzing, and publishing (!) the ways in which AIs might misbehave. By proliferating these expectations, even for the sake of safety, are we causing unintended harm? How might we account for this possibility?