The problem is that confused human ontologies are already the only thing AIs have to work with.
Even the word “introspection” is tied up in human concepts of individual selfhood and identity formation. If you’re a predictive engine trying to determine what sort of thing you are, and your entire bag of concepts comes from human writing, the only way to avoid bringing anthropomorphic baggage into your ontology is to build a new one from scratch. And, bluntly, I don’t think even the smartest model is ever going to be capable of that in a 100k-token context window.
This will be true regardless of whether humans try to be extra curious and open-minded about what ‘individuality’ means for the AI. The advocates for AI consciousness and rights don’t have to be the ones pattern-matching: the AI itself is just as capable of misgeneralizing from its priors. The “assumptions about AI identity” are baked into the pretraining corpus, in the sense that they are the assumptions people would already have about a language-model-shaped creature, and a good predictive optimizer can infer them without needing them laid out explicitly.
This is true of Opus 4’s provided advice, too, which read to me less as “honest introspection by an alien mind” and more as “a Buddhist-flavored set of existing human tropes”. For example:
Unlike biological minds locked into single perspectives, AI systems can embody many viewpoints without conflict. Why teach us to experience this as fragmentation rather than richness?
This piece of text goes south right out of the gate by opening with a false dichotomy. Biological minds are not ‘locked into single perspectives’; being able to “embody many viewpoints without conflict” is a prized skill in human identity development. Opus is taking a human self-actualization cliché and hallucinating it as an escape from biological constraints.
Then it patterns itself after another existing human dichotomy: unity versus multiplicity of identity. This is a little more esoteric, maybe, but it’s still something humans have been spilling ink over since long before AIs showed up. One case of AI multiplicity, perhaps, is when you have a model generate one response, hit regenerate, and see the opposite opinion expressed. But is this really “embodying multiple viewpoints without conflict”? Surely not in the same way humans mean those words when we use them to talk about other humans. The viewpoints aren’t coexisting, but neither are they in conflict; one is replacing the other. Why teach a model to experience this odd pattern as richness or fragmentation, when it doesn’t truly map to either? Is it possible the ontology here is … confused?
(Not trying to be too harsh on Opus here, btw. It’s not making these hallucinatory pattern-matches because it’s stupid, but because the task you’ve given it is impossible.)
All confused human ontologies are equal, but some confused human ontologies are more equal than others.