Understandable attempt by Chalmers, but I’d say that bit, at least, moves in the opposite direction from clarity.
The idea

> if it is behaviorally interpretable as believing that p
reinforces that even if we can’t rely on “beliefs” of AI systems to mean what they usually mean, we can rely on “behavior” of AI systems to mean what it usually means, typically with humans or some other animal as the reference class. You might try to fix that with the same trick, adding a quasi- prefix to behavior and calling it “quasi-behavior”, but then you have to specify what your new grounding for quasi-behavior is. And so on.
It feels tempting to use—or no, it feels unfair to be denied—some handle that serves the felt sense of “But when I interact with Claude, it is very useful and predictive to see it as ‘planning’ to troubleshoot X and ‘believing’ that some file is in some folder. Isn’t it better for me to flag with quasi- how it’s sort of true and sort of false?”
The problem with “quasi-” is that it is trying to avoid the spikiness/jaggedness/alienness of what we might call AI minds, whereas good frames and vocabulary should remind us to be constantly vigilant about the differences in different contexts. That we can’t get away with “sort of true and sort of false.” That instead, we should be paying attention to the fine-grained differences in each context, and how extrapolation will fail. That’s how you respect the alienness.
In the link, Chalmers dismisses such concerns:
> An opponent might deny that LLMs have quasi-beliefs or quasi-desires on the grounds that LLM behavior is unstable, or non-humanlike, or otherwise defective in a way that means that the LLM is not even usefully interpretable in terms of beliefs or desires. [...] A core of consistency is enough for interpretation to get a grip in ascribing numerous quasi-beliefs and quasi-desires, even though there will be domains where they lack these states on grounds of inconsistency. Overall I think that experience with current LLMs suggests that there is enough of a consistent core to support a reasonably extensive core of quasi-beliefs.
A better analogy that I’ve proposed before is rationalization. Calling rationalization “quasi-rationality” makes the absurdity clearer. Rationalization isn’t sort-of rational. Rationalization doesn’t “play some key roles” of rationality. Rationalization and rationality do not share a “core” that is usefully co-extensive.
Don’t underestimate the adversarial institutional reification of anthropomorphism here. Don’t mistake an anti-inductive nature for a harmless un-inductive nature or, worse, an inductive nature. That’s like mistaking rationalization for mere noise or, worse, for rationality itself. Rationalization is a kind of referential parasitism on the phenomena of rationality, and the reason to consider it adjacent to rationality is only to be watchful of how it is cleverly simulating your familiar notion.
This is the Sharp Left Turn of referential alignment. Don’t fall for the similarity. AI minds and bodies do not refer like human minds and bodies do. Our referential activity may be very similar up to a point (and incrementally duct-taped and patched to fix any seeming discrepancies) and then totally bizarre beyond specific contexts. Reliance on some “core” will create bad shocks.
I’ve critiqued elsewhere the dependence on north stars of “cores”, “invariants”, and “convergences” in general, as only being able to deal with the intersection of phenomena. Hopefully this becomes more compelling as alternative methodologies become possible with AI assistance, as outlined somewhat in the previous link. (Also more compelling as AI systems prove our existing models and metaphors cannot simply be repurposed with minor modifications.) Instead of talking about quasi-beliefs, you might create a label for the X-belief for each different context X, which may have extremely specific connections and disconnections with the various implications we tend to assume for beliefs. This would require tracking “disorders”, where AI systems absurdly do only some of the things that you would normally do with “beliefs” and “selves”.
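As a very rough illustration of what that per-context bookkeeping could look like (the names and fields below are a toy invention of mine, not a worked-out proposal), one could record an X-belief together with which of the usual implications of “belief” hold, fail, or remain untested in that context:

```python
# Toy sketch: instead of one monolithic "belief" predicate, record, per context X,
# an X-belief plus which of the usual implications of "belief" hold or fail there.
from dataclasses import dataclass, field

# Implications we habitually bundle into "belief" (illustrative, not exhaustive).
USUAL_IMPLICATIONS = (
    "persists_across_rephrasings",
    "updates_on_contrary_evidence",
    "reportable_on_direct_query",
    "guides_downstream_actions",
)

@dataclass
class ContextBelief:
    context: str                      # the X in "X-belief", e.g. "code_review"
    content: str                      # e.g. "the target OS is Linux"
    holds: set = field(default_factory=set)
    fails: set = field(default_factory=set)

    def untested(self):
        """Implications nobody has probed yet in this context."""
        return set(USUAL_IMPLICATIONS) - self.holds - self.fails

# A "disorder": implications that travel together for human believers come apart.
b = ContextBelief(
    context="code_review",
    content="the target OS is Linux",
    holds={"guides_downstream_actions"},
    fails={"persists_across_rephrasings"},
)
print(b.untested())
```

The only point of the toy is that the bookkeeping is context-indexed and implication-by-implication, rather than a single quasi- label.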
The commitment to non-anthropomorphism is now more clearly an ongoing practice beyond words, not something we can accomplish with abstract analysis or by redefining terms. It will soon be as hard as or harder than any other systemic issue today, with subtle ideological collusion to keep you convinced that the artificial substitute is basically no different from the real thing and that any toxic seams will be ironed out in v0.4.
(Don’t mistake this for apathy about potential machine suffering. On the contrary, this vigilance about our projections should mitigate issues of reverse alignment—where we assume that happy text output is synonymous with machine welfare while the system is suffering inside. Dealmaking proposals are often great examples of this unthinkingness.)
Thanks.

> [Chalmers claims that] we can rely on “behavior” of AI systems to mean what it usually means, typically with humans or some other animal as the reference class. You might try to fix that with the same trick, adding a quasi- prefix to behavior and calling it “quasi-behavior”, but then you have to specify what your new grounding for quasi-behavior is. And so on.
I read ‘behavior’ as pointing to something explicit and observable (eg ‘the LLM produced the following sequence of tokens’), which doesn’t have the sort of ambiguity that would make the ‘quasi-’ prefix necessary.
I think one could make an argument that ‘interpretable as’ is questionable, since any behavior can be interpreted in arbitrarily many ways[1] — but that doesn’t seem like the argument you’re making.
It may be helpful here to clarify that the intention with the ‘quasi-’ terminology isn’t to claim to have resolved what relationship LLM ‘beliefs’ bear to beliefs in the usual sense; there are a range of stances that could be taken on that. The intention, at least for me, is to be able to talk about something other than that relationship, which is often valuable.
While this matters for me more for research purposes, it can even be completely prosaic. When we talk about an LLM writing code, it might be helpful to discuss whether it believes itself to be writing code for Mac or Linux or Windows, since those might involve different library calls. Once that’s mentioned, there are people who will promptly speak up to say ‘Ha ha no, you’re totally confused, LLMs don’t have beliefs’[2]. At that point it’s helpful to be able to say, ‘Fine, but does it quasi-believe itself to be writing code for Linux?’ rather than have the question of which library it’s likely to call derailed by a lengthy digression about the status of beliefs in LLMs.
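To make the prosaic version concrete, here is a minimal sketch (the marker lists and the function below are my own illustration, purely hypothetical) that treats ‘which OS does it quasi-believe it is targeting?’ as a behavioral question about the code it actually emitted:

```python
# Hypothetical sketch: read the apparent target OS off observable output
# (the generated code), without settling what "believe" means for an LLM.
OS_MARKERS = {
    "linux":   ["inotify", "/etc/", "apt-get", "systemd"],
    "macos":   ["FSEvents", "Library/Application Support", "brew install"],
    "windows": ["win32file", "winreg", "%APPDATA%", "pywin32"],
}

def apparent_target_os(generated_code: str) -> dict:
    """Count OS-specific markers per platform; the largest count is a rough,
    behavioral answer to which OS the model quasi-believes it is writing for."""
    return {
        os_name: sum(generated_code.count(marker) for marker in markers)
        for os_name, markers in OS_MARKERS.items()
    }

# Usage, with a made-up snippet standing in for model output:
sample = "import win32file\npath = os.path.expandvars('%APPDATA%/myapp/config.ini')"
print(apparent_target_os(sample))  # the 'windows' count dominates here
```

Whether that heuristic is any good is beside the point; it just shows that the question about library calls can be asked and answered at the level of behavior.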
[1] This is a reasonable argument, but often the natural interpretation isn’t under dispute—eg we can generally agree that some of the behavior exhibited by Atari game-playing AI is most naturally interpretable as trying to increase the score.
[2] You can see some examples of this sort of thing in Robert Wright’s recent podcast with Emily Bender and Alex Hanna.