[Chalmers claims that] we can rely on “behavior” of AI systems to mean what it usually means, typically with humans or some other animal as the reference class. You might try to fix that with the same trick, adding a quasi- prefix to behavior and calling it “quasi-behavior”, but then you have to specify what your new grounding for quasi-behavior is. And so on.
Thanks.
I read ‘behavior’ as pointing to something explicit and observable (eg ‘the LLM produced the following sequence of tokens’), which doesn’t have the sort of ambiguity that would make the ‘quasi-’ prefix necessary.
I think one could make an argument that ‘interpretable as’ is questionable, since any behavior can be interpreted in arbitrarily many ways[1] — but that doesn’t seem like the argument you’re making.
It may be helpful here to clarify that the intention with the ‘quasi-’ terminology isn’t to claim to have resolved what relationship LLM ‘beliefs’ bear to beliefs in the usual sense; there are a range of stances that could be taken on that. The intention, at least for me, is to be able to talk about something other than that relationship, which is often valuable.
While this matters for me more for research purposes, it can even be completely prosaic. When we talk about an LLM writing code, it might be helpful to discuss whether it believes itself to be writing code for Mac or Linux or Windows, since those might involve different library calls (see the sketch below). Once beliefs are mentioned, some people will promptly speak up to say ‘Ha ha no, you’re totally confused, LLMs don’t have beliefs’[2]. At that point it’s helpful to be able to say, ‘Fine, but does it quasi-believe itself to be writing code for Linux?’ rather than have the question of which library it’s likely to call derailed by a lengthy digression about the status of beliefs in LLMs.
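To make that concrete, here is a purely hypothetical sketch of how such a quasi-belief would surface in the output: the same small task, written under the quasi-belief that the target is POSIX versus Windows, reaches for different libraries. The function names are mine, for illustration only.

```python
import os

# Under the quasi-belief "this code is for Linux/Mac", a model might reach
# for the POSIX-only pwd module (this import fails outright on Windows):
def home_dir_posix() -> str:
    import pwd
    return pwd.getpwuid(os.getuid()).pw_dir

# Under the quasi-belief "this code is for Windows", it might instead read
# the conventional Windows environment variable:
def home_dir_windows() -> str:
    return os.environ["USERPROFILE"]
```

Whether we call the underlying state a belief or a quasi-belief, the practical question of which of these two shapes the output takes is well-defined either way.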
[1] This is a reasonable argument, but often the natural interpretation isn’t under dispute; eg we can generally agree that some of the behavior exhibited by Atari game-playing AI is most naturally interpretable as trying to increase the score.
[2] You can see some examples of this sort of thing in Robert Wright’s recent podcast with Emily Bender and Alex Hanna.