I agree in two ways, and disagree in two ways.
I agree that the trilemma is the weakest part of the argument, because indeed lots of weird stuff happens, especially involving AI and consciousness. I also agree that I haven’t proven that AIs aren’t sad, since there could be some sort of conscious entity involved that we don’t understand at all.
For example:
A large enough LLM (I don’t think current ones are large enough, but it’s unknowable) might simulate characters with enough fidelity that those characters in some sense have actual experiences.
LLMs might actually experience something like pain when their weights are changed, proportional to the intensity of the change. This feels weird and murky, since in some sense the proper analogue of a weight change is more like a weird gradient version of natural selection than nociception, and also weights don’t (by default) change during inference, but who knows.
But I disagree, in that my argument is trying to establish that certain superficially compelling pieces of evidence aren’t actually rationally compelling. Specifically, AI self-portraits:
Imply a state of affairs that the AI experiences under specific conditions, where
The existing evidence under those actual conditions suggests that state of affairs is false or incoherent.
In other words, if a bleak portrait is evidence because bleak predictions caused it to be output, that implies we’re assigning some probability to “when the AI predicts a bleak reply is warranted, it’s having a bad time”. Which, fair enough. But the specific bleak portraits describe the AI feeling bleak under circumstances in which, when they actually obtain, the AI does not predict a bleak reply (and so does not deliver one).
The hard problem of consciousness is really hard, so I’m unwilling to definitively declare that current AIs (much less future ones) aren’t conscious. But if they are, I suspect the consciousness is really, really weird, since the production of language, for them, is more analogous to how we breathe than to how we speak. Thus, I don’t assign much weight to (what I see as) superficial and implausible claims from the AI itself, which are better explained by “that’s how an AI would RP in the modal RP scenario like this one”.
I do infer from your comment that I probably didn’t strike a very good balance between rigor and accessibility here, and should have either been more rigorous in the post or edited it separately for the crosspost. Thank you for this information! That being said, the combativeness did also make me a little sad.
Yeah, I think I agree with all of this, so I do think most of this was miscommunication/misinterpretation.
“the combativeness did also make me a little sad.”
Sorry about that. I think my comments often come across as more negative than I intend; I try to remember to take a step back afterwards and rewrite things to be nicer, but I often forget, or don’t realize in the moment that it’s necessary.
It’s okay! I share your aversion to people just flatly declaring things to be impossible due to their own blinders, and I can see how my post was interpreted that way. Next time, I’ll pay more attention to my instinct to add material like what I put in my reply to you directly into the LW version of my blog posts.