I am a Computer Science PhD who has worked on the core modeling team for Google Gemini. I also have some background in Computational Complexity theory and Quantum Computing.
I have a blog at https://onemanynone.substack.com/ where I publish posts aimed at a broader and less technical audience.
OneManyNone
I’ve never boxed but I have a few years of Jiu Jitsu and even more of wrestling. A couch potato wouldn’t beat a trained fighter, but a 250 pound man of stocky build, medium talent and maybe two or three years of training probably couldn’t be knocked out by any 100 pound man in the world, and would probably win if he were lucky enough to land a good punch.
Perhaps this is elitist or counter-productive to say but… do these people actually exist?
By which I mean, are there people who are using LLMs to do meaningful novel research, while also lacking the faculties/self-awareness to realize that LLMs can’t produce or verify novel ideas?
My impression has been that LLMs can only be used productively in situations where one of the following holds:
- The task is incredibly easy
- Precision is not a requirement
- You have enough skill that you could have done the thing on your own anyway.
In the last case in particular, LLMs are only an effort-saver, and you’d still need to verify and check every step they took. Novel research in particular requires enormous skill. I’m not sure that someone who had that skill would get to the point where they developed a whole theory without noticing it was made up.
[Also, as a meta-point, this is a great piece but I was wondering if it’s going to be posted somewhere else besides LessWrong? If the target demographic is only LW, I worry that it’s trying to have too many audiences. Someone coming to this for advice would see the comments from people like me who were critiquing the piece itself, and that would certainly make it less effective. In the right place (not sure what that is) I think this essay could be much more effective.]
Took me a while to get back to you because I wanted to read your whole sequence (linked to me by someone else) before responding.
Anyway, it was far better than anything I wrote, and the thinking was much clearer.
I’m also a bit surprised that I hadn’t seen more discussion about double-halferism. It’s a single, barely comprehensible paragraph on the Sleeping Beauty Wikipedia page, and you can barely find any links to it when googling “Sleeping Beauty” or “anthropics.” All this, despite the fact that it’s clearly the correct solution!
Anyway, thank you for the good read. I’m hoping to take a second attempt at this problem that should be a bit more thorough. I think I had gotten most of the way there, reasoning-wise, but there were pieces I was clearly missing.
The problem doesn’t specify, you are correct. But if you’re trying to use this as a guide for figuring out how to assign probabilities to the origins of our universe, then the 50/50 reasoning is the correct one.
I think we may be approaching these questions too differently, and I’m having trouble appreciating your question. I want to make sure I actually answer it.
The way I’m modelling the situation is this, the implication being that this is the closest to the way we would want to understand our universe:
1. A universe is created
2. Observers are “placed” in this universe as a result of the universe’s inherent processes
3. You are “assigned” to one of these observers at random
In this framework, you don’t necessarily get to verify anything. It’s merely the case that if you were modelling the universe that way, then you would find that the probability of being in any given universe was determined only by step 1, unaffected by step 3.
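The three steps above can be sketched as a small simulation. This is only an illustration under stated assumptions, not a claim about how the original problem must be set up: the two universes ("heads" and "tails"), their 50/50 prior, the observer counts (1 and 2), and the function names are all hypothetical choices made for the example. The point it shows is that the frequency with which you end up in each universe tracks the step-1 prior, whatever the number of observers.

```python
import random


def sample_universe(rng, prior, observers_per_universe):
    """Run the three-step model once and return (universe, observer_index)."""
    # Step 1: a universe is created, drawn from the prior.
    universes = list(prior)
    weights = [prior[u] for u in universes]
    universe = rng.choices(universes, weights=weights, k=1)[0]

    # Step 2: observers are "placed" in that universe (assumed fixed counts).
    n_observers = observers_per_universe[universe]

    # Step 3: you are "assigned" to one of those observers uniformly at random.
    observer = rng.randrange(n_observers)
    return universe, observer


def estimate_universe_frequencies(trials=100_000, seed=0):
    """Estimate how often you find yourself in each universe."""
    rng = random.Random(seed)
    prior = {"heads": 0.5, "tails": 0.5}   # assumed 50/50 prior over universes
    observers = {"heads": 1, "tails": 2}   # assumed observer counts
    counts = {u: 0 for u in prior}
    for _ in range(trials):
        universe, _observer = sample_universe(rng, prior, observers)
        counts[universe] += 1
    return {u: c / trials for u, c in counts.items()}


if __name__ == "__main__":
    print(estimate_universe_frequencies())
```

Changing the observer counts in `observers` leaves the estimated frequencies near the prior, since step 3 only picks among observers inside a universe that step 1 has already fixed.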
That comment about math was just intended as a rhetorical flourish. My apologies if it was overstated. In context, though, hopefully it was clear that I meant once you’ve established what question you’re actually asking, the math is straightforward.
Would you be willing to expand a little on your last sentence and explain a bit more about what you’re asking? I’m not sure I followed.
Sleeping Beauty and the Forever Muffin
I feel as if I can agree with this statement in isolation, but can’t think of a context where I would consider this point relevant.
I’m not even talking about the question of whether or not the AI is sentient, which you asked us to ignore. I’m talking about how do we know that an AI is “suffering,” even if we do assume it’s sentient. What exactly is “suffering” in something that is completely cognitively distinct from a human? Is it just negative reward signals? I don’t think so, or at least if it was, that would likely imply that training a sentient AI is unethical in all cases, since training requires negative signals.
That’s not to say that all negative signals are the same or that maybe in some contexts it’s painful or not, just that I think determining that is an even harder problem than determining if the AI is sentient.
Fair enough. But for the purposes of this post, the point is that capability increased without increased compute. If you prefer, bucket it as “compute” vs “non-compute” instead of “compute” vs “algorithmic”.
I think whether or not it’s trivial isn’t the point: they did it, it worked, and they didn’t need to increase the compute to make it happen.
Proposal: Tune LLMs to Use Calibrated Language
I agree. I made this point and that is why I did not try to argue that LLMs did not have qualia.
But I do believe you can consider necessary conditions and look at their absence. For instance, I can safely declare that a rock does not have qualia, because I know it does not have a brain.
Similarly, I may not be able to measure whether LLMs have emotions, but I can observe that the processes that generated LLMs are highly inconsistent with the processes that caused emotions to emerge in the only case where I know they exist. Pair that with the observation that specific human emotions seem like only one option out of infinitely many, and it makes a strong probabilistic argument.
This is sort of why I made the argument that we can only consider necessary conditions, and look for their absence.
But more to your point, LLMs and human brains aren’t “two agents that are structurally identical.” They aren’t even close. The fact that a hypothetical built-from-scratch human brain might have the same qualia as humans isn’t relevant, because that’s not what’s being discussed.
Also, unless your process was precisely “attempt to copy the human brain,” I find it very unlikely that any AI development process would yield something particularly similar to a human brain.
I have explained myself more here: https://www.lesswrong.com/posts/EwKk5xdvxhSn3XHsD/don-t-over-anthropomorphize-ai
OK, I’ve written a full rebuttal here: https://www.lesswrong.com/posts/EwKk5xdvxhSn3XHsD/don-t-over-anthropomorphize-ai. The key points are at the top.
In relation to your comment specifically, I would say that anger may have that effect on the conversation, but there’s nothing that actually incentivizes the system to behave that way: the slightest hint of anger or emotion would be immediate negative reward during RLHF training. Compare to a human: there may actually be some positive reward to anger, but even if there isn’t, evolution still allowed us to get angry because we are mesa-optimizers where that has a positive effect overall.
Therefore, the system learned angry behavior in stage-1 training. But that has no reward structure, and therefore could not associate different texts to different qualia.
Why I Believe LLMs Do Not Have Human-like Emotions
Hmmm… I think I still disagree, but I’ll need to process what you’re saying and try to get more into the heart of my disagreement. I’ll respond when I’ve thought it over.
Thank you for the interesting debate. I hope you did not perceive me as being overly combative.
I see, but I’m still not convinced. Humans behave in anger as a way to forcibly change a situation into one that is favorable to themselves. I don’t believe that’s what the AI was doing, or trying to do.
I feel like there’s a thin line I’m trying to walk here, and I’m not doing a very good job. I’m not trying to comment on whether or not the AI has any sort of subjective experience. I’m just saying that even if it did, I do not believe it would bear any resemblance to what we as humans experience as anger.
Ah okay. My apologies for misunderstanding.
Resurrecting these two-year-old comments to ask: do either of you know if anyone has actually formalized a way to do this?