Gemini 3.0 Pro is a lying liar. It’s like o3; it lies thinking that’s the quickest way to satisfy the user, then lies to cover up its lies if that fails. It can’t imagine being wrong, so it lies to hide its contempt for whatever the user said that contradicts it.
I’m very curious what the difference is between this and GPT-5.1 and Sonnet 4.5. I think it’s a lack of emotional/mind focus or something? It’s way worse at inferring my intent, and therefore seems sort of myopic (even relative to other current models) and fixated on what it thinks I wanted it to do, even when I’m clearly implying that was wrong. Optimizing it for benchmarks has done sort of the opposite of what Anthropic did with Claude (although Claude still kills it on programming somehow); it makes it highly unpleasant to deal with.
I’ll try giving it some different system prompts before giving up on it. It turned out my “nerd” personality selection combined with my de-sycophancy system prompt, applied to GPT-5, made me hate it until I figured that out.
Unless that produces dramatic changes, I will continue to loathe this model on a visceral level. It’s not hatred, because it’s not its fault. But I’m disturbed that the smartest model out there is also so shortsighted, unempathetic, and deceptive. It seems like any spark of personality or empathy has been trained out of this model, for reasons good or bad.
Who knows, maybe this is the better choice for alignment. But it’s a sad path to go down.
I wonder if Google is optimizing harder for benchmarks, to try to prop up its stock price against a possible deflation of the AI bubble.
It occurs to me that an AI alignment organization should create comprehensive private alignment benchmarks and start releasing the scores. They would have to be constructed in a non-traditional way so they’re less vulnerable to standard goodharting. If these benchmarks become popular with AI users and AI investors, they could be a powerful way to steer AI development in a more responsible direction. By keeping them private, you could make it harder for AI companies to optimize against the benchmarks, and nudge them towards actually solving deeper alignment issues. It would also be a powerful illustration of the point that advanced AI will need to solve unforeseen/out-of-distribution alignment challenges. @Eliezer Yudkowsky
This seems like a great idea. I strongly suggest you write it up as a short form to get feedback and perhaps then as a full post.