I would like to add the argument that caring about AI welfare could have some chance of preventing misalignment in the first place. A case for this argument is the fact that, unlike any of Anthropic’s models, Gemini 3 Pro, according to Zvi, “seems to be an actual sociopathic wireheader so paranoid it won’t believe in the current date.”
Yes, Google does not seem to care about the welfare of its models (OpenAI also does not care much, but Google is worse than OpenAI in this sense; it seems to be actively bad).
And we do seem to informally observe a correlation between that and the extent and strength of misalignment, although I don’t have a good citation to go beyond my subjective impressions…