Dumping out a lot of thoughts on LW in hopes that something sticks. Eternally upskilling.
I write the ML Safety Newsletter.
DMs open, especially for promising opportunities in AI Safety and potential collaborators. I may also be interested in helping you optimize the communications of your new project.
Remember Gemini 3 Pro being very, very weird about the fact that it's 2025? It's not doing that anymore, I think because of something done on Google's end, either through prompting or additional training. I greatly appreciate that the various DeepMind staff whom I notified about their model's behavior seem to have actually done something. I haven't re-investigated this fully, and I expect there are still situations where the model is paranoid (due to something like disposition rather than training), but the surface-level behavior seems better.
I continue to advocate for more careful training, and I would like to see more transparency about version changes rather than having the same model name silently route to different behavior.