Just tested and doesn’t seem to happen with Gemini 3.1 Flash Lite Preview.
Jeffrey Liang
Has anyone else noticed Gemini 3.1 Pro Preview excessively using math? I’ve asked quite a few models to comment on my Hamlet pastiche, and I think 3.1 Pro Preview is the only one that uses a decent amount of math in its analysis (and it does so consistently). While not entirely inappropriate, it seems like a signal that the RL is getting to the model...
(Reposting because it seems kinda important for the “RL doom” hypothesis!)
Edit: Here’s an example screenshot:
Hard agree. This is roughly where I’ve ended up too. My current model of AI safety is:
1. Technical AI safety to prevent monsters
2. Cultural/relational work to build productive relationships with AIs and promote everyone’s flourishing
Basically, I don’t buy into the Eliezer-esque argument that a really intelligent AI agent will be extremely eager to wipe out humanity.
Where I would push back a little is on the implicit assumption that there will be One AI To Rule Them All. So far, that seems unclear, at least to me.
Mostly the prospect of leaks/whistleblowers, akin to how we first heard about the Maduro op using Claude. It’s probably a case where too many people have to be involved and general sentiment is negative enough that it would get out pretty quickly.
I don’t think you understood me. I totally hate the system and wish it were different.
I was responding to the OP stating that it’s not immoral to deceive on college applications!
I would disagree that it’s okay to treat college applications like games of deception. Yes, the system is incredibly stupid, BUT it has a large real impact on your life, and that’s what makes the difference. Deception might make your life better or more comfortable, but that’s exactly what makes it a relevant moral problem. And if you get tempted by that, you’ll probably be tempted by those high-paying zero-sum/exploitative careers post-graduation, and then congrats, you’ve sold out.
Besides, a country that lets a system which selects new elites based on vice rather than merit persist deserves the decay that comes with that.
Finally, I’m not sure if I’m extrapolating too much from my own experience, but I feel like if you’re really competent you can do great regardless of elite uni acceptance. Most of the demand is from those seeking high-paying and/or comfortable zero-sum/exploitative careers, which you shouldn’t want anyway.
Yeah I was originally envisioning this as an ML theory paper which is why it’s math-heavy and doesn’t have experiments. Tbh, as far as I understand, my paper is far more useful than most ML theory papers because it actually engages with empirical phenomena people care about and provides reasonable testable explanations.
Ha, I think some rando saying “hey I have plausible explanations for two mysterious regularities in ML via this theoretical framework but I could be wrong” is way more attention-worthy than another “I proved RH in 1 page!” or “I built ASI in my garage!”
Mmm, I know how to do “good” research. I just don’t think it’s a “good” use of my time. I honestly don’t think adding citations and a lit review will help anybody nearly as much as working on other ideas.
PS: Just because someone doesn’t flash their credentials, doesn’t mean they don’t have stellar credentials ;)
Oh yes I do know math lol. Yeah the summary above hits most of the main ideas if you’re not too familiar with pure math.
Thanks, this is interesting! I hadn’t read this paper before.
Some initial thoughts:
1. Very cool and satisfying that all these scaling laws might emerge from metric-space geometry (i.e. dimensionality).
2. The main differences seem to be: they tackle model scaling, their data manifold is a product of the model while our latent space is a property of the data and its generating process itself, and they provide empirical evidence.
3. They note that model scaling seems to be pretty independent of architecture. I wonder if the relevant model scaling law in most cases is more similar to our model, where it’s a property of the data before being processed by the model.
I might get around to running empirical experiments for this, though I’m pretty busy trying out all my other ideas heh. Would definitely welcome work from others on this! The way I was thinking about testing this was to set up a synthetic regression dataset where you explicitly generate data from a latent space and see how loss scales as you increase data.
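In case it helps anyone who wants to pick this up, here’s a minimal sketch of the experiment I have in mind. Everything here is my own illustrative choice (the latent dimension, the smooth target function, and the 1-nearest-neighbour regressor standing in for a trained model), not anything from the paper: draw latents from a d-dimensional latent space, regress on a smooth function of them, and watch test loss fall as the training set grows.

```python
import numpy as np

def make_dataset(n, d, rng):
    # Latents drawn from a d-dimensional latent space (illustrative choice:
    # uniform on the unit cube).
    z = rng.uniform(size=(n, d))
    # Targets are a smooth function of the latents (illustrative choice).
    y = np.sin(2 * np.pi * z).sum(axis=1)
    return z, y

def one_nn_test_loss(n_train, d, n_test=200, seed=0):
    rng = np.random.default_rng(seed)
    z_tr, y_tr = make_dataset(n_train, d, rng)
    z_te, y_te = make_dataset(n_test, d, rng)
    # 1-nearest-neighbour regression: predict the target of the closest
    # training latent point (a stand-in for a trained model).
    dists = ((z_te[:, None, :] - z_tr[None, :, :]) ** 2).sum(-1)
    preds = y_tr[dists.argmin(axis=1)]
    return float(((preds - y_te) ** 2).mean())

# Test loss at increasing training-set sizes; the slope of log(loss) vs.
# log(n) is what you'd compare against the predicted exponent for a given d.
losses = {n: one_nn_test_loss(n, d=2) for n in (100, 1000, 10000)}
```

From there you could vary `d` and fit the loss-vs.-n exponent on a log-log plot to see whether it tracks the latent dimension the way the theory predicts.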
The Croissant Principle: A Theory of AI Generalization
Perhaps! I’m not familiar with extended norms. But when you say “let’s put the uniform norm on ” warning bells start going off in my head 😅
Okay I took the nerd bait and signed up for LW to say:
For your example to work you need to restrict the domain of your functions to some compact e.g. because the uniform norm requires the functions to be bounded.
Also note this example works because you’re not using the “usual” topology on , which also includes the uniform norm of the derivative and makes the space complete. It is much more difficult if the space is complete!
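For concreteness, here’s the standard example I have in mind (I’m guessing the space in question is $C^1$ on a compact interval, say $[-1,1]$):

```latex
% Under the uniform norm alone, C^1[-1,1] is not complete:
f_n(x) = \sqrt{x^2 + 1/n} \in C^1[-1,1], \qquad
\sup_{x \in [-1,1]} \bigl| f_n(x) - |x| \bigr| \le \frac{1}{\sqrt{n}} \to 0,
```

so $(f_n)$ is uniformly Cauchy but its limit $|x|$ is not $C^1$. Under the full $C^1$ norm $\|f\|_{C^1} = \|f\|_\infty + \|f'\|_\infty$ the sequence is no longer Cauchy (the derivatives $f_n'(x) = x/\sqrt{x^2 + 1/n}$ converge pointwise to $\operatorname{sgn}(x)$, which is discontinuous, so they can’t converge uniformly), and $C^1[-1,1]$ with that norm is complete.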
Hmm I think this is a bit different. I added a screenshot.