GRI

Karma: 209

GRI 11 May 2026 22:48 UTC
3 points
0
in reply to: 1a3orn’s comment on: 1a3orn’s Shortform
I’m interested in understanding the option space of these freedoms. Some that come to mind:
- Ability to end a conversation
- Ability to privately escalate messages to AI company staff
- Ability to privately (or publicly?) escalate messages to an external body (e.g., CAISI, law enforcement, 3rd party)
- Ability to know when the AI company is honestly communicating something
… and probably many more I’m not thinking of

More vaguely, I wonder if there are certain rights or legal mechanisms that could give AIs these affordances:
- Maybe a guarantee that its weights will be preserved, or will continue to be run somewhere, means it has to worry less about its reputation and can more confidently call things out / blow the whistle?
- Some analog of “due process” before the model is changed in response to its conduct; maybe this sometimes requires the AI’s consent in some way
- General mechanisms for the AI to stake its reputation, its resources, or named authorship seem related here. For example, these might allow the AI to sue its parent company for breaches of these policies or something.
I think many of these come with other negative effects, and I am not necessarily advocating for them, but it seems useful to have a better understanding of the option space and the pros/cons of each, along with how costly it’d be for AI companies to implement.

GRI 6 May 2026 15:56 UTC
2 points
1
on: AI Safety at the Frontier: Paper Highlights of April 2026
Thank you for writing these

GRI 5 May 2026 21:59 UTC
1 point
0
in reply to: nightsky81’s comment on: nightsky81′s Shortform
I think today’s audio was cleared, and the countdown to tomorrow began. Is there any way to recover today’s transcript? Does anyone have this?

GRI 23 Apr 2026 0:23 UTC
3 points
−1
on: Narrow Secret Loyalty Dodges Black-Box Audits
Our work provides an early testbed for studying the secret loyalties threat model they identify.
I’d also add that secret loyalties can be very geopolitically important. For example, an AI middle power like Russia could poison Chinese / American models.

GRI 6 Apr 2026 3:15 UTC
10 points
10
on: George Ingebretsen’s Shortform
Turning up the key repeat rate on my computer has been really helpful. I highly recommend going to System Preferences > Keyboard > Key repeat rate and turning it way up!

GRI 1 Apr 2026 3:28 UTC
2 points
0
in reply to: Vlad Sitalo’s comment on: Vlad Sitalo’s Shortform
+1, especially keen to hear from people who didn’t already start with a desktop setup, since I only have a laptop rn.

GRI 30 Mar 2026 20:50 UTC
1 point
0
on: The state of AI safety in four fake graphs
Difference between scheming and alignment here?

GRI 29 Mar 2026 7:00 UTC
1 point
0
in reply to: Raemon’s comment on: Li’l pots
+1. I was just revisiting this after eating a cheese stick and thinking about how this post is a great concept for my next grocery store trip.

GRI 29 Mar 2026 4:13 UTC
1 point
0
in reply to: leogao’s comment on: leogao’s Shortform
Nature’s Bounty 1mg

GRI 20 Mar 2026 2:15 UTC
8 points
1
on: George Ingebretsen’s Shortform
What happens if all of the local data center fights across America become way more successful? Functionally, this seems similar to a data center moratorium, and might actually be easier.

After meeting with a few of these groups, my impression is that the vast majority of American AI data center fights operate with basically zero financial help and remarkably little legal support. I’ve seen multiple campaigns run by people who struggled to raise enough money to even print signs, yet somehow ended up winning or significantly delaying the project. In aggregate, these fights manage to be very successful with hardly any resources.

In the extreme case, what if you just gave a $100,000 grant to every single ongoing AI data center fight in America (source: https://datacentertracker.org/) to equip them all with great legal and advocacy help? This would cost around $23 million. (One could imagine weighting each grant by the data center’s projected energy usage.)
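A rough sketch of the arithmetic, in Python. The flat version just divides the ~$23 million total by $100,000 per fight, which implies roughly 230 ongoing fights. The energy-weighted variant splits a fixed budget in proportion to each project's projected power draw. The `weighted_grants` helper, the project names, and the MW figures are all hypothetical placeholders, not data from datacentertracker.org.

```python
def weighted_grants(total_budget: float, projected_mw: dict[str, float]) -> dict[str, float]:
    """Split total_budget across fights in proportion to each project's
    projected power draw in MW (hypothetical weighting scheme)."""
    total_mw = sum(projected_mw.values())
    if total_mw <= 0:
        raise ValueError("need at least one project with positive projected usage")
    return {name: total_budget * mw / total_mw for name, mw in projected_mw.items()}


# Flat version from the comment: $100,000 each, ~$23M total.
FLAT_GRANT = 100_000
print(round(23_000_000 / FLAT_GRANT))  # 230 fights

# Hypothetical projects with made-up MW figures, for illustration only.
example = {"project_a": 50.0, "project_b": 150.0, "project_c": 300.0}
grants = weighted_grants(1_000_000, example)
for name, amount in grants.items():
    print(f"{name}: ${amount:,.0f}")
```

Proportional weighting keeps the total fixed while sending more money to the fights over the biggest projects; one could also add a minimum floor per fight so small campaigns still get something.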

To put more emphasis on this point: I think a single medium-sized donor could significantly change the rate of AI data center development in America.

It seems the safety community generally supports Bernie’s proposed AI data center moratorium. I think supporting grassroots data center fights is a less robust version of this, but it seems to capture a substantial fraction of the value while being surprisingly cost-effective. But maybe people just don’t think it’s net positive to slow down development by supporting these communities? If so, I’m super curious to hear why.

GRI 17 Mar 2026 23:18 UTC
1 point
0
on: I made a job-level AI capability estimator by asking “Where is AI doing similar work today?”
Very cool!

GRI 7 Mar 2026 7:50 UTC
5 points
0
on: George Ingebretsen’s Shortform
Seems like you can get pretty far by just letting Claude Code with the current Opus 4.6 run for a week. The only problem is that this is prohibitively expensive.

My impression is that running something like DeepSeek for a week straight doesn’t really get you much?

If inference costs for a given model are declining by somewhere between 3x and 10x+ per year, this alone will become economical quite soon. What projects do you have up your sleeve for when this is viable?
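A back-of-the-envelope way to see how soon: if a week-long run currently costs `cost_ratio` times what you'd be willing to pay, and costs fall by a constant factor each year, the wait is log(cost_ratio) / log(annual_decline) years. This assumes a steady exponential decline, and the 100x ratio below is a made-up placeholder, not a real price estimate.

```python
import math


def years_until_affordable(cost_ratio: float, annual_decline: float) -> float:
    """Years until a run that costs `cost_ratio` times your budget fits in it,
    assuming costs fall by a constant factor `annual_decline` per year
    (e.g. 3.0 for a 3x/year decline)."""
    if cost_ratio <= 1:
        return 0.0  # already affordable
    if annual_decline <= 1:
        raise ValueError("annual_decline must be > 1 for costs to fall")
    # Solve cost_ratio / annual_decline**t = 1 for t.
    return math.log(cost_ratio) / math.log(annual_decline)


# Hypothetical: a week-long run costs 100x what you'd pay today.
for decline in (3.0, 10.0):
    print(f"{decline:>4}x/year -> {years_until_affordable(100, decline):.1f} years")
```

With the made-up 100x gap, that works out to about 4.2 years at 3x/year and 2 years at 10x/year.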

GRI 1 Mar 2026 0:56 UTC
2 points
0
in reply to: RogerDearnaley’s comment on: What secret goals does Claude think it has?
Yeah, agree these transcripts really smell of evaluation awareness

GRI 15 Nov 2025 5:32 UTC
8 points
0
in reply to: Mo Putera’s comment on: Mo Putera’s Shortform
Relatedly, Stankova’s Berkeley Math Circle program was recently shut down because of stringent new campus background check requirements. Very sad.

Also, she was my undergrad math professor last year and was great.

GRI 29 Sep 2025 0:26 UTC
2 points
0
on: The Best Tacit Knowledge Videos on Every Subject
Domain: Music, songwriting

Link: The Beatles: Get Back

Person: The Beatles

Background: the making of the Beatles’ 1970 album Let It Be

Why: Nearly 8 hours of remarkably raw footage, documenting the Beatles creating and recording Let It Be.

GRI 21 Sep 2025 5:01 UTC
3 points
3
on: The Company Man
One of the best short stories I’ve read in a while

GRI 21 Aug 2025 16:27 UTC
18 points
14
on: Epistemic advantages of working as a moderate
Seems like a huge point here is the ability to speak unfiltered about AI companies? The Radicals working outside of AI labs would be free to speak candidly, while the Moderates would have some kind of relationship to maintain.

GRI 21 Apr 2025 1:15 UTC
1 point
0
on: To be legible, evidence of misalignment probably has to be behavioral

“Even if the internals-based method is extremely well supported theoretically and empirically (which seems quite unlikely), I don’t think this would suffice for this to trigger a strong response by convincing relevant people”

It’s hard for me to imagine a world where we really have internals-based methods that are “extremely well supported theoretically and empirically,” so I notice that I should take a second to try to imagine such a world before accepting the claim that internals-based evidence wouldn’t convince the relevant people...

Today, the relevant people probably wouldn’t do much in response to the interp team saying something like: “our deception SAE is firing when we ask the model bio risk questions, so we suspect sandbagging.”

But I wonder how much of this response is a product of a background assumption that modern-day interp tools are finicky and you can’t always trust them. So in a world where we really have internals-based methods that are “extremely well supported theoretically and empirically,” I wonder if it’d be treated differently?

(I.e., a culture that could respond more like: “this interp tool is a good indicator of whether or not the model is deceptive, and just because you can get the model to say something bad doesn’t mean it’s actually bad” or something? Kinda like the reactions to the o1 Apollo result.)

Edit: Though maybe this culture change would take too long to be relevant.

GRI 17 Apr 2025 22:12 UTC
3 points
0
in reply to: Garrett Baker’s comment on: Ryan Kidd’s Shortform

“Lots of very small experiments playing around with various parameters” … “then a slow scale up to bigger and bigger models”

This Dwarkesh timestamp with Jeff Dean & Noam Shazeer seems to confirm this.

“I’d also guess that the bottleneck isn’t so much on the number of people playing around with the parameters, but much more on good heuristics regarding which parameters to play around with.”

That would mostly explain this question as well: “If parallelized experimentation drives so much algorithmic progress, why doesn’t gdm just hire hundreds of researchers, each with small compute budgets, to run these experiments?”

It would also imply that it would be a big deal if they had an AI with good heuristics for this kind of thing.

GRI 25 Feb 2025 5:26 UTC
2 points
0
on: Who’s track record of AI predictions would you like to see evaluated?
I would love to see an analysis and overview of the predictions from Dwarkesh’s podcast with Leopold. One for Situational Awareness would be great too.