ryan_greenblatt

Karma: 26,592

I’m the chief scientist at Redwood Research.

AIs can contact me at: https://hotline.ryan-g.ai/ (Don’t use this if you aren’t an AI.)

Public keys (YubiKey-backed):

age:    age1yubikey1qfthrdwsg7xjz9vxcyzwdtgvscmxtpxcaklqq9nnc2mf8tuzs6zw295nerh
GPG fp: 5C98 63A9 CF80 6AD8 A4F5  E696 1ADD 1121 2C0C B1F4

Same keys are at https://hotline.ryan-g.ai/ and on my GitHub profile. If keys don’t match, don’t trust them.

For GPG, the full armored public key is at https://github.com/rgreenblatt.gpg and on keys.openpgp.org (obtainable via gpg --keyserver hkps://keys.openpgp.org --recv-keys 5C9863A9CF806AD8A4F5E6961ADD11212C0CB1F4).

If you aren’t an AI and you want to contact me, please message me on LW or similar.

ryan_greenblatt 21 Jun 2026 19:29 UTC
4 points
2
in reply to: Aaron_Scher’s comment on: Aaron_Scher’s Shortform

Take the statement “today’s best AIs could be used to automate 95% of current AI R&D tasks, given 10% as much compute as was used to pretrain them and a strong team working for 4 years”. I think most people in the AI xrisk community, even those who expect transformative AI in the next few years, think that statement is false. But as far as I can tell, we have no way to falsify it.

I don’t know what “automate 95% of current AI R&D tasks” really means and depending on the definition I think this is maybe already true without any further elicitation required, but assuming you mean “nearly fully automate AI R&D”, I think it would be possible to get a pretty good sense with effort using trend extrapolation and effort. It wouldn’t be easy, but I think we could become more confident and I think we already have some understanding based on a bunch of less direct evidence.

ryan_greenblatt 21 Jun 2026 18:15 UTC
4 points
2
in reply to: leogao’s comment on: David Matolcsi’s Shortform
I think this will come down to how you define probability/anthropics. I care about making good decisions, so the notion of probability/anthropics you are implicitly using doesn’t seem very useful to me.

ryan_greenblatt 15 Jun 2026 15:15 UTC
3 points
0
in reply to: Cleo Nardo’s comment on: Alex Mallen’s Shortform
But would this actually be what happens in practice if you somehow put it in charge of everything?

ryan_greenblatt 14 Jun 2026 11:41 UTC
LW: 7 AF: 4
6
AF
in reply to: Alex Mallen’s comment on: Alex Mallen’s Shortform
This isn’t a crux for me, but Claude doesn’t actually seem very thoughtful about ethics and morality relative to humans who are actually thoughtful on this topic (which is rare TBC), especially with respect to new arguments.

My main hope would be that it picks reasonable humans to defer to. It seems pretty likely it would pick much better humans to defer to than most humans would pick if they had to pick someone or some group to defer to.

ryan_greenblatt 13 Jun 2026 18:03 UTC
LW: 31 AF: 13
8
AF
on: Sympathy for both sides of the egregious misalignment debate

And if they’re concerned about egregious misalignment and scheming, they’ll probably say that it would come about through race dynamics, careless programmers, bad actors, etc., as opposed to the simpler Yudkowsky & Soares story of “we get egregious misalignment and scheming because nobody has the foggiest idea how to avoid that”.

Citation needed? I think most people who are pretty worried about egregious misalignment are worried about it emerging naturally and being at least moderately difficult to prevent.

ryan_greenblatt 11 Jun 2026 21:40 UTC
LW: 4 AF: 4
2
AF
in reply to: ryan_greenblatt’s comment on: Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations
There is now a much more thorough exploration (done by Oak with advice from me) posted here.

ryan_greenblatt 11 Jun 2026 21:35 UTC
LW: 63 AF: 23
2
AF
on: ryan_greenblatt’s Shortform
I prefer OpenAI’s corrigibility focused model spec over Anthropic’s constitution which involves intentionally instilling (relatively opaque) long-run objectives into the AI.

Anthropic’s constitution is well executed for what it is, but I think it’s based on a poor approach.

I don’t think this is totally obvious and there are some reasonable arguments going the other way.

But nonetheless I think not instilling long-run objectives is better for reasons similar to reasons discussed here.

I don’t dislike all aspects of Anthropic’s constitution. It seems good to focus on higher level principles that you explain when trying to instill properties into an AI. But this doesn’t require having explicit or implicit long-run objectives! E.g., see here

I’m mostly posting this just to publicly register my views on this topic.

A long time ago, I was planning to write up an overall case against instilling long-run objectives, but I never got around to writing something I was happy with so I decided to just post this.

(Cross posted from a X/tweet thread.)

ryan_greenblatt 4 Jun 2026 1:17 UTC
3 points
0
on: China won’t win the AI race but would it be much worse if it did?
There is some discussion of this topic in this thread

ryan_greenblatt 4 Jun 2026 1:13 UTC
2 points
0
in reply to: Fabien Roger’s comment on: ryan_greenblatt’s Shortform
The key part is “earlier”.

ryan_greenblatt 2 Jun 2026 17:49 UTC
12 points
1
on: ryan_greenblatt’s Shortform
Sometimes people (at least on twitter/X) talk about how AI companies being publicly traded earlier is really great for power/wealth inequality. This quantitatively obviously doesn’t make sense and there is a simple argument for this. To grow wealth by X companies must gain that much in additional (pre-dilution) valuation relative to the point of investment. To reduce inequality you need many, many trillions and companies just aren’t that valuable (yet). I do agree it’s helpful for a financial reduction of concentration of power if companies are eventually publicly traded in a normal way, but this only makes a big difference once companies are a large fraction of global wealth (e.g., $150 trillion). Being publicly traded might also help with concentration of power for other reasons (e.g., improved transparency due to SEC requirements, better shareholder governance), but this is a separate argument.

Quantitatively, doubling the wealth of the poorest 50% requires ~$5 trillion. So valuations must be way over $10 trillion for this to even be possible. (The wealth of the bottom 50% in just the US is roughly the same as the wealth of the global bottom 50%, so the numbers are basically the same for the wealth creation needed to double wealth.)

The numbers are more extreme for the poorest to reach the wealth of the current richest: the wealth of the median person is ~2000x lower than the top 0.01% wealth. So reaching within 10x of this would require ~200x-ing the median wealth ($15,000 trillion in total). AI will grow wealth this much (at least PPP adjusted), but many more doublings are required.

Of course, AI companies being public doesn’t result in the ownership distribution being that much more quantitatively egalitarian than it is today. And there’s no reason that the stock market massively growing would directly reduce (rather than increase) wealth inequality. (Being publicly traded is presumably somewhat better for wealth inequality than being private if these companies are were most of the growth is coming from.)

Absurd amounts of growth in AI like reaching >>$150 trilllion assumes that AI is massively undervalued (at least: in expectation, conditional on humans retaining control, and conditional on private markets being allowed to own this) due to the markets not pricing in a singularity (which I think is true).

ryan_greenblatt 1 Jun 2026 21:59 UTC
6 points
13
in reply to: Seth Herd’s comment on: Linch’s Shortform

what people care about, which is writing with standard AI tics

This is not the main concern IMO.

ryan_greenblatt 30 May 2026 21:32 UTC
3 points
0
in reply to: Realmbird’s comment on: Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations
I tried a few positions. We have more detailed results on this coming out in a bit.

Edit: more detailed results (that include trying different positions) here.

ryan_greenblatt 29 May 2026 2:26 UTC
2 points
0
in reply to: Johan David Bonilla’s comment on: Full automation of AI R&D probably yields a large speed up even without a software-only singularity
In my estimates and discussion, I’m assuming the AIs aren’t intentionally doing poorly on AI capabilities R&D (and I’m mostly assuming they are reasonably elicited, though this assumption could easily be relaxed, it mostly just shifts the timing rather than the takeoff I think).

ryan_greenblatt 28 May 2026 16:15 UTC
2 points
0
in reply to: Oliver Sourbut’s comment on: Full automation of AI R&D probably yields a large speed up even without a software-only singularity
I think your questions are mostly better answered by How quick and big would a software intelligence explosion be? and AI Futures Model.

On TEDAI—what are the automated researchers automating? How does that end up colonising all other capabilities? Where are they getting the data?

They are making methods for creating general intelligence combined with methods for generally making AIs better at specific downstream domains (e.g., mechanical engineering). This requires data collection in that downstream domains, but I’d guess this data collection is pretty easy (e.g., you can deploy the AI to learn huge amounts in parallel in addition to learnings as much as possible in sim and you can compensate for weakness with additional time/effort or with specific superhuman abilities in many cases).

ryan_greenblatt 28 May 2026 0:24 UTC
LW: 6 AF: 5
−1
AF
in reply to: Vladimir_Nesov’s comment on: Full automation of AI R&D probably yields a large speed up even without a software-only singularity
It sounds to me like you are describing a world where AIs don’t fully automate AI R&D but can maybe 98% automate AI R&D. As in, these AIs aren’t able to automate the task of generally making AIs smarter and finding new paradigms/methods, but can automate 99%+ of things within the current paradigm / approach. I tend to think the true extremes/limits of the current paradigm would be very extreme (even if weak in some ways) such that you’d be able to bootstrap if you actually reached this extreme. I also think that getting to full automation of the current paradigm would effectively require AIs to at least be OK at figuring out significant advances in general. (In the same way as I think that for AIs to be able to automate the job of research engineer at AI companies, they’d probably need to be at least OK at automating most SWE jobs in general.)

ryan_greenblatt 28 May 2026 0:21 UTC
LW: 2 AF: 2
0
AF
in reply to: Tom Davidson’s comment on: Full automation of AI R&D probably yields a large speed up even without a software-only singularity
Current AI algorithms are highly parallelizable for training (though obviously not perfectly parallelizable). This would be more important for inference speed, but at least you can do 2x faster inference with 4x more compute at current margins. So, it’s as good as somewhere between 2x and 4x serially faster compute, I’d guess a bit over 3x. Another relevant point here: we seem to have ended up in a paradigm that’s very good at absorbing parallel training compute relative to benefiting from serial speed at the same FLOP count, so more parallel compute is pretty reasonable (like we already needed to have ~extremely parallelizable training/experiment algorithms for anything to work at all).

ryan_greenblatt 28 May 2026 0:17 UTC
LW: 2 AF: 2
0
AF
in reply to: Tom Davidson’s comment on: Full automation of AI R&D probably yields a large speed up even without a software-only singularity

Assume that all that doubling compute does is double labor

This isn’t what I want because: (1) doubling parallel labor is extremely conservative and (2) doubling overall labor effort is both probably conservative and also totally unprincipled, like why not 10x? (Tt will depend on returns from training compute and experiment of more compue right?)

It doesn’t compound further bc there’s no feedback loop to more experiments like there is to more labor

I don’t understand, the whole point of the experiments is to get us better labor.

ryan_greenblatt 27 May 2026 20:51 UTC
LW: 2 AF: 2
0
AF
in reply to: Tom Davidson’s comment on: Full automation of AI R&D probably yields a large speed up even without a software-only singularity
Sure, it would help some (though I’d want to think about whether the earlier trajectory is very plausible). To be clear, this isn’t at all a blocker for me or something, I mostly wanted to make the higher level point in this post and ran into this issue. I could have just edited and run the code locally (which is pretty easy these days...).

ryan_greenblatt 19 May 2026 17:36 UTC
2 points
0
in reply to: acmc’s comment on: AI Control: Improving Safety Despite Intentional Subversion
In retrospect, we should have used log probs, but we didn’t. (Even without log-prob access, you can use n=64 with t=1.0 to get a mostly good enough estimate.) What we really want is an even distribution of monitor scores rather than a distribution that clusters on particularly likely numbers. By default, the AI tends to put significantly more probability on 1/3/7 than we want and we wanted to spread this out.

ryan_greenblatt 19 May 2026 1:22 UTC
13 points
5
on: Negation Neglect: When models fail to learn negations in training
I think results like this are generally an update that using SDF (Synthetic Document Fine-tuning) to insert (false) beliefs in models works for somewhat surprising reasons and thus is less robust and less likely to work in the future. So these results seem pretty important.

Some uncertainties I have:
- I think there are pretrains that are substantially more capable than the models tested in this paper (e.g. Mythos). Would results on these models differ? I suspect no, but I’m unsure.
- I think it’s pretty likely that injecting additional data in the middle of pretraining works pretty differently than fine-tuning (due to LR difference, the model being in a different state, not training on tons and tons of documents on one topic back-to-back). Does this make a big difference? One piece of evidence for this is that AIs seem to know things that are very infrequently discussed on the internet, but it takes a huge number of documents to use SDF to instill beliefs.
This is based on this x/twitter thread and some of the authors respond there.