I've only read the LW version, not the paper, but this seems like important work to me and I'm glad you're doing it! What did you make of these two recent papers?
I have done some work on the policy side of this (whether we should, and how we could, enforce CoT monitorability on AI developers, or at least gain transparency into how monitorable SOTA models are). Lmk if it would ever be useful to talk about that; otherwise, I'll be keen to see where this line of work ends up!
I liked https://firstscattering.com/p/red-lines-for-recursive-self-improvement as a quick initial discussion of possible places to draw a line.