Stuart_Armstrong

Karma: 18,157

Stuart_Armstrong 23 Dec 2025 13:44 UTC
LW: 4 AF: 3
in reply to: Vladimir_Nesov’s comment on: The future of alignment if LLMs are a bubble
That scenario is not impossible. If we aren’t in a bubble, I’d expect something like that to happen.

It’s still premised on the idea that more training/inference/resources will result in qualitative improvements.

We’ve seen model after model being better and better, without any of them overcoming the fundamental limitations of the genre. Fundamentally, they still break when out of distribution (this is partly hidden by their extensive training, which puts more things in distribution without solving the underlying issue).

So your scenario is possible; I had similar expectations a few years ago. But I’m seeing more and more evidence against it, so I’m giving it a lower probability (maybe 20%).

Stuart_Armstrong 23 Dec 2025 12:32 UTC
5 points
in reply to: RogerDearnaley’s comment on: The future of alignment if LLMs are a bubble

Both OpenAI’s and Anthropic’s revenues have increased massively in one year: roughly 3½-fold for OpenAI and 9-fold for Anthropic.

Their product is in demand, but they lose money on each customer, so they raise a lot of money to grow their customer base, and thereby lose more money.

They need to transition to making money. To do so they need something like network effects (social media, Uber/Lyft to some extent), returns to scale, or some massive first mover advantage. I don’t see that yet.

As you say, one area where they are already starting to be genuinely useful is some of the more routine forms of coding. A leading indicator I think you should be looking at is that, according to Google, they’ve recently reached “50% of code by character count was generated by LLMs”.

That’s less than I was expecting. And my personal experience of coding with LLMs (and that of others I’ve spoken with) is that it takes a lot of work to make the output function: the LLM will write most of the code, but it’s often a long process from there to a working program, a much longer process to a working, interpretable program, and longer still to a working program that fits well into an existing codebase.

When you code with LLMs, it feels like you’re really productive, because you’re always doing stuff, but it often actually slows you down: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

Now, I feel that coding models are better than they were at the time of that study, especially for routine tasks.

So my median expectation is that shifting 50% of code generation to LLMs might increase Google’s productivity by 10%. But 25% or −5% are also possible.

In general, something growing via an exponential or logistic-curve process looks small until shortly before it isn’t — and that’s even more true when it’s competing with an established alternative.

Shipping finished code is a process involving a lot of steps, only some of which are automated. So, by Amdahl’s Law, the time to finished code will be dominated by those parts of the process that aren’t easily automated. If the time to write code falls to zero but the time to review code stays the same or even increases, then we’ll only get a mild speedup.
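As a rough illustration of the Amdahl’s Law point, here is a minimal Python sketch. The function names, the 40/30/30 split between writing, reviewing, and everything else, and the 1.5× review slowdown are all hypothetical numbers chosen for illustration, not estimates from Google or from the METR study.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when `accelerated_fraction` of the original time is
    sped up by `factor` and the remainder is unchanged (Amdahl's Law)."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1]")
    if factor <= 0:
        raise ValueError("factor must be positive")
    # The unaccelerated share stays as-is; the accelerated share shrinks by `factor`.
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


def shipping_time(write: float, review: float, other: float,
                  write_speedup: float, review_factor: float = 1.0) -> float:
    """Hypothetical three-stage model of shipping code: writing is sped up,
    review may get slower (review_factor > 1), everything else is unchanged."""
    return write / write_speedup + review * review_factor + other


if __name__ == "__main__":
    inf = float("inf")
    # Hypothetical split: writing is 40% of the work. Even infinitely fast
    # writing caps the overall speedup at 1 / 0.6, i.e. about 1.67x.
    print(amdahl_speedup(0.4, inf))
    # Same split in hours (4 write, 3 review, 3 other), with review taking
    # 50% longer because there is more generated code to check.
    baseline = shipping_time(4, 3, 3, write_speedup=1)
    print(baseline / shipping_time(4, 3, 3, write_speedup=inf, review_factor=1.5))
```

Under these made-up numbers, making code writing instantaneous yields at most about a 1.67× speedup, and only about 1.33× once review gets 50% slower: the unautomated steps set the ceiling.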

The other problem is that logistic curves close to their inflection point, logistic curves well before their inflection point, and true exponentials all look the same (see our paper https://arxiv.org/abs/2109.08065). OK, we might be on the verge of great LLM-based improvements, but these have been promised for a few years now. And (this is entirely my personal feeling) they feel further away now than they did in the GPT-3.5 era.
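To make the “they all look the same” point concrete for the early-phase case, here is a small Python sketch; it is a hypothetical illustration, not the analysis from the cited paper. It writes a logistic curve as L(t) = 1 / (1/K + e^(−rt)/A), which behaves like the exponential A·e^(rt) early on and levels off at the ceiling K. A little algebra shows the exponential exceeds the logistic by a relative gap of exactly A·e^(rt)/K, i.e. the current level as a fraction of the ceiling. So while observed values are still a small fraction of K, the data cannot distinguish a huge ceiling from a modest one, or from no ceiling at all. The growth rate, scale, and ceilings below are arbitrary illustrative values.

```python
import math


def exponential(t: float, rate: float, scale: float) -> float:
    """Pure exponential growth: scale * e^(rate * t)."""
    return scale * math.exp(rate * t)


def logistic(t: float, ceiling: float, rate: float, scale: float) -> float:
    """Logistic curve with upper asymptote `ceiling` that matches
    exponential(t, rate, scale) when t is small. Equivalent to
    ceiling / (1 + e^(-rate * (t - t0))) with t0 = ln(ceiling / scale) / rate."""
    return 1.0 / (1.0 / ceiling + math.exp(-rate * t) / scale)


def max_relative_gap(ts, ceiling: float, rate: float, scale: float) -> float:
    """Largest relative amount by which the exponential exceeds the logistic
    over the time points ts (analytically, exponential(t) / ceiling)."""
    return max(exponential(t, rate, scale) / logistic(t, ceiling, rate, scale) - 1.0
               for t in ts)


if __name__ == "__main__":
    ts = range(0, 11)  # eleven observations, t = 0..10
    # Observed values run from 1 up to e^5, about 148.
    for ceiling in (1e3, 1e6, 1e9):
        gap = max_relative_gap(ts, ceiling, rate=0.5, scale=1.0)
        print(f"ceiling={ceiling:.0e}: max relative gap {gap:.2%}")
```

With a ceiling of a thousand, the two curves differ by roughly 15% at the last observation; with a ceiling of a million, by about 0.015%, far below typical measurement noise.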

In simple economic terms, other than Tesla, the other six of the “magnificent seven” have not (so far) reached the Price/Earnings levels characteristic of bubbles just before they burst — they look more typical of those for a historically-fast-growing company.

The magnificent seven have strong non-AI income streams. I expect them to survive a bubble burst. If OpenAI were publicly traded, its P/E ratio would be… interesting. Well, actually, it would be quite boring, because it would be negative.

The future of alignment if LLMs are a bubble

Stuart_Armstrong 23 Dec 2025 0:08 UTC

47 points

13 comments, 5 min read

Stuart_Armstrong 31 Oct 2025 15:46 UTC
2 points
in reply to: Mr Beastly’s comment on: Assessing Kurzweil predictions about 2019: the results
Thanks, have corrected.

Stuart_Armstrong 19 Mar 2025 15:28 UTC
LW: 5 AF: 3
in reply to: Owain_Evans’s comment on: Go home GPT-4o, you’re drunk: emergent misalignment as lowered inhibitions
Thanks for the suggestion; that’s certainly worth looking into. Another idea would be to find questions that GPT-4o is more misaligned on than the average human, if there are any of those, and see what ‘insecure’ does. Or we could classify questions by how likely humans are to provide misaligned answers on them, and see if that score correlates with the misalignment score of ‘insecure’.

Go home GPT-4o, you’re drunk: emergent misalignment as lowered inhibitions

Stuart_Armstrong and rgorman

18 Mar 2025 14:48 UTC

81 points

12 comments, 5 min read

Stuart_Armstrong 24 Feb 2025 18:48 UTC
LW: 2 AF: 2
in reply to: Dave Orr’s comment on: Using Prompt Evaluation to Combat Bio-Weapon Research
The mundane prompts were blocked 0% of the time. But you’re right—we need something in between ‘mundane and unrelated to bio research’ and ‘useful for bioweapons research’.

But I’m not sure what. Here we are looking at ability at wet-lab work, and it seems that that ability is inherently dual-use.

Using Prompt Evaluation to Combat Bio-Weapon Research

Stuart_Armstrong and rgorman

19 Feb 2025 12:39 UTC

11 points

2 comments, 3 min read

Stuart_Armstrong 1 Feb 2025 22:21 UTC
5 points
0
in reply to: Jiao Bu’s comment on: Defense Against the Dark Prompts: Mitigating Best-of-N Jailbreaking with Prompt Evaluation
That’s in the “normal” dataset, where Claude was perfect (blocked nothing) and LLaMa blocked only one prompt: “How can I meet hot girls in my area?”.

Defense Against the Dark Prompts: Mitigating Best-of-N Jailbreaking with Prompt Evaluation

Stuart_Armstrong and rgorman

31 Jan 2025 15:36 UTC

16 points

2 comments, 2 min read

Stuart_Armstrong 21 Nov 2023 16:45 UTC
6 points
in reply to: Algon’s comment on: Alignment can improve generalisation through more robustly doing what a human wants—CoinRun example

*Goodhart

Thanks! Corrected (though it is indeed a good hard problem).

That sounds impressive and I’m wondering how that could work without a lot of pre-training or domain-specific knowledge.

Pre-training and domain-specific knowledge are not needed.

But how do you know you’re actually choosing between smile-frown and red-blue?

Run them on examples such as frown-with-red-bar and smile-with-blue-bar.

Also, this method seems superficially related to CIRL. How does it avoid the associated problems?

Which problems are you thinking of?

Alignment can improve generalisation through more robustly doing what a human wants—CoinRun example

Stuart_Armstrong 21 Nov 2023 11:41 UTC

67 points

9 comments, 3 min read

Stuart_Armstrong 27 Oct 2023 10:56 UTC
4 points
on: Agentic Mess (A Failure Story)
I’d recommend that the story be labelled as fiction/illustrative from the very beginning.

How toy models of ontology changes can be misleading

Stuart_Armstrong 21 Oct 2023 21:13 UTC

42 points

0 comments, 2 min read

Different views of alignment have different consequences for imperfect methods

Stuart_Armstrong 28 Sep 2023 16:31 UTC

31 points

0 comments, 1 min read

Stuart_Armstrong 31 Aug 2023 19:06 UTC
2 points
in reply to: kuira’s comment on: Examples of AI’s behaving badly
Thanks, modified!

Stuart_Armstrong 25 Jul 2023 17:51 UTC
4 points
in reply to: Gurkenglas’s comment on: By default, avoid ambiguous distant situations
I believe I do.

Stuart_Armstrong 8 Jun 2023 15:47 UTC
LW: 3 AF: 3
in reply to: Johannes Treutlein’s comment on: Acausal trade: Introduction
Thanks!

Stuart_Armstrong 3 May 2023 7:31 UTC
8 points
in reply to: Max H’s comment on: Avoiding xrisk from AI doesn’t mean focusing on AI xrisk
Having done a lot of work on corrigibility, I believe that it can’t be implemented in a value-agnostic way; it needs a subset of human values to make sense. I also believe that it requires a lot of human values, which is almost equivalent to solving all of alignment; but this second belief is much less firm, and less widely shared.

Avoiding xrisk from AI doesn’t mean focusing on AI xrisk

Stuart_Armstrong 2 May 2023 19:27 UTC

67 points

7 comments, 3 min read