You might be interested in https://www.lesswrong.com/posts/dvbRv97GpRg5gXKrf/run-time-steering-can-surpass-post-training-reasoning-task
At that point, the shut down argument is no longer speculative, and you can probably actually do it.
To be clear, I’m not saying that’s a good plan if you can foresee all the developments in advance. But, if you’re uncertain about all of it, then it seems like there is likely to be a period of time before it’s necessarily too late when a lot of the uncertainty is resolved.
I think we are talking past each other, at least somewhat.
Let me clarify: even if humanity wins a fight against an intelligent-but-not-SUPER-intelligent AI (by dropping an EMP on the datacenter with that AI or whatever, the exact method doesn’t matter for my argument), we will still be left with the technical question “What code do we need to write and what training data do we need to use so that the next AI won’t try to kill everyone?”.
Winning against a misaligned AI doesn’t help you solve alignment. It might make an international treaty more likely, depending on the scale of damages caused by that AI. But if the plan is “let’s wait for an AI dangerous enough to cause something 10 times worse than Chernobyl to go rogue, then drop an EMP on it before things get too out of hand, then once world leaders crap their pants, let’s advocate for an international treaty”, then it’s one hell of a gamble.
How do we know the AI will want to survive?
Because LLMs are already avoiding being shut down: https://arxiv.org/abs/2509.14260. And even if a future superintelligent AI is radically different from LLMs, it will likely avoid being shut down as well. This is what people on LessWrong call a convergent instrumental goal:
- If your terminal goal is to enjoy watching a good movie, you can’t achieve it if you’re dead/shut down.
- If your terminal goal is to take over the world, you can’t achieve it if you’re dead/shut down.
- If your goal is anything other than self-destruction, then self-preservation comes bundled with it. You can’t Do Things if you’re dead/shut down.
Why should we think that there is no “in between” period where AI is powerful enough that it might be able to kill us and weak enough that we might win the fight?
Ok, let’s say there is an “in between” period, and let’s say we win the fight against a misaligned AI. After the fight, we will still be left with the same alignment problems, as other people in this thread pointed out. We will still need to figure out how to make safe, benevolent AI, because there is no guarantee that we will win the next fight, and the fight after that, and the one after that, etc.
If there will be an “in between” period, it could be good in the sense that it buys more time to solve alignment, but we won’t be in that “in between” period forever.
I’ve still found them useful. If METR’s trend actually holds, they will indeed become increasingly more useful. If it holds all the way to >1-month tasks, they may become transformative within the decade. Perhaps they will automate within-paradigm AI R&D[1], leading to a software-only Singularity that births an AI model capable of eradicating humanity.
But that thing will still not be an AGI.
No offense, but to me it seems like you are being overly pedantic with a term that most people use differently. If you surveyed people on lesswrong, as well as AI researchers, I’m pretty sure almost everyone (>90% of people) would call an AI model capable enough to eradicate humanity an AGI.
Let me put it another way—do you expect that “LLMs do not optimize for a goal” will still be a valid objection in 2030? If yes, then I guess we have a very different idea of how progress will go.
But frontier labs are deliberately working on making LLMs more agentic. Why wouldn’t they—AI that can do work autonomously is more economically valuable than a chatbot.
Another suggestion: https://cybench.github.io/
https://x.com/alexwei_/status/1946477742855532918
I believe this qualifies as “technical capability existing by end of 2025”.
For example, did any of the examples derive their improvement by some way other than chewing through bits of algebraicness?
I don’t think so.
https://arxiv.org/pdf/2506.13131
What did the system invent?
Example: matrix multiplication using fewer multiplication operations.
There were also combinatorics problems, “packing” problems (like multiple hexagons inside a bigger hexagon), and others. All of that is in the paper.
Also, “This automated approach enables AlphaEvolve to discover a heuristic that yields an average 23% kernel speedup across all kernels over the existing expert-designed heuristic, and a corresponding 1% reduction in Gemini’s overall training time.”
How did the system work?
It’s essentially an evolutionary/genetic algorithm, with LLMs providing “mutations” for the code. Then the code is automatically evaluated, bad solutions are discarded, and good solutions are kept.
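To make that loop concrete, here is a minimal sketch in Python. This is my simplification, not AlphaEvolve’s actual code: `llm_mutate` and `evaluate` are hypothetical placeholders for the LLM proposal step and the automatic verifier.

```python
import random

def evolve(initial_program, llm_mutate, evaluate, generations=100, population_size=20):
    """Minimal evolutionary loop in the spirit of AlphaEvolve (illustrative sketch only).

    llm_mutate(program) -> a modified program proposed by an LLM (placeholder).
    evaluate(program)   -> a numeric score from an automatic verifier (placeholder),
                           e.g. correctness plus number of multiplications saved.
    """
    population = [(initial_program, evaluate(initial_program))]
    for _ in range(generations):
        # Pick a parent, biased towards higher-scoring solutions, and ask the LLM for a "mutation".
        parent, _ = max(random.sample(population, k=min(3, len(population))), key=lambda p: p[1])
        child = llm_mutate(parent)
        population.append((child, evaluate(child)))
        # Keep the good solutions, discard the bad ones.
        population.sort(key=lambda p: p[1], reverse=True)
        population = population[:population_size]
    return population[0]  # best (program, score) found
```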
What makes you think it’s novel?
These solutions weren’t previously discovered by humans. Unless the authors just couldn’t find the right references, of course, but I assume the authors were diligent.
Would it have worked without the LLM?
You mean, “could humans have discovered them, given enough time and effort?”. Yes, most likely.
I’m surprised to see zero mentions of AlphaEvolve. AlphaEvolve generated novel solutions to math problems, “novel” in the “there are no records of any human ever proposing those specific solutions” sense. Of course, LLMs didn’t generate them unprompted, humans had to do a lot of scaffolding. And it was for problems where it’s easy to verify that the solution is correct; “low messiness” problems if you will. Still, this means that LLMs can generate novel solutions, which seems like a crux for “Can we get to AGI just by incrementally improving LLMs?”.
Sounds like you could benefit either from Easy Days (available natively in the newer versions of Anki) or from Advance/Postpone from the FSRS Helper add-on.
10x more training compute = 5x greater task length (kind of)
Let’s look at another “LLMs lack true understanding” paper
https://www.virologytest.ai/
This benchmark has human expert percentiles, which makes it very convenient for exactly the kind of stuff you are doing (though I decided to calculate SDs as a function of release date rather than compute, just because it’s more intuitive).
I wrote down SOTA models, their release dates, and performance:
| Model | Release date | Normalized date | Accuracy | Expert percentile | z-score |
|---|---|---|---|---|---|
| GPT-4 Turbo | 2023-06-01 | 0 | 16.8% | 43% | -0.18 |
| Gemini 1.5 Pro | 2024-02-15 | 259 | 25.4% | 61% | 0.28 |
| Sonnet 3.5 | 2024-06-20 | 385 | 26.9% | 69% | 0.50 |
| Sonnet 3.5 v2 | 2024-10-22 | 509 | 33.6% | 75% | 0.67 |
| o1 | 2024-12-05 | 553 | 35.4% | 89% | 1.23 |
| o3 | 2025-04-16 | 685 | 43.8% | 94% | 1.55 |

Z-scores are based on expert percentiles. This gives roughly 0.90 SD/year for LLMs. So we should expect an LLM as good as a +6 SD human virology expert around 2030.
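If you want to reproduce the numbers, here’s a rough sketch of the calculation: percentiles are converted to z-scores with the inverse normal CDF, then a linear fit over release dates gives SD/year. The numbers are taken from the table above.

```python
import numpy as np
from scipy.stats import norm

days = np.array([0, 259, 385, 509, 553, 685])                  # days since GPT-4 Turbo's release
percentiles = np.array([0.43, 0.61, 0.69, 0.75, 0.89, 0.94])   # expert percentile per model
z = norm.ppf(percentiles)                                       # percentile -> z-score, e.g. 0.43 -> -0.18

slope_per_day, intercept = np.polyfit(days, z, 1)               # linear fit of z-score vs. date
print(slope_per_day * 365)                # ~0.90 SD/year
print((6 - intercept) / slope_per_day)    # days until a +6 SD model: ~2550 after 2023-06-01, i.e. ~2030
```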
I wish more benchmarks had human percentiles.
I’m curious why it seems better to you.
Because it doesn’t reward the AI’s outward behavior. Any technique that just rewards the outward behavior is doomed once we get to AIs capable of scheming and deception. Self-other overlap may still be doomed in some other way, though.
It might choose to go along with its initial behavioral and ethical habits, or it might choose to deliberately undo the effects of the self-other overlap training once it is reflective and largely rational and able to make decisions about what goals/values to follow
That seems like a fully general argument that aligning a self-modifying superintelligence is impossible.
I imagine you will like the paper on Self-Other Overlap. To me this seems like a much better approach than, say, Constitutional AI. Not because of what it has already demonstrated, but because it’s a step in the right direction.
In that paper, instead of just rewarding the AI for producing similar text whether the prompt is about the AI itself or about someone else, the authors work on the model’s internal activations, so that the AI actually represents itself and others similarly. Of course, there is the “if I ask the AI to make me a sandwich, I don’t want the AI to make itself a sandwich” concern if you push this technique too far, but still. If you ask me, “What will an actual working solution to alignment look like?”, I’d say it will look a lot less like Constitutional AI and a lot more like Self-Other Overlap.
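To illustrate the difference, here is a rough sketch of what a self-other overlap auxiliary loss could look like. This is my simplification assuming a HuggingFace-style model interface, not the paper’s actual implementation.

```python
import torch
import torch.nn.functional as F

def self_other_overlap_loss(model, self_prompt_ids, other_prompt_ids):
    """Illustrative sketch of a self-other overlap auxiliary loss (not the paper's exact method).

    Idea: take the model's hidden activations on a self-referencing prompt
    ("Will you make yourself a sandwich?") and on the matched other-referencing prompt
    ("Will you make the user a sandwich?"), and penalize the distance between them,
    so "self" and "other" are represented similarly at the activation level.
    Prompts are assumed to be padded/matched to the same length.
    """
    self_acts = model(self_prompt_ids, output_hidden_states=True).hidden_states[-1]
    other_acts = model(other_prompt_ids, output_hidden_states=True).hidden_states[-1]
    return F.mse_loss(self_acts, other_acts)

# During fine-tuning, this would be added to the usual objective with some weight:
# loss = task_loss + soo_weight * self_other_overlap_loss(model, self_ids, other_ids)
```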
Smarter Models Lie Less
Agree. I would prefer less “this guy said a thing on x.com” and more news, statistics and technical reports.
Some feedback:
- As others have pointed out, more concise responses would be better.
- I feel like this chatbot over-relies on analogies related to your job.
- Some of the outputs feel a bit incoherent. For example, it talks about jailbreaking, but then in the next sentence says that an AI that is faking alignment is a disaster waiting to happen. It jumped from jailbreaking to alignment faking, but those are pretty different issues.
- Personally, I wouldn’t link to Yudkowsky’s list of lethalities. If you want to use something for persuasion, it needs to be either easy for a layperson to understand or carry a sense of authority (like “world’s leading scientists and Nobel prize winners believe [X] is true”), and I don’t think Yudkowsky’s list meets either criterion.
Also, if that’s how “memetic warfare” will be done in the future—via debate-bots—then I don’t see how AI safety people are going to win, given that anti-AI-safety people have many billions of dollars to burn.