Axel Ahlqvist’s Shortform
Why use citations when hyperlinks exist?
To me, hyperlinks seem superior in every way.
1. Disturbs the reading less
2. More convenient to access the source

For citations:
1. I click
2. Get sent to the bibliography
3. I use the link
4. Then I have to go back and find where I was in the paper.
I would guess the main reason is that links are not stable?
You can fix problems #1 and #3 above
Is the #3 referring to the “1.” in the second numbered list?
Sorry, I missed this. In case you are interested in AI Safety, there is an active AI Safety group in Stockholm that holds biweekly meetups: https://www.facebook.com/groups/4935355363232955.
Mechanistic Interpretability Via Learning Differential Equations: AI Safety Camp Project Intermediate Report.
[Question] What is the point of 2v2 debates?
Definitely guilty of preaching to the choir :).
So people feel that LW should be focused on things other than critiquing influential but unqualified opinions. I am sympathetic to this. It is somewhat of a Sisyphean task to weed out bad opinions from public discourse, and responding on LW is probably not the most efficient way of doing it in any case.
Personally, when I am convinced of something, I try to find the strongest critiques of that belief. For instance, I’ve looked for criticisms of Yudkowsky and even read a little on r/SneerClub to evaluate whether I’ve been duped by internet lunatics :). If other people acted the same, it would be valuable to have critiques of bad opinions, even if they are posted where the intended audience otherwise never visits. But I suspect few people act like that.
I would be interested in whether you have suggestions for better ways to affect public opinion than posts like this one. I guess the rationality project of raising the global sanity level is partly aimed at this.
I’ve received a significant share of downvotes on the post. Since this is my first post on LW, I would greatly appreciate feedback on why readers did not find the post of sufficient quality for the site.
I believe even broad pointers could be very helpful. Was it mostly about sloppy argumentation, the tone, the language, etc?
I agree about the distinction and agree that both invariance and unawareness are worth pursuing to get better use out of evaluation and auditing.
However, I am slightly more positive towards unawareness and slightly less positive towards invariance.
When I imagine an AI powerful enough to cause a catastrophe, it is one with a goal about the outcome of the world that is misaligned with human goals. I have a hard time imagining such a goal that would be invariant to whether it was being evaluated or not, since evaluation could lead to the model being retrained. It seems to me that in order to solve eval invariance for sufficiently intelligent and agentic systems, we would have to solve most of what makes alignment difficult in the first place.
Regarding unawareness, I agree that it won’t scale to ASI and that eval awareness for current evals requires only human-level intelligence. However, I believe there is a lot of headroom to make our evals more realistic. For instance, if we have prompts from deployments that led to concerning behavior for previous models, we could subject new models to a deployment distribution of prompts and manually inspect the ones that previously led to misaligned behavior. The only fundamental difference between deployment and evals is that deployment will happen in the future, so a sufficiently intelligent model could figure this out by, for example, waiting to see a factorisation of RSA-2048.
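To make the replay idea concrete, here is a minimal Python sketch. It is purely illustrative: the function name, the record fields, and the model-call interface are all hypothetical assumptions, not any particular lab’s tooling.

```python
from typing import Callable, Iterable

def replay_flagged_prompts(
    deployment_prompts: Iterable[dict],   # e.g. {"prompt": str, "flagged": bool}
    new_model: Callable[[str], str],      # hypothetical model-call interface
) -> list[dict]:
    """Run a new model on deployment prompts that previously elicited
    concerning behavior from an older model, so the eval distribution
    matches real deployment traffic."""
    results = []
    for record in deployment_prompts:
        if not record.get("flagged"):
            continue  # only replay prompts that previously led to misaligned behavior
        response = new_model(record["prompt"])
        results.append({"prompt": record["prompt"], "response": response})
    # These transcripts would then be inspected manually (or by an automated judge).
    return results
```

The manual-inspection step is the expensive part; the point is only that the prompts the new model sees are sampled from real deployment traffic rather than written specifically for the eval.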
Since neither approach would work on sufficiently intelligent and agentic systems, the question is which scales further.