Axel Højmark
Karma: 161
AI Safety Researcher @ Apollo Research
Stress Testing Deliberative Alignment for Anti-Scheming Training
Mikita Balesni, Bronson Schoen, Marius Hobbhahn, Axel Højmark, AlexMeinke, Teun van der Weij, Jérémy Scheurer, Felix Hofstätter, Nicholas Goldowsky-Dill, rusheb, Andrei Matveiakin, jenny and alex.lloyd
17 Sep 2025 16:59 UTC
124 points, 13 comments, 1 min read
LW link (antischeming.ai)
Forecasting Frontier Language Model Agent Capabilities
Govind Pimpale, Axel Højmark, Jérémy Scheurer and Marius Hobbhahn
24 Feb 2025 16:51 UTC
35 points, 0 comments, 5 min read
LW link (www.apolloresearch.ai)
Analyzing DeepMind’s Probabilistic Methods for Evaluating Agent Capabilities
Axel Højmark, Govind Pimpale, Arjun Panickssery, Marius Hobbhahn and Jérémy Scheurer
22 Jul 2024 16:17 UTC
69 points, 0 comments, 16 min read
LW link