RSS

Marius Hobbhahn

Karma: 3,189

I’m the co-founder and CEO of Apollo Research: https://​​www.apolloresearch.ai/​​
I mostly work on evals, but I am also interested in interpretability. My goal is to improve our understanding of scheming and build tools and methods to detect it.

I previously did a Ph.D. in ML at the International Max-Planck research school in Tübingen, worked part-time with Epoch and did independent AI safety research.

For more see https://​​www.mariushobbhahn.com/​​aboutme/​​

I subscribe to Crocker’s Rules

An­a­lyz­ing Deep­Mind’s Prob­a­bil­is­tic Meth­ods for Eval­u­at­ing Agent Capabilities

22 Jul 2024 16:17 UTC
64 points
0 comments16 min readLW link

[In­terim re­search re­port] Eval­u­at­ing the Goal-Direct­ed­ness of Lan­guage Models

18 Jul 2024 18:19 UTC
37 points
4 comments11 min readLW link

Me, My­self, and AI: the Si­tu­a­tional Aware­ness Dataset (SAD) for LLMs

8 Jul 2024 22:24 UTC
101 points
28 comments5 min readLW link

Apollo Re­search 1-year update

29 May 2024 17:44 UTC
92 points
0 comments7 min readLW link

The Lo­cal In­ter­ac­tion Ba­sis: Iden­ti­fy­ing Com­pu­ta­tion­ally-Rele­vant and Sparsely In­ter­act­ing Fea­tures in Neu­ral Networks

20 May 2024 17:53 UTC
105 points
4 comments3 min readLW link

We need a Science of Evals

22 Jan 2024 20:30 UTC
66 points
13 comments9 min readLW link

A starter guide for evals

8 Jan 2024 18:24 UTC
45 points
2 comments12 min readLW link
(www.apolloresearch.ai)

Ex­pe­riences and learn­ings from both sides of the AI safety job market

Marius Hobbhahn15 Nov 2023 15:40 UTC
109 points
4 comments18 min readLW link