fidgetsinner

Karma: 312

San Francisco Petrov Day

fidgetsinner

22 Sep 2025 18:17 UTC

5 points

0 comments

1 min read

Forecasting Frontier Language Model Agent Capabilities

fidgetsinner, Axel Højmark, Jérémy Scheurer and Marius Hobbhahn

24 Feb 2025 16:51 UTC

35 points

0 comments

5 min read

(www.apolloresearch.ai)

Do models know when they are being evaluated?

fidgetsinner, Giles, Joe Needham and Marius Hobbhahn

17 Feb 2025 23:13 UTC

57 points

9 comments

12 min read

Current safety training techniques do not fully transfer to the agent setting

Simon Lermen and fidgetsinner

3 Nov 2024 19:24 UTC

162 points

9 comments

5 min read

~80 Interesting Questions about Foundation Model Agent Safety

RohanS and fidgetsinner

28 Oct 2024 16:37 UTC

48 points

4 comments

15 min read

Analyzing DeepMind’s Probabilistic Methods for Evaluating Agent Capabilities

Axel Højmark, fidgetsinner, Arjun Panickssery, Marius Hobbhahn and Jérémy Scheurer

22 Jul 2024 16:17 UTC

69 points

0 comments

16 min read