
Geoffrey Irving

Karma: 498

Chief Scientist at the UK AI Safety Institute (AISI). Previously at DeepMind, OpenAI, Google Brain, etc.

How to evaluate control measures for LLM agents? A trajectory from today to superintelligence

14 Apr 2025 16:45 UTC
29 points
1 comment · 2 min read · LW link

Prospects for Alignment Automation: Interpretability Case Study

21 Mar 2025 14:05 UTC
30 points
4 comments · 8 min read · LW link

A sketch of an AI control safety case

30 Jan 2025 17:28 UTC
57 points
0 comments · 5 min read · LW link

Eliciting bad contexts

24 Jan 2025 10:39 UTC
31 points
8 comments · 3 min read · LW link

Automation collapse

21 Oct 2024 14:50 UTC
72 points
9 comments · 7 min read · LW link

Debate, Oracles, and Obfuscated Arguments

20 Jun 2024 23:14 UTC
44 points
4 comments · 21 min read · LW link