Geoffrey Irving

Karma: 873

Chief Scientist at the UK AI Safety Institute (AISI). Previously at DeepMind, OpenAI, Google Brain, etc.

Research Areas in Cognitive Science (The Alignment Project by UK AISI)

Geoffrey Irving, 1 Aug 2025 10:26 UTC
12 points
0 comments, 6 min read
(alignmentproject.aisi.gov.uk)

The Alignment Project by UK AISI

1 Aug 2025 9:52 UTC
28 points
0 comments, 2 min read
(alignmentproject.aisi.gov.uk)

The need to relativise in debate

26 Jun 2025 16:23 UTC
25 points
2 comments, 5 min read

Prover-Estimator Debate: A New Scalable Oversight Protocol

17 Jun 2025 13:53 UTC
88 points
18 comments, 5 min read

Unexploitable search: blocking malicious use of free parameters

21 May 2025 17:23 UTC
34 points
16 comments, 6 min read

Dodging systematic human errors in scalable oversight

Geoffrey Irving, 14 May 2025 15:19 UTC
33 points
3 comments, 4 min read

An alignment safety case sketch based on debate

8 May 2025 15:02 UTC
57 points
21 comments, 25 min read
(arxiv.org)