Geoffrey Irving

Karma: 1,387

Chief Scientist at the UK AI Safety Institute (AISI). Previously at DeepMind, OpenAI, Google Brain, etc.

“Did you lie?” Evaluating Lie Detectors across Model Scale and Belief-Verified Model Organisms

17 Jun 2026 18:43 UTC
32 points
0 comments · 6 min read · LW link
(arxiv.org)

Sequent: scale and automation for higher confidence in alignment

10 Jun 2026 15:37 UTC
278 points
2 comments · 11 min read · LW link
(sequent.org)

Automated Alignment is Harder Than You Think

14 May 2026 22:01 UTC
143 points
7 comments · 3 min read · LW link
(arxiv.org)

Bringing More Expertise to Bear on Alignment

8 May 2026 10:29 UTC
87 points
1 comment · 8 min read · LW link

Research Areas in Cognitive Science (The Alignment Project by UK AISI)

Geoffrey Irving · 1 Aug 2025 10:26 UTC
12 points
0 comments · 6 min read · LW link
(alignmentproject.aisi.gov.uk)

The Alignment Project by UK AISI

1 Aug 2025 9:52 UTC
29 points
0 comments · 2 min read · LW link
(alignmentproject.aisi.gov.uk)

The need to relativise in debate

26 Jun 2025 16:23 UTC
31 points
2 comments · 5 min read · LW link

Prover-Estimator Debate: A New Scalable Oversight Protocol

17 Jun 2025 13:53 UTC
89 points
19 comments · 5 min read · LW link

Unexploitable search: blocking malicious use of free parameters

21 May 2025 17:23 UTC
40 points
16 comments · 6 min read · LW link