Benjamin Hilton

Karma: 420

Head of Alignment at UK AI Security Institute (AISI). Previously 80,000 Hours, HM Treasury, Cabinet Office, Department for International Trade, Imperial College London.

Assuring Agent Safety Evaluations By Analysing Transcripts

10 Oct 2025 0:42 UTC
7 points
0 comments
15 min read
LW link

Research Areas in Methods for Post-training and Elicitation (The Alignment Project by UK AISI)

1 Aug 2025 10:27 UTC
12 points
0 comments
6 min read
LW link
(alignmentproject.aisi.gov.uk)

Research Areas in Benchmark Design and Evaluation (The Alignment Project by UK AISI)

1 Aug 2025 10:26 UTC
10 points
0 comments
9 min read
LW link
(alignmentproject.aisi.gov.uk)

Research Areas in Probabilistic Methods (The Alignment Project by UK AISI)

1 Aug 2025 10:26 UTC
3 points
0 comments
4 min read
LW link
(alignmentproject.aisi.gov.uk)

Research Areas in Evaluation and Guarantees in Reinforcement Learning (The Alignment Project by UK AISI)

1 Aug 2025 9:53 UTC
14 points
0 comments
11 min read
LW link
(alignmentproject.aisi.gov.uk)

The Alignment Project by UK AISI

1 Aug 2025 9:52 UTC
29 points
0 comments
2 min read
LW link
(alignmentproject.aisi.gov.uk)

An alignment safety case sketch based on debate

8 May 2025 15:02 UTC
57 points
21 comments
25 min read
LW link
(arxiv.org)

UK AISI’s Alignment Team: Research Agenda

7 May 2025 16:33 UTC
113 points
2 comments
11 min read
LW link

A sketch of an AI control safety case

30 Jan 2025 17:28 UTC
57 points
0 comments
5 min read
LW link

Automation collapse

21 Oct 2024 14:50 UTC
72 points
9 comments
7 min read
LW link

Should you work at a leading AI lab? (including in non-safety roles)

Benjamin Hilton
25 Jul 2023 16:29 UTC
7 points
0 comments
12 min read
LW link

AI safety technical research - Career review

Benjamin Hilton
17 Jul 2023 15:34 UTC
14 points
0 comments
29 min read
LW link

How many people are working (directly) on reducing existential risk from AI?

Benjamin Hilton
18 Jan 2023 8:46 UTC
20 points
1 comment
4 min read
LW link
(80000hours.org)

New 80,000 Hours problem profile on existential risks from AI

Benjamin Hilton
31 Aug 2022 17:36 UTC
28 points
6 comments
7 min read
LW link
(80000hours.org)