Benjamin Hilton

Karma: 437

Head of Alignment at UK AI Security Institute (AISI). Previously 80,000 Hours, HM Treasury, Cabinet Office, Department for International Trade, Imperial College London.

Research Areas in Methods for Post-training and Elicitation (The Alignment Project by UK AISI)

Jacob Pfau and Benjamin Hilton

1 Aug 2025 10:27 UTC

12 points

0 comments · 6 min read · LW link

(alignmentproject.aisi.gov.uk)

Research Areas in Benchmark Design and Evaluation (The Alignment Project by UK AISI)

Jacob Pfau and Benjamin Hilton

1 Aug 2025 10:26 UTC

10 points

0 comments · 9 min read · LW link

(alignmentproject.aisi.gov.uk)

Research Areas in Probabilistic Methods (The Alignment Project by UK AISI)

Jacob Pfau and Benjamin Hilton

1 Aug 2025 10:26 UTC

4 points

0 comments · 4 min read · LW link

(alignmentproject.aisi.gov.uk)

Research Areas in Evaluation and Guarantees in Reinforcement Learning (The Alignment Project by UK AISI)

Jacob Pfau and Benjamin Hilton

1 Aug 2025 9:53 UTC

14 points

0 comments · 11 min read · LW link

(alignmentproject.aisi.gov.uk)

The Alignment Project by UK AISI

Mojmir, Benjamin Hilton, Jacob Pfau, Geoffrey Irving, Joseph Bloom, Tomek Korbak, David Africa and Edmund Lau

1 Aug 2025 9:52 UTC

29 points

0 comments · 2 min read · LW link

(alignmentproject.aisi.gov.uk)

An alignment safety case sketch based on debate

Marie_DB, Jacob Pfau, Benjamin Hilton and Geoffrey Irving

8 May 2025 15:02 UTC

62 points

21 comments · 25 min read · LW link

(arxiv.org)

UK AISI’s Alignment Team: Research Agenda

Benjamin Hilton, Jacob Pfau, Marie_DB and Geoffrey Irving

7 May 2025 16:33 UTC

115 points

3 comments · 11 min read · LW link

A sketch of an AI control safety case

Tomek Korbak, joshc, Benjamin Hilton, Buck and Geoffrey Irving

30 Jan 2025 17:28 UTC

61 points

0 comments · 5 min read · LW link

Automation collapse

Geoffrey Irving, Tomek Korbak and Benjamin Hilton

21 Oct 2024 14:50 UTC

72 points

9 comments · 7 min read · LW link

Benjamin Hilton 7 Feb 2024 19:15 UTC
3 points
−7
in reply to: Remmelt’s comment on: Why I think it’s net harmful to do technical safety research at AGI labs
[x-posted from EA forum]

Hi Remmelt,
Thanks for sharing your concerns, both with us privately and here on the forum. These are tricky issues and we expect people to disagree about how to weigh all the considerations, so it’s really good to have open conversations about them.
Ultimately, we disagree with you that it’s net harmful to do technical safety research at AGI labs. In fact, we think it can be the best career step for some of our readers to work in labs, even in non-safety roles. That’s the core reason why we list these roles on our job board.
We argue for this position extensively in my article on the topic (and we only list roles consistent with the considerations in that article).
Some other things we’ve published on this topic in the last year or so:
- A range of opinions from anonymous experts about the upsides and downsides of working on AI capabilities
- How policy roles in AI companies can be valuable for career capital and for direct impact (as well as the potential downsides)
- We recently released a podcast episode with Nathan Labenz on some of the controversy around OpenAI, including his concerns about some of their past safety practices, whether ChatGPT’s release was good or bad, and why its mission of developing AGI may be too risky.
Benjamin

Should you work at a leading AI lab? (including in non-safety roles)

Benjamin Hilton

25 Jul 2023 16:29 UTC

7 points

0 comments · 12 min read · LW link

AI safety technical research—Career review

Benjamin Hilton

17 Jul 2023 15:34 UTC

14 points

0 comments · 29 min read · LW link

How many people are working (directly) on reducing existential risk from AI?

Benjamin Hilton

18 Jan 2023 8:46 UTC

20 points

1 comment · 4 min read · LW link

(80000hours.org)

New 80,000 Hours problem profile on existential risks from AI

Benjamin Hilton

31 Aug 2022 17:36 UTC

28 points

6 comments · 7 min read · LW link

(80000hours.org)