Sorry for not seeing this sooner. Hopefully, the first paragraph of the summary answers this question. We’re excited about running more ARENA iterations precisely because its track record has been strong.
James Fox
ARENA 4.0 Impact Report
AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0
Announcing the London Initiative for Safe AI (LISA)
Reward Hacking from a Causal Perspective
Incentives from a causal perspective
Agency from a causal perspective
I know you’ve acknowledged Friston at the end, but I’m just commenting for other interested readers’ benefit that this is very close to Karl Friston’s active inference framework, which posits that all agents minimise the discrepancies (or prediction errors) between their internal representations of the world and their incoming sensory information through both action and perception.
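For readers who want a concrete (if very schematic) picture of that loop, here is a toy sketch, assuming a single scalar world state and noiseless sensing; it is not Friston's actual formalism, just the idea of reducing prediction error both by updating the model (perception) and by acting on the world (action):

```python
# A minimal, hypothetical sketch of the idea described above (not Friston's full
# framework): the agent reduces the prediction error between its internal estimate
# `belief` and the sensory signal, either by updating the belief (perception) or by
# acting on the world so the signal moves towards the belief (action).

belief = 0.0          # internal representation of some scalar world state
world_state = 5.0     # true state generating the sensory signal
lr = 0.1              # step size for both updates

for step in range(50):
    observation = world_state        # noiseless sensing, for simplicity
    error = observation - belief     # prediction error
    belief += lr * error             # perception: move the model towards the data
    world_state -= lr * error        # action: move the data towards the model

print(round(belief, 2), round(world_state, 2))  # both converge to a shared value (~2.5)
```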
Causality: A Brief Introduction
Introduction to Towards Causal Foundations of Safe AGI
Hi Vanessa, thanks for your question! Sorry for taking a while to reply. The answer is yes if we allow mixed policies (i.e., where an agent can correlate its decision rules for different decisions using a shared random bit), but no if we restrict agents to behavioural policies (i.e., where the decision rules for each of an agent’s decisions are independent because they cannot access a shared random bit). This is analogous to the difference between mixed and behavioural strategies in extensive-form games, where a subgame perfect equilibrium (SPE) is, in general, only guaranteed to exist in mixed strategies (provided the game is finite, by Nash’s theorem).
Note that if all agents in the MAIM have perfect recall (i.e., they remember their previous decisions and the information they knew when making those decisions), then an SPE in behavioural policies is guaranteed to exist. In fact, Koller and Milch showed that only a weaker criterion of “sufficient recall” is needed (https://www.semanticscholar.org/paper/Ignorable-Information-in-Multi-Agent-Scenarios-Milch-Koller/5ea036bad72176389cf23545a881636deadc4946).
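To illustrate the gap, here is a small, hypothetical toy game (my own illustration, not taken from the MAIM literature): a single agent with imperfect recall makes two binary decisions and is rewarded only when they match each other and differ from an adversary’s guess. Independent (behavioural) randomisation caps the worst-case value at 1/4, whereas a shared random bit (a mixed policy) achieves 1/2. A minimal Python sketch:

```python
import numpy as np

# Hypothetical toy game (illustration only): the agent chooses D1, D2 in {H, T},
# forgetting D1 before choosing D2. An adversary who knows the agent's policy
# (but not its realised choices) guesses y; the agent gets utility 1 iff D1 == D2 != y.

def behavioural_value(p, q):
    """Worst-case utility when D1 ~ Bernoulli(p) and D2 ~ Bernoulli(q) independently."""
    prob_hh = p * q                  # P(D1 = D2 = H)
    prob_tt = (1 - p) * (1 - q)      # P(D1 = D2 = T)
    # The adversary guesses the more probable matching outcome, leaving the smaller mass.
    return min(prob_hh, prob_tt)

grid = np.linspace(0, 1, 201)
best_behavioural = max(behavioural_value(p, q) for p in grid for q in grid)

# Mixed policy: a shared random bit selects (H, H) or (T, T) with probability 1/2 each,
# so whichever outcome the adversary guesses, the agent still wins half the time.
mixed_value = 0.5

print(f"best behavioural value: {best_behavioural:.2f}")  # 0.25
print(f"mixed-policy value:     {mixed_value:.2f}")        # 0.50
```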
In a forthcoming journal paper, we expand significantly on the theoretical underpinnings and advantages of MAIMs, and we will provide more results there.
Thank you for your comment.
We are confident that ARENA’s in-person programme is among the most cost-effective technical AI safety training programmes:
- ARENA is highly selective, and so all of our participants have the latent potential to contribute meaningfully to technical AI safety work
- The marginal cost per participant is relatively low compared to other AI safety programmes since we only cover travel and accommodation expenses for 4-5 weeks (we do not provide stipends)
- The outcomes set out in the above post seem pretty strong (4/33 immediate transitions to AI safety roles and 24/33 more actively pursuing them)
- There are lots of reasons why technical AI safety engineering is not the right career fit for everyone (even those with the ability). Therefore, I think that 2/33 people updating against working in AI safety after the programme is actually quite a low attrition rate.
- Apart Hackathons have quite a different theory of change compared with ARENA. While hackathons can be valuable for some initial exposure, ARENA provides 4 weeks of comprehensive training in cutting-edge AI safety research (e.g., mechanistic interpretability, LLM evaluations, and RLHF implementation) that leads to concrete outputs through week-long capstone projects.