Xander Davies

Karma: 293

Researcher at UK AI Security Institute.

Layered AI Defenses Have Holes: Vulnerabilities and Key Recommendations

smallsilo, Ian McKenzie, Oskar Hollinsworth, Tom Tseng, Xander Davies, scasper, Aaron Tucker, Robert Kirk and Adam Gleave

4 Jul 2025 0:07 UTC

13 points

1 comment · 4 min read · LW link

(far.ai)

Apply to HAIST/MAIA’s AI Governance Workshop in DC (Feb 17-20)

Phosphorous, Xander Davies, CMD, Paramedic and tlevin

31 Jan 2023 2:06 UTC

28 points

0 comments · 2 min read · LW link

AGISF adaptation for in-person groups

Sam Marks, Xander Davies and Richard_Ngo

13 Jan 2023 3:24 UTC

44 points

2 comments · 3 min read · LW link

Update on Harvard AI Safety Team and MIT AI Alignment

Xander Davies, Sam Marks, kaivu, tlevin, leni, maxnadeau and Naomi Bashkansky

2 Dec 2022 0:56 UTC

60 points

4 comments · 8 min read · LW link

Recommend HAIST resources for assessing the value of RLHF-related alignment research

Sam Marks and Xander Davies

5 Nov 2022 20:58 UTC

26 points

9 comments · 3 min read · LW link

Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley

maxnadeau, Xander Davies, Buck and Nate Thomas

27 Oct 2022 1:32 UTC

135 points

14 comments · 12 min read · LW link

GD’s Implicit Bias on Separable Data

Xander Davies

17 Oct 2022 4:13 UTC

25 points

0 comments · 7 min read · LW link