Extremely late, but I actually agree.
I wonder to what extent alignment faking is accounted for in current preparedness frameworks. One of my beliefs is that a better degree of interpretability can help us understand why models engage in such behavior, though it probably does not get us to a full solution (so far).
Archie Chaudhury
The slow death of the accelerationist.
Teaching Models to Dream of Better Monitors through Evaluator Conditioned Training
A Rational Proposal
Alignment may be localized: a short (and admittedly limited) experiment
Interpretability is the best path to alignment
Please ask any questions! We are more than happy to clarify our work, and explore potential avenues to improve it.
The lack of actionable ways not only to understand but to effectively improve model behavior toward alignment is, we believe, one of the most overlooked unsolved problems in safety research today.
Steering Vectors Can Help LLM Judges Detect Subtle Dishonesty
Arch223′s Shortform
A new version of rationalism is required as a counterweight to the traditional doomers or accelerationists.
The public perception of technology, culture, and ideas can no longer be split between revolutionaries and conservatives; those lines have blurred in recent years.
The best example of this is the development of AI: the optimal path forward is neither one in which unrestricted development leaves us ruled by a superior race of AI overlords, nor one in which the technology becomes concentrated in the hands of the powers that be.
I think the main gap between safety work and the broader "generative AI" ecosystem that funders and investors are looking at is the tendency of AI safety to sound like something that is not meant to return immediate results, but rather to serve as a public good of sorts.
I actually think there are plenty of concrete problems today, such as the propensity of LLMs to engage in less explicit but still harmful behavioral patterns, that can be addressed with solutions today. To me, this work can be extremely valuable and also help address existential risk down the line.