Arch223

Karma: 11

A Rational Proposal

Arch223 · 26 Jan 2026 20:22 UTC
−6 points
0 comments · 14 min read · LW link

Alignment may be localized: a short (and admittedly limited) experiment

Arch223 · 24 Nov 2025 17:48 UTC
18 points
0 comments · 5 min read · LW link

Interpretability is the best path to alignment

Arch223 · 5 Sep 2025 4:37 UTC
2 points
6 comments · 5 min read · LW link

Steering Vectors Can Help LLM Judges Detect Subtle Dishonesty

Arch223 · 3 Jun 2025 20:33 UTC
12 points
1 comment · 5 min read · LW link

Arch223's Shortform

Arch223 · 18 Nov 2024 1:54 UTC
1 point
1 comment · 1 min read · LW link