Alan Cooney

Karma: 119

“Did you lie?” Evaluating Lie Detectors across Model Scale and Belief-Verified Model Organisms

17 Jun 2026 18:43 UTC
32 points
0 comments, 6 min read, LW link
(arxiv.org)

vLLM-Lens: Fast Interpretability Tooling That Scales to Trillion-Parameter Models

23 Apr 2026 19:13 UTC
76 points
0 comments, 5 min read, LW link

Research Areas in AI Control (The Alignment Project by UK AISI)

1 Aug 2025 10:27 UTC
25 points
0 comments, 18 min read, LW link
(alignmentproject.aisi.gov.uk)