wassname

Karma: 467

Adapters as Representational Hypotheses: What Adapter Methods Tell Us About Transformer Geometry

wassname · 22 Feb 2026 22:12 UTC
18 points
0 comments · 5 min read · LW link

Do LLMs Learn Our Preferences or Just Our Behaviors?

wassname · 1 Feb 2026 11:28 UTC
13 points
0 comments · 1 min read · LW link

AntiPaSTO: Self-Supervised Value Steering for Debugging Alignment

wassname · 13 Jan 2026 12:55 UTC
6 points
0 comments · 16 min read · LW link

An Aphoristic Overview of Technical AI Alignment proposals

wassname · 5 Jan 2026 3:01 UTC
11 points
3 comments · 2 min read · LW link

Private Capabilities, Public Alignment: De-escalating Without Disadvantage

wassname · 16 Nov 2024 7:26 UTC
6 points
0 comments · 5 min read · LW link

wassname's Shortform

wassname · 8 Jun 2024 3:48 UTC
3 points
19 comments · 1 min read · LW link

[Question] What did you learn from leaked documents?

wassname · 2 Sep 2023 1:28 UTC
15 points
10 comments · 1 min read · LW link

What should we censor from training data?

wassname · 22 Apr 2023 23:33 UTC
16 points
4 comments · 1 min read · LW link

Talk and Q&A with Dan Hendrycks on the paper "Aligning AI With Shared Human Values". On Discord at Aug 28, 2020 8:00-10:00 AM GMT+8.

wassname · 14 Aug 2020 23:57 UTC
1 point
0 comments · 1 min read · LW link