wassname
Karma: 467
Independent Researcher, Perth, Australia
homepage · anon feedback
Adapters as Representational Hypotheses: What Adapter Methods Tell Us About Transformer Geometry
wassname · 22 Feb 2026 22:12 UTC · 18 points · 0 comments · 5 min read · LW link
Do LLMs Learn Our Preferences or Just Our Behaviors?
wassname · 1 Feb 2026 11:28 UTC · 13 points · 0 comments · 1 min read · LW link
AntiPaSTO: Self-Supervised Value Steering for Debugging Alignment
wassname · 13 Jan 2026 12:55 UTC · 6 points · 0 comments · 16 min read · LW link
An Aphoristic Overview of Technical AI Alignment Proposals
wassname · 5 Jan 2026 3:01 UTC · 11 points · 3 comments · 2 min read · LW link
Private Capabilities, Public Alignment: De-escalating Without Disadvantage
wassname · 16 Nov 2024 7:26 UTC · 6 points · 0 comments · 5 min read · LW link
wassname’s Shortform
wassname · 8 Jun 2024 3:48 UTC · 3 points · 19 comments · 1 min read · LW link
[Question] What did you learn from leaked documents?
wassname · 2 Sep 2023 1:28 UTC · 15 points · 10 comments · 1 min read · LW link
What should we censor from training data?
wassname · 22 Apr 2023 23:33 UTC · 16 points · 4 comments · 1 min read · LW link
Talk and Q&A — Dan Hendrycks — Paper: Aligning AI With Shared Human Values. On Discord at 28 Aug 2020, 8:00–10:00 AM GMT+8.
wassname · 14 Aug 2020 23:57 UTC · 1 point · 0 comments · 1 min read · LW link