wassname
Karma: 467
Independent Researcher, Perth, Australia
homepage · anon feedback
Adapters as Representational Hypotheses: What Adapter Methods Tell Us About Transformer Geometry
wassname · 22 Feb 2026 22:12 UTC · 18 points · 0 comments · 5 min read · LW link
Do LLMs Learn Our Preferences or Just Our Behaviors?
wassname · 1 Feb 2026 11:28 UTC · 13 points · 0 comments · 1 min read · LW link
AntiPaSTO: Self-Supervised Value Steering for Debugging Alignment
wassname · 13 Jan 2026 12:55 UTC · 6 points · 0 comments · 16 min read · LW link
An Aphoristic Overview of Technical AI Alignment Proposals
wassname · 5 Jan 2026 3:01 UTC · 11 points · 3 comments · 2 min read · LW link
Private Capabilities, Public Alignment: De-escalating Without Disadvantage
wassname · 16 Nov 2024 7:26 UTC · 6 points · 0 comments · 5 min read · LW link
wassname’s Shortform
wassname · 8 Jun 2024 3:48 UTC · 3 points · 19 comments · 1 min read · LW link
[Question] What did you learn from leaked documents?
wassname · 2 Sep 2023 1:28 UTC · 15 points · 10 comments · 1 min read · LW link
What should we censor from training data?
wassname · 22 Apr 2023 23:33 UTC · 16 points · 4 comments · 1 min read · LW link
Talk and Q&A — Dan Hendrycks — Paper: Aligning AI With Shared Human Values. On Discord at 28 Aug 2020, 8:00–10:00 AM GMT+8.
wassname · 14 Aug 2020 23:57 UTC · 1 point · 0 comments · 1 min read · LW link