ojorgensen
Karma: 180
AI Safety Researcher, my website is here.
Understanding Counterbalanced Subtractions for Better Activation Additions
ojorgensen, 17 Aug 2023 13:53 UTC, 21 points, 0 comments, 14 min read
Because of LayerNorm, Directions in GPT-2 MLP Layers are Monosemantic
ojorgensen, 28 Jul 2023 19:43 UTC, 13 points, 3 comments, 13 min read
UK Foundation Model Task Force—Expression of Interest (twitter.com)
ojorgensen, 18 Jun 2023 9:43 UTC, 64 points, 2 comments, 1 min read
ojorgensen’s Shortform
ojorgensen, 4 May 2023 13:51 UTC, 2 points, 1 comment, 1 min read
(Extremely) Naive Gradient Hacking Doesn’t Work
ojorgensen, 20 Dec 2022 14:35 UTC, 14 points, 0 comments, 6 min read
[Question] Which Issues in Conceptual Alignment have been Formalised or Observed (or not)?
ojorgensen, 1 Nov 2022 22:32 UTC, 4 points, 0 comments, 1 min read
Strange Loops—Self-Reference from Number Theory to AI
ojorgensen, 28 Sep 2022 14:10 UTC, 15 points, 6 comments, 18 min read
Evaluating OpenAI’s alignment plans using training stories
ojorgensen, 25 Aug 2022 16:12 UTC, 4 points, 0 comments, 5 min read
Disagreements about Alignment: Why, and how, we should try to solve them
ojorgensen, 9 Aug 2022 18:49 UTC, 11 points, 2 comments, 16 min read