smallsilo
Karma: 113
Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google
ChengCheng, Brendan Murphy, Adrià Garriga-alonso, Yashvardhan Sharma, dsbowen, smallsilo, Yawen Duan, ChrisCundy, Hannah Betts, AdamGleave and Kellin Pelrine
Feb 7, 2025, 3:57 AM | 29 points | 0 comments | 10 min read
AISafety.info Distillation Hackathon
smallsilo
Oct 1, 2023, 6:54 PM | 2 points | 0 comments | 1 min read
Join AISafety.info’s Distillation Hackathon (Oct 6-9th)
smallsilo
Oct 1, 2023, 6:43 PM | 21 points | 0 comments | 2 min read | (forum.effectivealtruism.org)
GPT-powered EA/LW weekly summary
smallsilo
Aug 25, 2023, 6:19 PM | 18 points | 1 comment | 11 min read | (forum.effectivealtruism.org)
AISafety.info’s Writing & Editing Hackathon
smallsilo
Aug 5, 2023, 5:14 PM | 2 points | 0 comments | 1 min read
Join AISafety.info’s Writing & Editing Hackathon (Aug 25-28) (Prizes to be won!)
smallsilo
Aug 5, 2023, 2:08 PM | 19 points | 3 comments | 1 min read | (forum.effectivealtruism.org)
All AGI Safety questions welcome (especially basic ones) [July 2023]
smallsilo
Jul 20, 2023, 8:20 PM | 38 points | 40 comments | 2 min read | (forum.effectivealtruism.org)