joshc · Karma: 1,631
Alignment faking CTFs: Apply to my MATS stream
joshc · Apr 4, 2025, 4:29 PM · 60 points · 0 comments · 4 min read · LW link
Training AI to do alignment research we don’t already know how to do
joshc · Feb 24, 2025, 7:19 PM · 45 points · 23 comments · 7 min read · LW link
How might we safely pass the buck to AI?
joshc · Feb 19, 2025, 5:48 PM · 83 points · 58 comments · 31 min read · LW link
How AI Takeover Might Happen in 2 Years
joshc · Feb 7, 2025, 5:10 PM · 422 points · 137 comments · 29 min read · LW link (x.com)
Takeaways from sketching a control safety case
joshc · Jan 31, 2025, 4:43 AM · 28 points · 0 comments · 3 min read · LW link (redwoodresearch.substack.com)
A sketch of an AI control safety case
Tomek Korbak, joshc, Benjamin Hilton, Buck and Geoffrey Irving · Jan 30, 2025, 5:28 PM · 57 points · 0 comments · 5 min read · LW link
Planning for Extreme AI Risks
joshc · Jan 29, 2025, 6:33 PM · 139 points · 5 comments · 16 min read · LW link
When does capability elicitation bound risk?
joshc · Jan 22, 2025, 3:42 AM · 25 points · 0 comments · 17 min read · LW link (redwoodresearch.substack.com)
Extending control evaluations to non-scheming threats
joshc · Jan 12, 2025, 1:42 AM · 30 points · 1 comment · 12 min read · LW link
New report: Safety Cases for AI
joshc · Mar 20, 2024, 4:45 PM · 89 points · 14 comments · 1 min read · LW link (twitter.com)
List of strategies for mitigating deceptive alignment
joshc · Dec 2, 2023, 5:56 AM · 38 points · 2 comments · 6 min read · LW link
New paper shows truthfulness & instruction-following don’t generalize by default
joshc · Nov 19, 2023, 7:27 PM · 60 points · 0 comments · 4 min read · LW link
Testbed evals: evaluating AI safety even when it can’t be directly measured
joshc · Nov 15, 2023, 7:00 PM · 71 points · 2 comments · 4 min read · LW link
Red teaming: challenges and research directions
joshc · May 10, 2023, 1:40 AM · 31 points · 1 comment · 10 min read · LW link
Safety standards: a framework for AI regulation
joshc · May 1, 2023, 12:56 AM · 19 points · 0 comments · 8 min read · LW link
Are short timelines actually bad?
joshc · Feb 5, 2023, 9:21 PM · 61 points · 7 comments · 3 min read · LW link
[MLSN #7]: an example of an emergent internal optimizer
joshc and Dan H · Jan 9, 2023, 7:39 PM · 28 points · 0 comments · 6 min read · LW link
Prizes for ML Safety Benchmark Ideas
joshc · Oct 28, 2022, 2:51 AM · 36 points · 5 comments · 1 min read · LW link
[Question] What is the best critique of AI existential risk arguments?
joshc · Aug 30, 2022, 2:18 AM · 6 points · 11 comments · 1 min read · LW link