joshc | Karma: 563 | joshuaclymer.com
Posts
[Question] What is the best critique of AI existential risk arguments? (joshc, 30 Aug 2022 2:18 UTC) · 6 points · 11 comments · 1 min read · LW link
Prizes for ML Safety Benchmark Ideas (joshc, 28 Oct 2022 2:51 UTC) · 36 points · 4 comments · 1 min read · LW link
[MLSN #7]: an example of an emergent internal optimizer (joshc and Dan H, 9 Jan 2023 19:39 UTC) · 28 points · 0 comments · 6 min read · LW link
Are short timelines actually bad? (joshc, 5 Feb 2023 21:21 UTC) · 56 points · 7 comments · 3 min read · LW link
Safety standards: a framework for AI regulation (joshc, 1 May 2023 0:56 UTC) · 19 points · 0 comments · 8 min read · LW link
Red teaming: challenges and research directions (joshc, 10 May 2023 1:40 UTC) · 30 points · 1 comment · 10 min read · LW link
Testbed evals: evaluating AI safety even when it can’t be directly measured (joshc, 15 Nov 2023 19:00 UTC) · 70 points · 2 comments · 4 min read · LW link
New paper shows truthfulness & instruction-following don’t generalize by default (joshc, 19 Nov 2023 19:27 UTC) · 58 points · 0 comments · 4 min read · LW link
List of strategies for mitigating deceptive alignment (joshc, 2 Dec 2023 5:56 UTC) · 34 points · 2 comments · 6 min read · LW link
New report: Safety Cases for AI (joshc, 20 Mar 2024 16:45 UTC) · 90 points · 13 comments · 1 min read · LW link (twitter.com)