Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Joar Skalse; 17 May 2024 19:13 UTC; 5 points; 0 comments; 2 min read
DeepMind: Frontier Safety Framework
Zach Stein-Perlman; 17 May 2024 17:30 UTC; 23 points; 0 comments; 3 min read (deepmind.google)
Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning
Dan Braun, Jordan Taylor, Nicholas Goldowsky-Dill and Lee Sharkey; 17 May 2024 16:25 UTC; 23 points; 0 comments; 4 min read (publications.apolloresearch.ai)
AISafety.com – Resources for AI Safety
Søren Elverlin, plex, Bryce Robertson and Melissa Samworth; 17 May 2024 15:57 UTC; 39 points; 0 comments; 1 min read
Is There Really a Child Penalty in the Long Run?
Maxwell Tabarrok; 17 May 2024 11:56 UTC; 20 points; 3 comments; 5 min read (www.maximum-progress.com)
My Hammer Time Final Exam
adios; 17 May 2024 9:28 UTC; 7 points; 1 comment; 3 min read
D&D.Sci (Easy Mode): On The Construction Of Impossible Structures
abstractapplic; 17 May 2024 0:25 UTC; 19 points; 7 comments; 2 min read
To an LLM, everything looks like a logic puzzle
Jesse Richardson; 16 May 2024 22:21 UTC; 10 points; 0 comments; 2 min read
AI Safety Institute’s Inspect hello world example for AI evals
TheManxLoiner; 16 May 2024 20:47 UTC; 3 points; 0 comments; 1 min read (lovkush.medium.com)
Feeling (instrumentally) Rational
Pi Rogers; 16 May 2024 18:56 UTC; 14 points; 5 comments; 1 min read
Advice for Activists from the History of Environmentalism
Jeffrey Heninger; 16 May 2024 18:40 UTC; 61 points; 3 comments; 6 min read (blog.aiimpacts.org)
Ninety-five theses on AI
hamandcheese; 16 May 2024 17:51 UTC; 12 points; 0 comments; 7 min read
FMT: a great opportunity for soon-to-be parents
Anton Rodenhauser; 16 May 2024 13:24 UTC; 8 points; 1 comment; 15 min read
Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Gunnar_Zarncke; 16 May 2024 13:09 UTC; 47 points; 4 comments; 1 min read (arxiv.org)
The Dunning-Kruger of disproving Dunning-Kruger
kromem; 16 May 2024 10:11 UTC; 27 points; 0 comments; 5 min read
A case for fairness-enforcing irrational behavior
cousin_it; 16 May 2024 9:41 UTC; 9 points; 3 comments; 2 min read
Podcast: Eye4AI on 2023 Survey
KatjaGrace; 16 May 2024 7:40 UTC; 8 points; 0 comments; 1 min read (worldspiritsockpuppet.com)
Against “argument from overhang risk”
RobertM; 16 May 2024 4:44 UTC; 28 points; 9 comments; 5 min read
Do you believe in hundred dollar bills lying on the ground? Consider humming
Elizabeth; 16 May 2024 0:00 UTC; 103 points; 10 comments; 6 min read (acesounderglass.com)
Introducing Statistical Utility Mechanics: A Framework for Utility Maximizers
J Bostock; 15 May 2024 21:56 UTC; 9 points; 0 comments; 7 min read