An Introduction to AI Sandbagging
Teun van der Weij, Felix Hofstätter and Francis Rhys Ward
26 Apr 2024 13:40 UTC
41
points
1
comment
8
min read
LW
link
My hour of memoryless lucidity
Eric Neyman
4 May 2024 1:40 UTC
39
points
1
comment
5
min read
LW
link
(ericneyman.wordpress.com)
KAN: Kolmogorov-Arnold Networks
Gunnar_Zarncke
1 May 2024 16:50 UTC
10
points
10
comments
1
min read
LW
link
(arxiv.org)
Why I’m doing PauseAI
Joseph Miller
30 Apr 2024 16:21 UTC
92
points
10
comments
4
min read
LW
link
“AI Safety for Fleshy Humans” an AI Safety explainer by Nicky Case
habryka
3 May 2024 18:10 UTC
45
points
7
comments
4
min read
LW
link
(aisafety.dance)
Transformers Represent Belief State Geometry in their Residual Stream
Adam Shai
16 Apr 2024 21:16 UTC
349
points
79
comments
12
min read
LW
link
If you weren’t such an idiot...
kave and Mark Xu
2 Mar 2024 0:01 UTC
119
points
60
comments
2
min read
LW
link
(markxu.com)
[Question]
Which skincare products are evidence-based?
Vanessa Kosoy
2 May 2024 15:22 UTC
81
points
24
comments
1
min read
LW
link
LLM+Planners hybridisation for friendly AGI
installgentoo
3 May 2024 8:40 UTC
6
points
2
comments
1
min read
LW
link
[Question]
Were there any ancient rationalists?
OliverHayman
3 May 2024 18:26 UTC
11
points
3
comments
1
min read
LW
link
Why is AGI/ASI Inevitable?
DeathlessAmaranth
2 May 2024 18:27 UTC
14
points
6
comments
1
min read
LW
link
A list of core AI safety problems and how I hope to solve them
davidad
26 Aug 2023 15:12 UTC
161
points
26
comments
5
min read
LW
link
Please stop publishing ideas/insights/research about AI
Tamsin Leake
2 May 2024 14:54 UTC
22
points
48
comments
4
min read
LW
link
Disentangling Competence and Intelligence
Robert Kralisch
29 Apr 2024 0:12 UTC
23
points
7
comments
6
min read
LW
link
Key takeaways from our EA and alignment research surveys
Cameron Berg, Judd Rosenblatt, florin_pop and AE Studio
3 May 2024 18:10 UTC
69
points
2
comments
21
min read
LW
link
An Unintentional Compliment
abstractapplic and lsusr
28 Apr 2024 20:04 UTC
23
points
2
comments
4
min read
LW
link
An explanation of evil in an organized world
KatjaGrace
2 May 2024 5:20 UTC
26
points
9
comments
2
min read
LW
link
(worldspiritsockpuppet.com)
Coherence of Caches and Agents
johnswentworth
1 Apr 2024 23:04 UTC
73
points
7
comments
11
min read
LW
link
[Question]
Can stealth aircraft be detected optically?
Yair Halberstadt
2 May 2024 7:47 UTC
18
points
24
comments
1
min read
LW
link
Ironing Out the Squiggles
Zack_M_Davis
29 Apr 2024 16:13 UTC
140
points
33
comments
11
min read
LW
link