[Linkpost] Silver Bulletin: For most people, politics is about fitting in (www.natesilver.net)
Gunnar_Zarncke | 1 May 2024 18:12 UTC | 17 points | 1 comment | 1 min read

ACX Covid Origins Post convinced readers
ErnestScribbler | 1 May 2024 13:06 UTC | 50 points | 4 comments | 2 min read

Beauty and the Bets
Ape in the coat | 27 Mar 2024 6:17 UTC | 6 points | 24 comments | 12 min read

Why I’m doing PauseAI
Joseph Miller | 30 Apr 2024 16:21 UTC | 86 points | 6 comments | 4 min read

[Question] Shane Legg’s necessary properties for every AGI Safety plan
jacquesthibs | 1 May 2024 17:15 UTC | 59 points | 8 comments | 1 min read

My Detailed Notes & Commentary from Secular Solstice
Jeffrey Heninger | 23 Mar 2024 18:48 UTC | 35 points | 16 comments | 13 min read

Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense
So8res | 24 Nov 2023 17:37 UTC | 203 points | 83 comments | 5 min read

The formal goal is a pointer
Pi Rogers | 1 May 2024 0:27 UTC | 19 points | 9 comments | 1 min read

Take the wheel, Shoggoth! (Lesswrong is trying out changes to the frontpage algorithm)
Ruby and RobertM | 23 Apr 2024 3:58 UTC | 59 points | 16 comments | 4 min read

Take SCIFs, it’s dangerous to go alone
latterframe, Jeffrey Ladish and schroederdewitt | 1 May 2024 8:02 UTC | 32 points | 1 comment | 3 min read

“You’re the most beautiful girl in the world” and Wittgensteinian Language Games
Chris_Leong | 20 Apr 2024 14:54 UTC | 4 points | 18 comments | 1 min read

Ironing Out the Squiggles
Zack_M_Davis | 29 Apr 2024 16:13 UTC | 137 points | 26 comments | 11 min read

Transformers Represent Belief State Geometry in their Residual Stream
Adam Shai | 16 Apr 2024 21:16 UTC | 334 points | 67 comments | 12 min read

We are headed into an extreme compute overhang
devrandom | 26 Apr 2024 21:38 UTC | 38 points | 21 comments | 2 min read

POC || GTFO culture as partial antidote to alignment wordcelism
lc | 15 Mar 2023 10:21 UTC | 145 points | 11 comments | 7 min read

Failures in Kindness
silentbob | 26 Mar 2024 21:30 UTC | 242 points | 27 comments | 9 min read

Introducing AI Lab Watch (ailabwatch.org)
Zach Stein-Perlman | 30 Apr 2024 17:00 UTC | 158 points | 7 comments | 1 min read

The Intentional Stance, LLMs Edition
Eleni Angelou | 30 Apr 2024 17:12 UTC | 30 points | 2 comments | 8 min read

Transcoders enable fine-grained interpretable circuit analysis for language models
Jacob Dunefsky, Philippe Chlenski and Neel Nanda | 30 Apr 2024 17:58 UTC | 46 points | 10 comments | 17 min read

The Solution to Sleeping Beauty
Ape in the coat | 4 Mar 2024 6:46 UTC | 11 points | 71 comments | 13 min read