Linking Alt Accounts
jefftk
Oct 6, 2023, 5:00 PM
70
points
33
comments
1
min read
LW
link
(www.jefftk.com)
Super-Exponential versus Exponential Growth in Compute Price-Performance
moridinamael
Oct 6, 2023, 4:23 PM
37
points
25
comments
2
min read
LW
link
A personal explanation of ELK concept and task.
Zeyu Qin
Oct 6, 2023, 3:55 AM
1
point
0
comments
1
min read
LW
link
The Long-Term Future Fund is looking for a full-time fund chair
Linch
,
calebp99
and
abergal
Oct 5, 2023, 10:18 PM
52
points
0
comments
7
min read
LW
link
(forum.effectivealtruism.org)
Provably Safe AI
PeterMcCluskey
Oct 5, 2023, 10:18 PM
35
points
15
comments
4
min read
LW
link
(bayesianinvestor.com)
Stampy’s AI Safety Info soft launch
steven0461
and
Robert Miles
Oct 5, 2023, 10:13 PM
120
points
9
comments
2
min read
LW
link
Impacts of AI on the housing markets
PottedRosePetal
Oct 5, 2023, 9:24 PM
8
points
0
comments
5
min read
LW
link
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
Zac Hatfield-Dodds
Oct 5, 2023, 9:01 PM
288
points
22
comments
2
min read
LW
link
1
review
(transformer-circuits.pub)
Ideation and Trajectory Modelling in Language Models
NickyP
Oct 5, 2023, 7:21 PM
16
points
2
comments
10
min read
LW
link
A well-defined history in measurable factor spaces
Matthias G. Mayer
Oct 5, 2023, 6:36 PM
22
points
0
comments
2
min read
LW
link
Evaluating the historical value misspecification argument
Matthew Barnett
Oct 5, 2023, 6:34 PM
188
points
162
comments
7
min read
LW
link
3
reviews
Translations Should Invert
abramdemski
Oct 5, 2023, 5:44 PM
48
points
19
comments
3
min read
LW
link
Censorship in LLMs is here to stay because it mirrors how our own intelligence is structured
mnvr
Oct 5, 2023, 5:37 PM
3
points
0
comments
1
min read
LW
link
Twin Cities ACX Meetup October 2023
Timothy M.
Oct 5, 2023, 4:29 PM
1
point
2
comments
1
min read
LW
link
This anime storyboard doesn’t exist: a graphic novel written and illustrated by GPT4
RomanS
Oct 5, 2023, 2:01 PM
12
points
7
comments
55
min read
LW
link
AI #32: Lie Detector
Zvi
Oct 5, 2023, 1:50 PM
45
points
19
comments
44
min read
LW
link
(thezvi.wordpress.com)
Can the House Legislate?
jefftk
Oct 5, 2023, 1:40 PM
26
points
6
comments
2
min read
LW
link
(www.jefftk.com)
Making progress on the “what alignment target should be aimed at?” question, is urgent
ThomasCederborg
Oct 5, 2023, 12:55 PM
2
points
0
comments
18
min read
LW
link
Response to Quintin Pope’s Evolution Provides No Evidence For the Sharp Left Turn
Zvi
Oct 5, 2023, 11:39 AM
129
points
29
comments
9
min read
LW
link
How to Get Rationalist Feedback
Nicholas / Heather Kross
Oct 5, 2023, 2:03 AM
16
points
0
comments
2
min read
LW
link
On my AI Fable, and the importance of de re, de dicto, and de se reference for AI alignment
PhilGoetz
Oct 5, 2023, 12:50 AM
9
points
5
comments
1
min read
LW
link
Underspecified Probabilities: A Thought Experiment
lunatic_at_large
Oct 4, 2023, 10:25 PM
8
points
4
comments
2
min read
LW
link
Fraternal Birth Order Effect and the Maternal Immune Hypothesis
Bucky
Oct 4, 2023, 9:18 PM
20
points
1
comment
2
min read
LW
link
How to solve deception and still fail.
Charlie Steiner
Oct 4, 2023, 7:56 PM
40
points
7
comments
6
min read
LW
link
PortAudio M1 Latency
jefftk
4 Oct 2023 19:10 UTC
8
points
5
comments
1
min read
LW
link
(www.jefftk.com)
Open Philanthropy is hiring for multiple roles across our Global Catastrophic Risks teams
aarongertler
4 Oct 2023 18:04 UTC
6
points
0
comments
3
min read
LW
link
(forum.effectivealtruism.org)
Safeguarding Humanity: Ensuring AI Remains a Servant, Not a Master
kgldeshapriya
4 Oct 2023 17:52 UTC
−20
points
2
comments
2
min read
LW
link
The 5 Pillars of Happiness
Gabi QUENE
4 Oct 2023 17:50 UTC
−24
points
5
comments
5
min read
LW
link
[Question]
Using Reinforcement Learning to try to control the heating of a building (district heating)
Tony Karlsson
4 Oct 2023 17:47 UTC
3
points
5
comments
1
min read
LW
link
rationalistic probability(litterally just throwing shit out there)
NotaSprayer ASprayer
4 Oct 2023 17:46 UTC
−30
points
8
comments
2
min read
LW
link
AISN #23: New OpenAI Models, News from Anthropic, and Representation Engineering
Dan H
4 Oct 2023 17:37 UTC
15
points
2
comments
5
min read
LW
link
(newsletter.safe.ai)
I don’t find the lie detection results that surprising (by an author of the paper)
JanB
4 Oct 2023 17:10 UTC
97
points
8
comments
3
min read
LW
link
[Question]
What evidence is there of LLM’s containing world models?
Chris_Leong
4 Oct 2023 14:33 UTC
17
points
17
comments
1
min read
LW
link
Entanglement and intuition about words and meaning
Bill Benzon
4 Oct 2023 14:16 UTC
4
points
0
comments
2
min read
LW
link
Why a Mars colony would lead to a first strike situation
Remmelt
4 Oct 2023 11:29 UTC
−60
points
8
comments
1
min read
LW
link
(mflb.com)
[Question]
What are some examples of AIs instantiating the ‘nearest unblocked strategy problem’?
EJT
4 Oct 2023 11:05 UTC
6
points
4
comments
1
min read
LW
link
Graphical tensor notation for interpretability
Jordan Taylor
4 Oct 2023 8:04 UTC
141
points
11
comments
19
min read
LW
link
[Link] Bay Area Winter Solstice 2023
tcheasdfjkl
and
TheSkeward
4 Oct 2023 2:19 UTC
18
points
3
comments
1
min read
LW
link
(fb.me)
[Question]
Who determines whether an alignment proposal is the definitive alignment solution?
MiguelDev
3 Oct 2023 22:39 UTC
−1
points
6
comments
1
min read
LW
link
AXRP Episode 25 - Cooperative AI with Caspar Oesterheld
DanielFilan
3 Oct 2023 21:50 UTC
43
points
0
comments
92
min read
LW
link
When to Get the Booster?
jefftk
3 Oct 2023 21:00 UTC
50
points
15
comments
2
min read
LW
link
(www.jefftk.com)
OpenAI-Microsoft partnership
Zach Stein-Perlman
3 Oct 2023 20:01 UTC
51
points
19
comments
1
min read
LW
link
[Question]
Current AI safety techniques?
Zach Stein-Perlman
3 Oct 2023 19:30 UTC
30
points
2
comments
2
min read
LW
link
Testing and Automation for Intelligent Systems.
Sai Kiran Kammari
3 Oct 2023 17:51 UTC
−13
points
0
comments
1
min read
LW
link
(resource-cms.springernature.com)
Metaculus Announces Forecasting Tournament to Evaluate Focused Research Organizations, in Partnership With the Federation of American Scientists
ChristianWilliams
3 Oct 2023 16:44 UTC
13
points
0
comments
LW
link
(www.metaculus.com)
What would it mean to understand how a large language model (LLM) works? Some quick notes.
Bill Benzon
3 Oct 2023 15:11 UTC
20
points
4
comments
8
min read
LW
link
[Question]
Potential alignment targets for a sovereign superintelligent AI
Paul Colognese
3 Oct 2023 15:09 UTC
29
points
4
comments
1
min read
LW
link
Monthly Roundup #11: October 2023
Zvi
3 Oct 2023 14:10 UTC
42
points
12
comments
35
min read
LW
link
(thezvi.wordpress.com)
Why We Use Money? - A Walrasian View
Savio Coelho
3 Oct 2023 12:02 UTC
4
points
3
comments
8
min read
LW
link
Mech Interp Challenge: October—Deciphering the Sorted List Model
CallumMcDougall
3 Oct 2023 10:57 UTC
23
points
0
comments
3
min read
LW
link