Don’t Dismiss Simple Alignment Approaches · Chris_Leong · Oct 7, 2023, 12:35 AM · 137 points · 9 comments · 4 min read · LW link
Linking Alt Accounts · jefftk · Oct 6, 2023, 5:00 PM · 70 points · 33 comments · 1 min read · LW link (www.jefftk.com)
Super-Exponential versus Exponential Growth in Compute Price-Performance · moridinamael · Oct 6, 2023, 4:23 PM · 37 points · 25 comments · 2 min read · LW link
A personal explanation of ELK concept and task. · Zeyu Qin · Oct 6, 2023, 3:55 AM · 1 point · 0 comments · 1 min read · LW link
The Long-Term Future Fund is looking for a full-time fund chair · Linch, calebp99, and abergal · Oct 5, 2023, 10:18 PM · 52 points · 0 comments · 7 min read · LW link (forum.effectivealtruism.org)
Provably Safe AI · PeterMcCluskey · Oct 5, 2023, 10:18 PM · 35 points · 15 comments · 4 min read · LW link (bayesianinvestor.com)
Stampy’s AI Safety Info soft launch · steven0461 and Robert Miles · Oct 5, 2023, 10:13 PM · 120 points · 9 comments · 2 min read · LW link
Impacts of AI on the housing markets · PottedRosePetal · Oct 5, 2023, 9:24 PM · 8 points · 0 comments · 5 min read · LW link
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning · Zac Hatfield-Dodds · Oct 5, 2023, 9:01 PM · 288 points · 22 comments · 2 min read · LW link · 1 review (transformer-circuits.pub)
Ideation and Trajectory Modelling in Language Models · NickyP · Oct 5, 2023, 7:21 PM · 16 points · 2 comments · 10 min read · LW link
A well-defined history in measurable factor spaces · Matthias G. Mayer · Oct 5, 2023, 6:36 PM · 22 points · 0 comments · 2 min read · LW link
Evaluating the historical value misspecification argument · Matthew Barnett · Oct 5, 2023, 6:34 PM · 188 points · 162 comments · 7 min read · LW link · 3 reviews
Translations Should Invert · abramdemski · Oct 5, 2023, 5:44 PM · 48 points · 19 comments · 3 min read · LW link
Censorship in LLMs is here to stay because it mirrors how our own intelligence is structured · mnvr · Oct 5, 2023, 5:37 PM · 3 points · 0 comments · 1 min read · LW link
Twin Cities ACX Meetup October 2023 · Timothy M. · Oct 5, 2023, 4:29 PM · 1 point · 2 comments · 1 min read · LW link
This anime storyboard doesn’t exist: a graphic novel written and illustrated by GPT4 · RomanS · Oct 5, 2023, 2:01 PM · 12 points · 7 comments · 55 min read · LW link
AI #32: Lie Detector · Zvi · Oct 5, 2023, 1:50 PM · 45 points · 19 comments · 44 min read · LW link (thezvi.wordpress.com)
Can the House Legislate? · jefftk · Oct 5, 2023, 1:40 PM · 26 points · 6 comments · 2 min read · LW link (www.jefftk.com)
Making progress on the “what alignment target should be aimed at?” question, is urgent · ThomasCederborg · Oct 5, 2023, 12:55 PM · 2 points · 0 comments · 18 min read · LW link
Response to Quintin Pope’s Evolution Provides No Evidence For the Sharp Left Turn · Zvi · Oct 5, 2023, 11:39 AM · 129 points · 29 comments · 9 min read · LW link
How to Get Rationalist Feedback · Nicholas / Heather Kross · Oct 5, 2023, 2:03 AM · 16 points · 0 comments · 2 min read · LW link
On my AI Fable, and the importance of de re, de dicto, and de se reference for AI alignment · PhilGoetz · Oct 5, 2023, 12:50 AM · 9 points · 5 comments · 1 min read · LW link
Underspecified Probabilities: A Thought Experiment · lunatic_at_large · Oct 4, 2023, 10:25 PM · 8 points · 4 comments · 2 min read · LW link
Fraternal Birth Order Effect and the Maternal Immune Hypothesis · Bucky · Oct 4, 2023, 9:18 PM · 20 points · 1 comment · 2 min read · LW link
How to solve deception and still fail. · Charlie Steiner · Oct 4, 2023, 7:56 PM · 40 points · 7 comments · 6 min read · LW link
PortAudio M1 Latency · jefftk · Oct 4, 2023, 7:10 PM · 8 points · 5 comments · 1 min read · LW link (www.jefftk.com)
Open Philanthropy is hiring for multiple roles across our Global Catastrophic Risks teams · aarongertler · Oct 4, 2023, 6:04 PM · 6 points · 0 comments · 3 min read · LW link (forum.effectivealtruism.org)
Safeguarding Humanity: Ensuring AI Remains a Servant, Not a Master · kgldeshapriya · Oct 4, 2023, 5:52 PM · −20 points · 2 comments · 2 min read · LW link
The 5 Pillars of Happiness · Gabi QUENE · Oct 4, 2023, 5:50 PM · −24 points · 5 comments · 5 min read · LW link
[Question] Using Reinforcement Learning to try to control the heating of a building (district heating) · Tony Karlsson · Oct 4, 2023, 5:47 PM · 3 points · 5 comments · 1 min read · LW link
rationalistic probability(litterally just throwing shit out there) · NotaSprayer ASprayer · Oct 4, 2023, 5:46 PM · −30 points · 8 comments · 2 min read · LW link
AISN #23: New OpenAI Models, News from Anthropic, and Representation Engineering · Dan H · Oct 4, 2023, 5:37 PM · 15 points · 2 comments · 5 min read · LW link (newsletter.safe.ai)
I don’t find the lie detection results that surprising (by an author of the paper) · JanB · Oct 4, 2023, 5:10 PM · 97 points · 8 comments · 3 min read · LW link
[Question] What evidence is there of LLM’s containing world models? · Chris_Leong · Oct 4, 2023, 2:33 PM · 17 points · 17 comments · 1 min read · LW link
Entanglement and intuition about words and meaning · Bill Benzon · Oct 4, 2023, 2:16 PM · 4 points · 0 comments · 2 min read · LW link
Why a Mars colony would lead to a first strike situation · Remmelt · Oct 4, 2023, 11:29 AM · −60 points · 8 comments · 1 min read · LW link (mflb.com)
[Question] What are some examples of AIs instantiating the ‘nearest unblocked strategy problem’? · EJT · Oct 4, 2023, 11:05 AM · 6 points · 4 comments · 1 min read · LW link
Graphical tensor notation for interpretability · Jordan Taylor · Oct 4, 2023, 8:04 AM · 141 points · 11 comments · 19 min read · LW link
[Link] Bay Area Winter Solstice 2023 · tcheasdfjkl and TheSkeward · Oct 4, 2023, 2:19 AM · 18 points · 3 comments · 1 min read · LW link (fb.me)
[Question] Who determines whether an alignment proposal is the definitive alignment solution? · MiguelDev · Oct 3, 2023, 10:39 PM · −1 points · 6 comments · 1 min read · LW link
AXRP Episode 25 - Cooperative AI with Caspar Oesterheld · DanielFilan · Oct 3, 2023, 9:50 PM · 43 points · 0 comments · 92 min read · LW link
When to Get the Booster? · jefftk · Oct 3, 2023, 9:00 PM · 50 points · 15 comments · 2 min read · LW link (www.jefftk.com)
OpenAI-Microsoft partnership · Zach Stein-Perlman · Oct 3, 2023, 8:01 PM · 51 points · 19 comments · 1 min read · LW link
[Question] Current AI safety techniques? · Zach Stein-Perlman · Oct 3, 2023, 7:30 PM · 30 points · 2 comments · 2 min read · LW link
Testing and Automation for Intelligent Systems. · Sai Kiran Kammari · Oct 3, 2023, 5:51 PM · −13 points · 0 comments · 1 min read · LW link (resource-cms.springernature.com)
Metaculus Announces Forecasting Tournament to Evaluate Focused Research Organizations, in Partnership With the Federation of American Scientists · ChristianWilliams · Oct 3, 2023, 4:44 PM · 13 points · 0 comments · LW link (www.metaculus.com)
What would it mean to understand how a large language model (LLM) works? Some quick notes. · Bill Benzon · Oct 3, 2023, 3:11 PM · 20 points · 4 comments · 8 min read · LW link
[Question] Potential alignment targets for a sovereign superintelligent AI · Paul Colognese · Oct 3, 2023, 3:09 PM · 29 points · 4 comments · 1 min read · LW link
Monthly Roundup #11: October 2023 · Zvi · Oct 3, 2023, 2:10 PM UTC · 42 points · 12 comments · 35 min read · LW link (thezvi.wordpress.com)
Why We Use Money? - A Walrasian View · Savio Coelho · Oct 3, 2023, 12:02 PM UTC · 4 points · 3 comments · 8 min read · LW link