Ice: The Penultimate Frontier
Roko
Jul 13, 2024, 11:44 PM
63
points
56
comments
1
min read
LW
link
(transhumanaxiology.substack.com)
Trust as a bottleneck to growing teams quickly
benkuhn
Jul 13, 2024, 6:00 PM
44
points
3
comments
5
min read
LW
link
(www.benkuhn.net)
Stitching SAEs of different sizes
Bart Bussmann
,
Patrick Leask
,
Joseph Bloom
,
Curt Tigges
and
Neel Nanda
Jul 13, 2024, 5:19 PM
39
points
12
comments
12
min read
LW
link
Kinds of Motivation
Sable
Jul 13, 2024, 3:52 PM
7
points
2
comments
7
min read
LW
link
(affablyevil.substack.com)
A simple case for extreme inner misalignment
Richard_Ngo
Jul 13, 2024, 3:40 PM
84
points
41
comments
7
min read
LW
link
Reality Testing
Ben Turtel
Jul 13, 2024, 3:20 PM
−2
points
1
comment
6
min read
LW
link
(bturtel.substack.com)
The world is awful. The world is much better. The world can be much better: The Animation.
Writer
Jul 13, 2024, 2:03 PM
10
points
0
comments
LW
link
(youtu.be)
The Modern Problems with Conformity
Zero Contradictions
Jul 13, 2024, 8:20 AM
0
points
5
comments
1
min read
LW
link
(expandingrationality.substack.com)
Designing Artificial Wisdom: GitWise and AlphaWise
Jordan Arel
Jul 13, 2024, 6:46 AM
2
points
0
comments
7
min read
LW
link
OpenAI’s Intelligence Levels
infinibot27
Jul 13, 2024, 6:25 AM
1
point
0
comments
1
min read
LW
link
(www.bloomberg.com)
Some desirable properties of automated wisdom
Marius Adrian Nicoară
Jul 13, 2024, 6:05 AM
3
points
2
comments
6
min read
LW
link
Thought Experiments Website
minmi_drover
Jul 13, 2024, 4:47 AM
11
points
11
comments
1
min read
LW
link
A Second Wetsuit Summer
jefftk
Jul 13, 2024, 2:00 AM
19
points
2
comments
1
min read
LW
link
(www.jefftk.com)
Timaeus is hiring!
Jesse Hoogland
,
Stan van Wingerden
,
Alexander Gietelink Oldenziel
and
Daniel Murfet
Jul 12, 2024, 11:42 PM
67
points
6
comments
2
min read
LW
link
Consider attending the AI Security Forum ’24, a 1-day pre-DEFCON event
Charlie Rogers-Smith
Jul 12, 2024, 11:01 PM
21
points
0
comments
1
min read
LW
link
Memorising molecular structures
dkl9
Jul 12, 2024, 10:40 PM
6
points
0
comments
2
min read
LW
link
(dkl9.net)
Robin Hanson AI X-Risk Debate — Highlights and Analysis
Liron
Jul 12, 2024, 9:31 PM
46
points
7
comments
45
min read
LW
link
(www.youtube.com)
Designing Artificial Wisdom: The Wise Workflow Research Organization
Jordan Arel
Jul 12, 2024, 7:18 PM
2
points
0
comments
8
min read
LW
link
Whiteboard Pen Magazines are Useful
Johannes C. Mayer
Jul 12, 2024, 5:15 PM
40
points
8
comments
1
min read
LW
link
Alignment: “Do what I would have wanted you to do”
Oleg Trott
Jul 12, 2024, 4:47 PM
11
points
48
comments
1
min read
LW
link
Virtue taxation
Dentosal
Jul 12, 2024, 2:56 PM
9
points
1
comment
2
min read
LW
link
Most smart and skilled people are outside of the EA/rationalist community: an analysis
titotal
Jul 12, 2024, 12:13 PM
109
points
39
comments
LW
link
(open.substack.com)
2024 Freedom Communities Events
Tudor Iliescu
Jul 12, 2024, 8:04 AM
−6
points
1
comment
1
min read
LW
link
Faithful vs Interpretable Sparse Autoencoder Evals
Louka Ewington-Pitsos
Jul 12, 2024, 5:37 AM
2
points
0
comments
12
min read
LW
link
Moving away from physical continuity
ProgramCrafter
Jul 12, 2024, 5:05 AM
2
points
1
comment
1
min read
LW
link
Transformer Circuit Faithfulness Metrics Are Not Robust
Joseph Miller
,
bilalchughtai
and
William_S
Jul 12, 2024, 3:47 AM
104
points
5
comments
7
min read
LW
link
(arxiv.org)
On Artificial Wisdom
Jordan Arel
Jul 12, 2024, 12:20 AM
3
points
0
comments
14
min read
LW
link
Yoshua Bengio: Reasoning through arguments against taking AI safety seriously
Judd Rosenblatt
Jul 11, 2024, 11:53 PM
70
points
3
comments
1
min read
LW
link
(yoshuabengio.org)
Podcast: “How the Smart Money teaches trading with Ricki Heicklen” (Patrick McKenzie interviewing)
rossry
Jul 11, 2024, 10:49 PM
20
points
2
comments
1
min read
LW
link
(www.complexsystemspodcast.com)
Superbabies: Putting The Pieces Together
sarahconstantin
Jul 11, 2024, 8:40 PM
215
points
37
comments
10
min read
LW
link
(sarahconstantin.substack.com)
Sherlockian Abduction Master List
Cole Wyeth
Jul 11, 2024, 8:27 PM
52
points
66
comments
36
min read
LW
link
Thoughts to niplav on lie-detection, truthfwl mechanisms, and wealth-inequality
Emrik
and
niplav
Jul 11, 2024, 6:55 PM
7
points
8
comments
11
min read
LW
link
Games for AI Control
charlie_griffin
and
Buck
Jul 11, 2024, 6:40 PM
45
points
0
comments
5
min read
LW
link
Video Intro to Guaranteed Safe AI
Mike Vaiana
,
Diogo de Lucena
and
AE Studio
Jul 11, 2024, 5:53 PM
27
points
0
comments
1
min read
LW
link
(youtu.be)
Effective Empathy
Thac0
11 Jul 2024 15:14 UTC
4
points
1
comment
1
min read
LW
link
AI #72: Denying the Future
Zvi
11 Jul 2024 15:00 UTC
45
points
8
comments
41
min read
LW
link
(thezvi.wordpress.com)
The Best Bits From Build, Baby, Build
Maxwell Tabarrok
11 Jul 2024 14:09 UTC
23
points
0
comments
4
min read
LW
link
(www.maximum-progress.com)
[Question]
What Other Lines of Work are Safe from AI Automation?
RogerDearnaley
11 Jul 2024 10:01 UTC
34
points
35
comments
5
min read
LW
link
Decomposing Agency — capabilities without desires
owencb
and
Raymond Douglas
11 Jul 2024 9:38 UTC
153
points
32
comments
12
min read
LW
link
(strangecities.substack.com)
Reliable Sources: The Story of David Gerard
TracingWoodgrains
10 Jul 2024 19:50 UTC
391
points
54
comments
43
min read
LW
link
Managing Emotional Potential Energy
adamShimi
10 Jul 2024 18:20 UTC
24
points
4
comments
4
min read
LW
link
(epistemologicalfascinations.substack.com)
[EAForum xpost] A breakdown of OpenAI’s revenue
dschwarz
and
Lawrence Phillips
10 Jul 2024 18:09 UTC
57
points
5
comments
1
min read
LW
link
(forum.effectivealtruism.org)
Solving Pascal’s Wager using dynamic programming
Paul Wilczewski
10 Jul 2024 18:09 UTC
1
point
0
comments
5
min read
LW
link
Fluent, Cruxy Predictions
Raemon
10 Jul 2024 18:00 UTC
86
points
14
comments
14
min read
LW
link
Antitrust as Controlled Creative Destruction
Martin Sustrik
10 Jul 2024 16:40 UTC
14
points
2
comments
2
min read
LW
link
(250bpm.substack.com)
New page: Integrity
Zach Stein-Perlman
10 Jul 2024 15:00 UTC
91
points
3
comments
1
min read
LW
link
AirBnB Baking
jefftk
10 Jul 2024 12:50 UTC
7
points
1
comment
1
min read
LW
link
(www.jefftk.com)
DIY RLHF: A simple implementation for hands on experience
Mike Vaiana
and
AE Studio
10 Jul 2024 12:07 UTC
28
points
0
comments
6
min read
LW
link
Usefulness grounds truth
invertedpassion
10 Jul 2024 7:58 UTC
0
points
0
comments
4
min read
LW
link
On passing Complete and Honest Ideological Turing Tests (CHITTs)
Aryeh Englander
10 Jul 2024 4:01 UTC
11
points
2
comments
1
min read
LW
link