Rationality Research Report: Towards 10x OODA Looping?
Raemon
Feb 24, 2024, 9:06 PM
117
points
25
comments
15
min read
LW
link
Let’s ask some of the largest LLMs for tips and ideas on how to take over the world
Super AGI
Feb 24, 2024, 8:35 PM
1
point
0
comments
7
min read
LW
link
Exercise: Planmaking, Surprise Anticipation, and “Baba is You”
Raemon
Feb 24, 2024, 8:33 PM
67
points
31
comments
6
min read
LW
link
In search of God.
Spiritus Dei
Feb 24, 2024, 6:59 PM
−19
points
3
comments
7
min read
LW
link
Impossibility of Anthropocentric-Alignment
False Name
Feb 24, 2024, 6:31 PM
−8
points
2
comments
39
min read
LW
link
The Inner Alignment Problem
Jakub Halmeš
Feb 24, 2024, 5:55 PM
1
point
1
comment
3
min read
LW
link
(jakubhalmes.substack.com)
We Need Major, But Not Radical, FDA Reform
Maxwell Tabarrok
Feb 24, 2024, 4:54 PM
42
points
12
comments
7
min read
LW
link
(www.maximum-progress.com)
After Overmorrow: Scattered Musings on the Immediate Post-AGI World
Yuli_Ban
Feb 24, 2024, 3:49 PM
−3
points
0
comments
26
min read
LW
link
[Question]
CDT vs. EDT on Deterrence
Terence Coelho
Feb 24, 2024, 3:41 PM
1
point
9
comments
1
min read
LW
link
Balancing Games
jefftk
Feb 24, 2024, 2:40 PM
62
points
18
comments
1
min read
LW
link
(www.jefftk.com)
How well do truth probes generalise?
mishajw
Feb 24, 2024, 2:12 PM
93
points
11
comments
9
min read
LW
link
Rawls’s Veil of Ignorance Doesn’t Make Any Sense
Arjun Panickssery
Feb 24, 2024, 1:18 PM
9
points
9
comments
1
min read
LW
link
[Question]
Can someone explain to me what went wrong with ChatGPT?
Valentin Baltadzhiev
Feb 24, 2024, 11:50 AM
9
points
1
comment
1
min read
LW
link
The Sense Of Physical Necessity: A Naturalism Demo (Introduction)
LoganStrohl
Feb 24, 2024, 2:56 AM
59
points
1
comment
6
min read
LW
link
Instrumental deception and manipulation in LLMs—a case study
Olli Järviniemi
Feb 24, 2024, 2:07 AM
39
points
13
comments
12
min read
LW
link
A starting point for making sense of task structure (in machine learning)
Kaarel, RP and jake_mendel
Feb 24, 2024, 1:51 AM
45
points
2
comments
12
min read
LW
link
Why you, personally, should want a larger human population
jasoncrawford
Feb 23, 2024, 7:48 PM
32
points
32
comments
5
min read
LW
link
(rootsofprogress.org)
Deliberative Cognitive Algorithms as Scaffolding
Cole Wyeth
Feb 23, 2024, 5:15 PM
20
points
4
comments
3
min read
LW
link
The Shutdown Problem: Incomplete Preferences as a Solution
EJT
Feb 23, 2024, 4:01 PM
53
points
33
comments
42
min read
LW
link
In set theory, everything is a set
Jacob G-W
Feb 23, 2024, 2:35 PM
11
points
9
comments
2
min read
LW
link
The role of philosophical thinking in understanding large language models: Calibrating and closing the gap between first-person experience and underlying mechanisms
Bill Benzon
Feb 23, 2024, 12:19 PM
4
points
0
comments
10
min read
LW
link
Deep and obvious points in the gap between your thoughts and your pictures of thought
KatjaGrace
Feb 23, 2024, 7:30 AM
42
points
6
comments
1
min read
LW
link
(worldspiritsockpuppet.com)
Parasocial relationship logic
KatjaGrace
Feb 23, 2024, 7:30 AM
20
points
1
comment
1
min read
LW
link
(worldspiritsockpuppet.com)
Shaming with and without naming
KatjaGrace
Feb 23, 2024, 7:30 AM
17
points
5
comments
2
min read
LW
link
(worldspiritsockpuppet.com)
Complexity of value but not disvalue implies more focus on s-risk. Moral uncertainty and preference utilitarianism also do.
Chi Nguyen
Feb 23, 2024, 6:10 AM
52
points
18
comments
LW
link
[Question]
Does increasing the power of a multimodal LLM get you an agentic AI?
yanni kyriacos
Feb 23, 2024, 4:14 AM
3
points
3
comments
1
min read
LW
link
The natural boundaries between people
Chipmonk
Feb 23, 2024, 1:09 AM
23
points
2
comments
8
min read
LW
link
(chipmonk.substack.com)
Contra Ngo et al. “Every ‘Every Bay Area House Party’ Bay Area House Party”
Ricki Heicklen
Feb 22, 2024, 11:56 PM
186
points
5
comments
4
min read
LW
link
(bayesshammai.substack.com)
AI #52: Oops
Zvi
Feb 22, 2024, 9:50 PM
50
points
9
comments
29
min read
LW
link
(thezvi.wordpress.com)
Embed your second brain in your first brain
dkl9
Feb 22, 2024, 9:46 PM
10
points
3
comments
1
min read
LW
link
(dkl9.net)
The Gemini Incident
Zvi
Feb 22, 2024, 9:00 PM
80
points
19
comments
18
min read
LW
link
(thezvi.wordpress.com)
Some Thoughts On Using Auctions For Land Valuation
harsimony
Feb 22, 2024, 7:54 PM
0
points
9
comments
9
min read
LW
link
(progressandpoverty.substack.com)
The Binding of Isaac & Transparent Newcomb’s Problem
suvjectibity
Feb 22, 2024, 6:56 PM
−11
points
0
comments
10
min read
LW
link
Language Models Don’t Learn the Physical Manifestation of Language
Bruce W. Lee and Jaehyuk Lim
Feb 22, 2024, 6:52 PM
39
points
23
comments
1
min read
LW
link
(arxiv.org)
Sora What
Zvi
Feb 22, 2024, 6:10 PM
47
points
3
comments
9
min read
LW
link
(thezvi.wordpress.com)
Do sparse autoencoders find “true features”?
Demian Till
Feb 22, 2024, 6:06 PM
74
points
33
comments
11
min read
LW
link
Everything Wrong with Roko’s Claims about an Engineered Pandemic
WitheringWeights
Feb 22, 2024, 3:59 PM
96
points
10
comments
16
min read
LW
link
The One and a Half Gemini
Zvi
Feb 22, 2024, 1:10 PM
73
points
4
comments
8
min read
LW
link
(thezvi.wordpress.com)
[Question]
How do I make predictions about the future to make sense of what to do with my life?
Raj Thimmiah
Feb 22, 2024, 11:22 AM
8
points
1
comment
1
min read
LW
link
How are voluntary commitments on vulnerability reporting going?
Adam Jones
Feb 22, 2024, 8:43 AM
23
points
1
comment
1
min read
LW
link
(adamjones.me)
Notes on Internal Objectives in Toy Models of Agents
Paul Colognese
Feb 22, 2024, 8:02 AM
16
points
0
comments
8
min read
LW
link
The Byronic Hero Always Loses
Cole Wyeth
Feb 22, 2024, 1:31 AM
32
points
4
comments
2
min read
LW
link
Job Listing: Managing Editor / Writer
Gretta Duleba
Feb 21, 2024, 11:41 PM
43
points
2
comments
1
min read
LW
link
The Pareto Best and the Curse of Doom
Screwtape
Feb 21, 2024, 11:10 PM
120
points
21
comments
9
min read
LW
link
AISN #31: A New AI Policy Bill in California Plus, Precedents for AI Governance and The EU AI Office
Dan H
Feb 21, 2024, 9:58 PM
17
points
0
comments
6
min read
LW
link
(newsletter.safe.ai)
Analogies between scaling labs and misaligned superintelligent AI
scasper
Feb 21, 2024, 7:29 PM
77
points
5
comments
4
min read
LW
link
Extinction Risks from AI: Invisible to Science?
VojtaKovarik, Chris van Merwijk and Ida Mattsson
Feb 21, 2024, 6:07 PM
24
points
7
comments
1
min read
LW
link
(arxiv.org)
Extinction-level Goodhart’s Law as a Property of the Environment
VojtaKovarik and Ida Mattsson
Feb 21, 2024, 5:56 PM
23
points
0
comments
10
min read
LW
link
Dynamics Crucial to AI Risk Seem to Make for Complicated Models
VojtaKovarik and Ida Mattsson
Feb 21, 2024, 5:54 PM
19
points
0
comments
9
min read
LW
link
Which Model Properties are Necessary for Evaluating an Argument?
VojtaKovarik and Ida Mattsson
Feb 21, 2024, 5:52 PM
18
points
2
comments
7
min read
LW
link