[Question]
Is there any metric measuring ~”proportion of people creating extra value”?
Amal
Aug 3, 2023, 10:54 PM
7
points
3
comments
1
min read
LW
link
[Question]
Hypothetical: what would you do?
JNS
Aug 3, 2023, 10:39 PM
4
points
2
comments
1
min read
LW
link
[Linkpost] Deception Abilities Emerged in Large Language Models
Bogdan Ionut Cirstea
Aug 3, 2023, 5:28 PM
12
points
0
comments
1
min read
LW
link
Embedding Ethical Priors into AI Systems: A Bayesian Approach
Justausername
Aug 3, 2023, 3:31 PM
−5
points
3
comments
21
min read
LW
link
Password-locked models: a stress case for capabilities evaluation
Fabien Roger
Aug 3, 2023, 2:53 PM
156
points
14
comments
6
min read
LW
link
AI #23: Fundamental Problems with RLHF
Zvi
Aug 3, 2023, 12:50 PM
59
points
9
comments
41
min read
LW
link
(thezvi.wordpress.com)
Bad Imitation Instruments
jefftk
Aug 3, 2023, 2:30 AM
21
points
1
comment
1
min read
LW
link
(www.jefftk.com)
Kolmogorov’s theory of Algorithmic Probability
Aidan Rocke
Aug 3, 2023, 12:58 AM
5
points
2
comments
2
min read
LW
link
(keplerlounge.com)
Work culture creep
CrimsonChin
Aug 3, 2023, 12:38 AM
32
points
15
comments
8
min read
LW
link
[Question]
Boxing
Zach Stein-Perlman
Aug 2, 2023, 11:38 PM
6
points
1
comment
1
min read
LW
link
External rationality vs. internal rationality
metachirality
Aug 2, 2023, 11:29 PM
7
points
0
comments
1
min read
LW
link
When performing a dimensionality reduction on tensors, the trace is often zero.
Joseph Van Name
Aug 2, 2023, 9:06 PM
7
points
1
comment
3
min read
LW
link
Progress links digest, 2023-08-02: Superconductor edition
jasoncrawford
Aug 2, 2023, 8:27 PM
13
points
0
comments
3
min read
LW
link
(rootsofprogress.org)
[Question]
What works for ADHD and/or related things?
TeaTieAndHat
Aug 2, 2023, 6:37 PM
7
points
13
comments
1
min read
LW
link
[Question]
Would you pay for a search engine limited to rationalist sites?
Conor
Aug 2, 2023, 6:06 PM
4
points
19
comments
1
min read
LW
link
The Roots of Progress Blog-Building Intensive: advice for applicants, request for support
jasoncrawford
Aug 2, 2023, 3:37 PM
9
points
0
comments
1
min read
LW
link
(rootsofprogress.org)
3 levels of threat obfuscation
HoldenKarnofsky
Aug 2, 2023, 2:58 PM
69
points
14
comments
7
min read
LW
link
ChatGPT for translation
Varshul Gupta
Aug 2, 2023, 11:57 AM
1
point
0
comments
3
min read
LW
link
(dubverseblack.substack.com)
Long-Term Future Fund: April 2023 grant recommendations
abergal, calebp99, Linch, habryka, Thomas Larsen and Vaniver
Aug 2, 2023, 7:54 AM
81
points
3
comments
50
min read
LW
link
[Question]
Could we breed/engineer intelligent parrots?
lemonhope
Aug 2, 2023, 7:32 AM
9
points
18
comments
1
min read
LW
link
Anthropical Motte and Bailey in two versions of Sleeping Beauty
Ape in the coat
Aug 2, 2023, 7:08 AM
32
points
56
comments
6
min read
LW
link
solar-thermal and techno-economic analysis
bhauth
Aug 2, 2023, 6:22 AM
21
points
8
comments
5
min read
LW
link
(www.bhauth.com)
South Bay ACX/SSC Meetup @ Whole Foods
allisona
Aug 2, 2023, 3:44 AM
1
point
0
comments
1
min read
LW
link
“Is There Anything That’s Worth More”
Zack_M_Davis
Aug 2, 2023, 3:28 AM
64
points
6
comments
1
min read
LW
link
Bay Winter Solstice: call for speech pitches!
tcheasdfjkl
Aug 2, 2023, 3:24 AM
9
points
0
comments
1
min read
LW
link
(docs.google.com)
[Question]
What is ontology?
Adam Zerner
Aug 2, 2023, 12:54 AM
28
points
19
comments
1
min read
LW
link
My current LK99 questions
Eliezer Yudkowsky
Aug 1, 2023, 10:48 PM
206
points
38
comments
5
min read
LW
link
Spiral Staircase
Michael Samoilov
Aug 1, 2023, 9:51 PM
19
points
2
comments
2
min read
LW
link
Open Mic—August 2023
Adam Zerner
Aug 1, 2023, 7:24 PM
8
points
0
comments
1
min read
LW
link
ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks
Beth Barnes
Aug 1, 2023, 6:30 PM
153
points
12
comments
5
min read
LW
link
(evals.alignment.org)
[Question]
When(if ever) are superstimuli good/useful/advantageous?
Perhaps
Aug 1, 2023, 3:50 PM
−7
points
2
comments
1
min read
LW
link
AISN #17: Automatically Circumventing LLM Guardrails, the Frontier Model Forum, and Senate Hearing on AI Oversight
Dan H
Aug 1, 2023, 3:40 PM
8
points
0
comments
8
min read
LW
link
(newsletter.safe.ai)
AISN #16: White House Secures Voluntary Commitments from Leading AI Labs and Lessons from Oppenheimer
Dan H and Corin Katzke
Aug 1, 2023, 3:39 PM
3
points
0
comments
6
min read
LW
link
(newsletter.safe.ai)
“Desperate Honesty” by Agnes Callard
David Gross
Aug 1, 2023, 1:34 PM
11
points
0
comments
2
min read
LW
link
(dailynous.com)
Barbieheimer: Across the Dead Reckoning
Zvi
Aug 1, 2023, 1:00 PM
49
points
17
comments
41
min read
LW
link
(thezvi.wordpress.com)
Untangling Infrabayesianism: A redistillation [PDF link; ~12k words + lots of math]
Lorxus
1 Aug 2023 12:42 UTC
29
points
16
comments
2
min read
LW
link
(docdro.id)
What Is Childhood Supposed To Be?
Sable
1 Aug 2023 9:51 UTC
21
points
13
comments
3
min read
LW
link
(affablyevil.substack.com)
AI romantic partners will harm society if they go unregulated
Roman Leventov
1 Aug 2023 9:32 UTC
26
points
76
comments
13
min read
LW
link
What is autonomy, and how does it lead to greater risk from AI?
Davidmanheim
1 Aug 2023 7:58 UTC
30
points
0
comments
6
min read
LW
link
Evaluating Superhuman Models with Consistency Checks
Daniel Paleka and Lukas Fluri
1 Aug 2023 7:51 UTC
21
points
2
comments
9
min read
LW
link
(arxiv.org)
[See link to Sept meetup below!] San Francisco ACX Meetup “First Saturday” August 5, 1 pm
guenael
1 Aug 2023 3:38 UTC
1
point
0
comments
1
min read
LW
link
[Question]
Exercise: Solve “Thinking Physics”
Raemon
1 Aug 2023 0:44 UTC
102
points
30
comments
5
min read
LW
link
1 review
The “public debate” about AI is confusing for the general public and for policymakers because it is a three-sided debate
Adam David Long
1 Aug 2023 0:08 UTC
146
points
30
comments
4
min read
LW
link
The “no sandbagging on checkable tasks” hypothesis
Joe Carlsmith
31 Jul 2023 23:06 UTC
61
points
14
comments
9
min read
LW
link
A Social History of Truth
Vaniver
31 Jul 2023 22:49 UTC
64
points
2
comments
14
min read
LW
link
Watermarking considered overrated?
DanielFilan
31 Jul 2023 21:36 UTC
19
points
4
comments
1
min read
LW
link
What The Lord of the Rings Teaches Us About AI Alignment
Jeffrey Heninger
31 Jul 2023 20:16 UTC
24
points
12
comments
7
min read
LW
link
The “spelling miracle”: GPT-3 spelling abilities and glitch tokens revisited
mwatkins
31 Jul 2023 19:47 UTC
85
points
29
comments
20
min read
LW
link
“Building a House” Review
jefftk
31 Jul 2023 19:20 UTC
62
points
6
comments
1
min read
LW
link
(www.jefftk.com)
The Meaning of Shoggoth AI Memes
Dan Smith
31 Jul 2023 18:52 UTC
−5
points
5
comments
2
min read
LW
link