Job Listing: Managing Editor / Writer
Gretta Duleba
Feb 21, 2024, 11:41 PM
43
points
2
comments
1
min read
LW
link
The Pareto Best and the Curse of Doom
Screwtape
Feb 21, 2024, 11:10 PM
120
points
21
comments
9
min read
LW
link
AISN #31: A New AI Policy Bill in California Plus, Precedents for AI Governance and The EU AI Office
Dan H
Feb 21, 2024, 9:58 PM
17
points
0
comments
6
min read
LW
link
(newsletter.safe.ai)
Analogies between scaling labs and misaligned superintelligent AI
scasper
Feb 21, 2024, 7:29 PM
77
points
5
comments
4
min read
LW
link
Extinction Risks from AI: Invisible to Science?
VojtaKovarik
,
Chris van Merwijk
and
Ida Mattsson
Feb 21, 2024, 6:07 PM
24
points
7
comments
1
min read
LW
link
(arxiv.org)
Extinction-level Goodhart’s Law as a Property of the Environment
VojtaKovarik
and
Ida Mattsson
Feb 21, 2024, 5:56 PM
23
points
0
comments
10
min read
LW
link
Dynamics Crucial to AI Risk Seem to Make for Complicated Models
VojtaKovarik
and
Ida Mattsson
Feb 21, 2024, 5:54 PM
19
points
0
comments
9
min read
LW
link
Which Model Properties are Necessary for Evaluating an Argument?
VojtaKovarik
and
Ida Mattsson
Feb 21, 2024, 5:52 PM
18
points
2
comments
7
min read
LW
link
Weak vs Quantitative Extinction-level Goodhart’s Law
VojtaKovarik
and
Ida Mattsson
Feb 21, 2024, 5:38 PM
27
points
1
comment
2
min read
LW
link
Dual Wielding Kindle Scribes
mesaoptimizer
Feb 21, 2024, 5:17 PM
57
points
18
comments
6
min read
LW
link
A Tale of Two Restaurant Types
Zvi
Feb 21, 2024, 1:50 PM
15
points
0
comments
6
min read
LW
link
(thezvi.wordpress.com)
Less Wrong automated systems are inadvertently Censoring me
Roko
Feb 21, 2024, 12:57 PM
6
points
52
comments
1
min read
LW
link
[Question]
What is the research speed multiplier of the most advanced current LLMs?
wunan
Feb 21, 2024, 12:39 PM
6
points
2
comments
1
min read
LW
link
Jailbreaking GPT-4 with the tool API
mishajw
Feb 21, 2024, 11:16 AM
20
points
2
comments
4
min read
LW
link
Gut Renovating Another Bathroom
jefftk
Feb 21, 2024, 3:00 AM
22
points
0
comments
2
min read
LW
link
(www.jefftk.com)
Thoughts for and against an ASI figuring out ethics for itself
sweenesm
Feb 20, 2024, 11:40 PM
6
points
10
comments
3
min read
LW
link
AI #51: Altman’s Ambition
Zvi
Feb 20, 2024, 7:50 PM
83
points
5
comments
38
min read
LW
link
(thezvi.wordpress.com)
The Third Gemini
Zvi
Feb 20, 2024, 7:50 PM
30
points
2
comments
9
min read
LW
link
(thezvi.wordpress.com)
Why does generalization work?
Martín Soto
Feb 20, 2024, 5:51 PM
43
points
16
comments
4
min read
LW
link
ChatGPT refuses to accept a challenge where it would get shot between the eyes [game theory]
Bill Benzon
Feb 20, 2024, 4:55 PM
4
points
6
comments
4
min read
LW
link
Inducing human-like biases in moral reasoning LMs
Artyom Karpov
,
Austin Meek
,
Bogdan Ionut Cirstea
and
SCho
Feb 20, 2024, 4:28 PM
23
points
3
comments
14
min read
LW
link
Monthly Roundup #15: February 2024
Zvi
Feb 20, 2024, 1:10 PM
22
points
7
comments
32
min read
LW
link
(thezvi.wordpress.com)
Selections From “The Trouble With Being Born”
Arjun Panickssery
Feb 20, 2024, 10:07 AM
23
points
2
comments
2
min read
LW
link
(arjunpanickssery.substack.com)
Difficulty classes for alignment properties
Jozdien
Feb 20, 2024, 9:08 AM
34
points
5
comments
2
min read
LW
link
Lessons from Failed Attempts to Model Sleeping Beauty Problem
Ape in the coat
Feb 20, 2024, 6:43 AM
13
points
16
comments
14
min read
LW
link
flowing like water; hard like stone
lsusr
and
SilverFlame
Feb 20, 2024, 3:20 AM
27
points
4
comments
4
min read
LW
link
Theism Isn’t So Crazy
omnizoid
Feb 20, 2024, 3:20 AM
−31
points
11
comments
19
min read
LW
link
[Question]
Getting started at distillations: can critique mine?
Joyee Chen
Feb 20, 2024, 12:49 AM
2
points
0
comments
1
min read
LW
link
Auditing LMs with counterfactual search: a tool for control and ELK
Jacob Pfau
Feb 20, 2024, 12:02 AM
28
points
6
comments
10
min read
LW
link
Rationalist Storytelling (French)
Camille Berger
Feb 19, 2024, 10:25 PM
3
points
0
comments
1
min read
LW
link
Abs-E (or, speak only in the positive)
dkl9
Feb 19, 2024, 9:14 PM
29
points
24
comments
2
min read
LW
link
(dkl9.net)
Retirement Accounts and Short Timelines
jefftk
Feb 19, 2024, 6:50 PM
83
points
35
comments
2
min read
LW
link
(www.jefftk.com)
How Technical AI Safety Researchers Can Help Implement Punitive Damages to Mitigate Catastrophic AI Risk
Gabriel Weil
Feb 19, 2024, 6:00 PM
18
points
0
comments
4
min read
LW
link
Protocol evaluations: good analogies vs control
Fabien Roger
Feb 19, 2024, 6:00 PM
42
points
10
comments
11
min read
LW
link
When Should Copyright Get Shorter?
Maxwell Tabarrok
Feb 19, 2024, 4:03 PM
11
points
14
comments
4
min read
LW
link
(www.maximum-progress.com)
Auto-matching hidden layers in Pytorch LLMs
chanind
Feb 19, 2024, 12:40 PM
2
points
0
comments
3
min read
LW
link
I’d also take $7 trillion
bhauth
Feb 19, 2024, 3:31 AM
47
points
12
comments
10
min read
LW
link
(www.bhauth.com)
On coincidences and Bayesian reasoning, as applied to the origins of COVID-19
viking_math
Feb 19, 2024, 1:14 AM
62
points
28
comments
14
min read
LW
link
Solution to the two envelopes problem for moral weights
MichaelStJules
Feb 19, 2024, 12:15 AM
9
points
1
comment
LW
link
Conspiracy Investigation Done Right
ymeskhout
Feb 19, 2024, 12:09 AM
24
points
0
comments
6
min read
LW
link
Scientific Method
Andrij “Androniq” Ghorbunov
Feb 18, 2024, 9:06 PM
24
points
4
comments
30
min read
LW
link
[Question]
Weighing reputational and moral consequences of leaving Russia or staying
spza
Feb 18, 2024, 7:36 PM
29
points
24
comments
1
min read
LW
link
Things I’ve Grieved
Raemon
18 Feb 2024 19:32 UTC
125
points
6
comments
2
min read
LW
link
Senses of “knowing” a person
dkl9
18 Feb 2024 19:13 UTC
3
points
0
comments
1
min read
LW
link
(dkl9.net)
The Jolly Green Giant Chronicles [ChatGPT]
Bill Benzon
18 Feb 2024 17:28 UTC
4
points
0
comments
8
min read
LW
link
Intuition for 1 + 2 + 3 + … = −1/12
Shankar Sivarajan
18 Feb 2024 16:46 UTC
18
points
28
comments
3
min read
LW
link
No Clickbait—Misalignment Database
Kabir Kumar
18 Feb 2024 5:35 UTC
6
points
10
comments
1
min read
LW
link
Idea: NV⁻ Centers for Brain Interpretability
James Camacho
18 Feb 2024 5:28 UTC
6
points
1
comment
3
min read
LW
link
Celiacs don’t need to live in fear
Jarrah
18 Feb 2024 2:30 UTC
16
points
4
comments
4
min read
LW
link
“What if we could redesign society from scratch? The promise of charter cities.” [Rational Animations video]
Jackson Wagner
18 Feb 2024 0:57 UTC
40
points
7
comments
LW
link
(www.youtube.com)