Archive
What should we censor from training data? · wassname · Apr 22, 2023, 11:33 PM · 16 points · 4 comments · 1 min read · LW link
Architecture-aware optimisation: train ImageNet and more without hyperparameters · Chris Mingard · Apr 22, 2023, 9:50 PM · 6 points · 2 comments · 2 min read · LW link
OpenAI’s GPT-4 Safety Goals · PeterMcCluskey · Apr 22, 2023, 7:11 PM · 3 points · 3 comments · 4 min read · LW link (bayesianinvestor.com)
Introducing the Nuts and Bolts Of Naturalism · LoganStrohl · Apr 22, 2023, 6:31 PM · 77 points · 2 comments · 3 min read · LW link
We Need To Know About Continual Learning · michael_mjd · Apr 22, 2023, 5:08 PM · 30 points · 14 comments · 4 min read · LW link
The Security Mindset, S-Risk and Publishing Prosaic Alignment Research · lukemarks · Apr 22, 2023, 2:36 PM · 40 points · 7 comments · 5 min read · LW link
[Question] How did LW update p(doom) after LLMs blew up? · FinalFormal2 · Apr 22, 2023, 2:21 PM · 24 points · 29 comments · 1 min read · LW link
The Cruel Trade-Off Between AI Misuse and AI X-risk Concerns · simeon_c · Apr 22, 2023, 1:49 PM · 24 points · 1 comment · 2 min read · LW link
five ways to say “Almost Always” and actually mean it · Yudhister Kumar · Apr 22, 2023, 10:38 AM · 17 points · 3 comments · 2 min read · LW link (www.ykumar.org)
P(doom|superintelligence) or coin tosses and dice throws of human values (and other related Ps). · Muyyd · Apr 22, 2023, 10:06 AM · −7 points · 0 comments · 4 min read · LW link
[Question] Is it allowed to post job postings here? I am looking for a new PhD student to work on AI Interpretability. Can I advertise my position? · Tiberius · Apr 22, 2023, 1:22 AM · 5 points · 4 comments · 1 min read · LW link
LessWrong moderation messaging container · Raemon · Apr 22, 2023, 1:19 AM · 21 points · 13 comments · 1 min read · LW link
Neural network polytopes (Colab notebook) · Zach Furman · Apr 21, 2023, 10:42 PM · 11 points · 0 comments · 1 min read · LW link (colab.research.google.com)
Readability is mostly a waste of characters · vlad.proex · Apr 21, 2023, 10:05 PM · 21 points · 7 comments · 3 min read · LW link
The Relationship between RLHF and AI Psychology: Debunking the Shoggoth Argument · FinalFormal2 · Apr 21, 2023, 10:05 PM · −11 points · 8 comments · 2 min read · LW link
Thinking about maximization and corrigibility · James Payor · Apr 21, 2023, 9:22 PM · 63 points · 4 comments · 5 min read · LW link
Would we even want AI to solve all our problems? · So8res · Apr 21, 2023, 6:04 PM · 98 points · 15 comments · 2 min read · LW link
The Commission for Stopping Further Improvements: A letter of note from Isambard K. Brunel · jasoncrawford · Apr 21, 2023, 5:42 PM · 39 points · 0 comments · 4 min read · LW link (rootsofprogress.org)
Should we publish mechanistic interpretability research? · Marius Hobbhahn and LawrenceC · Apr 21, 2023, 4:19 PM · 106 points · 40 comments · 13 min read · LW link
500 Million, But Not A Single One More—The Animation · Writer · Apr 21, 2023, 3:48 PM · 47 points · 0 comments · LW link
Talking publicly about AI risk · Jan_Kulveit · Apr 21, 2023, 11:28 AM · 180 points · 9 comments · 6 min read · LW link
Notes on “the hot mess theory of AI misalignment” · JakubK · Apr 21, 2023, 10:07 AM · 16 points · 0 comments · 5 min read · LW link (sohl-dickstein.github.io)
Requisite Variety · Stephen Fowler · Apr 21, 2023, 8:07 AM · 6 points · 0 comments · 5 min read · LW link
The Agency Overhang · Jeffrey Ladish · Apr 21, 2023, 7:47 AM · 85 points · 6 comments · 6 min read · LW link
[Question] What would “The Medical Model Is Wrong” look like? · Elo · Apr 21, 2023, 1:46 AM · 8 points · 7 comments · 2 min read · LW link
Gas and Water · jefftk · Apr 21, 2023, 1:30 AM · 17 points · 9 comments · 1 min read · LW link (www.jefftk.com)
[Question] Did the fonts change? · the gears to ascension · Apr 21, 2023, 12:40 AM · 2 points · 1 comment · 1 min read · LW link
[Question] Should we openly talk about explicit use cases for AutoGPT? · ChristianKl · Apr 20, 2023, 11:44 PM · 20 points · 4 comments · 1 min read · LW link
United We Align: Harnessing Collective Human Intelligence for AI Alignment Progress · Shoshannah Tekofsky · Apr 20, 2023, 11:19 PM · 41 points · 13 comments · 25 min read · LW link
[Question] Where to start with statistics if I want to measure things? · matto · Apr 20, 2023, 10:40 PM · 21 points · 7 comments · 1 min read · LW link
Upskilling, bridge-building, research on security/cryptography and AI safety · Allison Duettmann · Apr 20, 2023, 10:32 PM · 14 points · 0 comments · 4 min read · LW link
Behavioural statistics for a maze-solving agent · peligrietzer and TurnTrout · Apr 20, 2023, 10:26 PM · 46 points · 11 comments · 10 min read · LW link
An introduction to language model interpretability · Alexandre Variengien · Apr 20, 2023, 10:22 PM · 14 points · 0 comments · 9 min read · LW link
The Case for Brain-Only Preservation · Mati_Roy · Apr 20, 2023, 10:01 PM · 21 points · 7 comments · 1 min read · LW link (biostasis.substack.com)
[Question] Practical ways to actualize our beliefs into concrete bets over a longer time horizon? · M. Y. Zuo · Apr 20, 2023, 9:21 PM · 4 points · 2 comments · 1 min read · LW link
LW moderation: my current thoughts and questions, 2023-04-12 · Ruby · Apr 20, 2023, 9:02 PM · 53 points · 30 comments · 10 min read · LW link
Proposal: Using Monte Carlo tree search instead of RLHF for alignment research · Christopher King · Apr 20, 2023, 7:57 PM · 2 points · 7 comments · 3 min read · LW link
DeepMind and Google Brain are merging [Linkpost] · Orpheus16 · Apr 20, 2023, 6:47 PM · 55 points · 5 comments · 1 min read · LW link (www.deepmind.com)
Ideas for studies on AGI risk · dr_s · Apr 20, 2023, 6:17 PM · 5 points · 1 comment · 11 min read · LW link
Study 1b: This One Weird Trick does NOT cause incorrectness cascades · Robert_AIZI · Apr 20, 2023, 6:10 PM · 5 points · 0 comments · 6 min read · LW link (aizi.substack.com)
An open letter to SERI MATS program organisers · Roman Leventov · Apr 20, 2023, 4:34 PM · 26 points · 26 comments · 4 min read · LW link
Deception Strategies · Thoth Hermes · Apr 20, 2023, 3:59 PM · −7 points · 2 comments · 5 min read · LW link (thothhermes.substack.com)
Paperclip Club (AI Safety Meetup) · LThorburn · Apr 20, 2023, 3:55 PM · 1 point · 0 comments · 1 min read · LW link
AI #8: People Can Do Reasonable Things · Zvi · Apr 20, 2023, 3:50 PM · 100 points · 16 comments · 55 min read · LW link (thezvi.wordpress.com)
OpenAI could help X-risk by wagering itself · VojtaKovarik · Apr 20, 2023, 2:51 PM · 31 points · 16 comments · 1 min read · LW link
Japan AI Alignment Conference Postmortem · Chris Scammell and Katrina Joslin · Apr 20, 2023, 10:58 AM · 71 points · 8 comments · 8 min read · LW link
Stability AI releases StableLM, an open-source ChatGPT counterpart · Ozyrus · Apr 20, 2023, 6:04 AM · 11 points · 3 comments · 1 min read · LW link (github.com)
The Quantum Wave Function is Related to a Philosophy Concept · Richard Aragon · Apr 20, 2023, 3:16 AM · −11 points · 3 comments · 6 min read · LW link
A poem written by a fancy autocomplete · Christopher King · Apr 20, 2023, 2:31 AM · 1 point · 0 comments · 1 min read · LW link
List of commonly used benchmarks for LLMs · Diziet · Apr 20, 2023, 2:25 AM · 8 points · 0 comments · 1 min read · LW link