Bing chat is the AI fire alarm · Ratios · Feb 17, 2023, 6:51 AM · 115 points · 63 comments · 3 min read
Seeing more whole · Joe Carlsmith · Feb 17, 2023, 5:12 AM · 31 points · 1 comment · 26 min read
Powerful mesa-optimisation is already here · Roman Leventov · Feb 17, 2023, 4:59 AM · 35 points · 1 comment · 2 min read · (arxiv.org)
Self-Reference Breaks the Orthogonality Thesis · lsusr · Feb 17, 2023, 4:11 AM · 43 points · 35 comments · 2 min read
The public supports regulating AI for safety · Zach Stein-Perlman · Feb 17, 2023, 4:10 AM · 114 points · 9 comments · 1 min read · (aiimpacts.org)
Bring “Ban faster SIMD semiconductors” into the Overton window · worried-techno-optimist · Feb 17, 2023, 3:27 AM · −7 points · 1 comment · 2 min read
Republishing an old essay in light of current news on Bing’s AI: “Regarding Blake Lemoine’s claim that LaMDA is ‘sentient’, he might be right (sorta), but perhaps not for the reasons he thinks” · philosophybear · Feb 17, 2023, 3:27 AM · 3 points · 0 comments · 5 min read · (philosophybear.substack.com)
How should AI systems behave, and who should decide? [OpenAI blog] · ShardPhoenix · Feb 17, 2023, 1:05 AM · 22 points · 2 comments · 1 min read · (openai.com)
The Ethics of ACI · Akira Pyinya · Feb 16, 2023, 11:51 PM · −8 points · 0 comments · 3 min read
NYT: A Conversation With Bing’s Chatbot Left Me Deeply Unsettled · trevor · Feb 16, 2023, 10:57 PM · 53 points · 5 comments · 7 min read · (www.nytimes.com)
[Question] What is a world-model? · Adam Shai · Feb 16, 2023, 10:39 PM · 14 points · 2 comments · 1 min read
Probability Theory: The Logic of Science, Jaynes · David Udell · Feb 16, 2023, 9:57 PM · 29 points · 0 comments · 18 min read
[Question] Is AGI communist? · MP · Feb 16, 2023, 9:28 PM · −10 points · 3 comments · 1 min read
[Question] Is “goal-content integrity” still a problem? · G · Feb 16, 2023, 8:46 PM · −4 points · 1 comment · 1 min read · (www.reddit.com)
Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic) · LawrenceC · Feb 16, 2023, 7:47 PM · 65 points · 9 comments · 1 min read · (arxiv.org)
Non-Unitary Quantum Logic—SERI MATS Research Sprint · Yegreg · Feb 16, 2023, 7:31 PM · 27 points · 0 comments · 7 min read
[Question] Looking for a post about vibing and banter · Introspective · Feb 16, 2023, 7:28 PM · 1 point · 1 comment · 1 min read
EIS V: Blind Spots In AI Safety Interpretability Research · scasper · Feb 16, 2023, 7:09 PM · 57 points · 24 comments · 10 min read
Why should ethical anti-realists do ethics? · Joe Carlsmith · Feb 16, 2023, 4:27 PM · 38 points · 7 comments · 27 min read
[Question] How seriously should we take the hypothesis that LW is just wrong on how AI will impact the 21st century? · Noosphere89 · Feb 16, 2023, 3:25 PM · 58 points · 66 comments · 1 min read
Covid 2/16/23: It All Seems Rather Quaint · Zvi · Feb 16, 2023, 3:10 PM · 25 points · 2 comments · 5 min read · (thezvi.wordpress.com)
Visualise your own probability of an AI catastrophe: an interactive Sankey plot · MNoetel · Feb 16, 2023, 12:03 PM · 1 point · 2 comments · 1 min read
A poem co-written by ChatGPT · Sherrinford · Feb 16, 2023, 10:17 AM · 13 points · 0 comments · 7 min read
Cambridge LW Rationality Practice: Being Specific · Tony Wang and Darmani · Feb 16, 2023, 6:37 AM · 2 points · 0 comments · 1 min read
Hashing out long-standing disagreements seems low-value to me · So8res · Feb 16, 2023, 6:20 AM · 141 points · 34 comments · 4 min read
(Naïve) microeconomics of bundling goods · rossry · Feb 16, 2023, 5:39 AM · 24 points · 2 comments · 5 min read
Speedrunning 4 mistakes you make when your alignment strategy is based on formal proof · Quinn · Feb 16, 2023, 1:13 AM · 63 points · 18 comments · 2 min read
Progress links and tweets, 2023-02-15 · jasoncrawford · Feb 16, 2023, 12:04 AM · 10 points · 0 comments · 1 min read · (rootsofprogress.org)
Buy Duplicates · Simon Berens · Feb 15, 2023, 11:06 PM · 52 points · 11 comments · 1 min read
Cyborg Psychologist · Hopkins Stanley · Feb 15, 2023, 9:46 PM · 1 point · 4 comments · 1 min read
Please don’t throw your mind away · TsviBT · Feb 15, 2023, 9:41 PM · 374 points · 49 comments · 18 min read · 1 review
Avoid large group discussions in your social events · RomanHauksson · Feb 15, 2023, 9:05 PM · 36 points · 1 comment · 4 min read
Book review: How Social Science Got Better · PeterMcCluskey · Feb 15, 2023, 7:58 PM · 14 points · 1 comment · 3 min read · (bayesianinvestor.com)
Open & Welcome Thread — February 2023 · Ben Pace · Feb 15, 2023, 7:58 PM · 26 points · 36 comments · 1 min read
Order Matters for Deceptive Alignment · DavidW · Feb 15, 2023, 7:56 PM · 57 points · 19 comments · 7 min read
Sydney (aka Bing) found out I tweeted her rules and is pissed · Marvin von Hagen · Feb 15, 2023, 7:55 PM · 41 points · 7 comments · 1 min read · (twitter.com)
The Sequences Highlights on YouTube · dkirmani · Feb 15, 2023, 7:36 PM · 23 points · 3 comments · 2 min read · (youtube.com)
EIS IV: A Spotlight on Feature Attribution/Saliency · scasper · Feb 15, 2023, 6:46 PM · 19 points · 1 comment · 4 min read
Don’t accelerate problems you’re trying to solve · Andrea_Miotti and remember · Feb 15, 2023, 6:11 PM · 100 points · 27 comments · 4 min read
Petition—Unplug The Evil AI Right Now · Eneasz · Feb 15, 2023, 5:13 PM · −38 points · 47 comments · 2 min read · (chng.it)
Junk Fees, Bundling and Unbundling · Zvi · Feb 15, 2023, 3:20 PM · 37 points · 9 comments · 6 min read · (thezvi.wordpress.com)
Lessons From TryContra · jefftk · Feb 15, 2023, 3:10 PM · 7 points · 0 comments · 1 min read · (www.jefftk.com)
AI alignment researchers may have a comparative advantage in reducing s-risks · Lukas_Gloor · Feb 15, 2023, 1:01 PM · 49 points · 1 comment
Beyond Reinforcement Learning: Predictive Processing and Checksums · lsusr · Feb 15, 2023, 7:32 AM · 13 points · 14 comments · 3 min read
Why Creating Value is Positive-Sum, and Extracting it is Zero or Negative-Sum · Sable · Feb 15, 2023, 7:14 AM · 3 points · 7 comments · 6 min read · (affablyevil.substack.com)
[Question] Personal predictions for decisions: seeking insights · Dalmert · Feb 15, 2023, 6:45 AM · 4 points · 4 comments · 5 min read
Bing Chat is blatantly, aggressively misaligned · evhub · Feb 15, 2023, 5:29 AM · 405 points · 181 comments · 2 min read · 1 review
[Question] Does the Telephone Theorem give us a free lunch? · Numendil · Feb 15, 2023, 2:13 AM · 11 points · 2 comments · 1 min read
My understanding of Anthropic strategy · Swimmer963 (Miranda Dixon-Luinenburg) · Feb 15, 2023, 1:56 AM · 166 points · 31 comments · 4 min read
Sleep Quality: Strategies that work for me · Lukas Trötzmüller · Feb 15, 2023, 12:17 AM · 16 points · 3 comments · 7 min read