Simple alignment plan that maybe works
Iknownothing | Jul 18, 2023, 10:48 PM | 4 points | 8 comments | 1 min read

Prospera-dump
tailcalled | Jul 18, 2023, 9:36 PM | 11 points | 16 comments | 1 min read

Tiny Mech Interp Projects: Emergent Positional Embeddings of Words
Neel Nanda | Jul 18, 2023, 9:24 PM | 51 points | 1 comment | 9 min read

Quick Thoughts on Language Models
RohanS | Jul 18, 2023, 8:38 PM | 6 points | 0 comments | 4 min read

Still no Lie Detector for LLMs
Daniel Herrmann and ben_levinstein | Jul 18, 2023, 7:56 PM | 50 points | 2 comments | 21 min read

Meta announces Llama 2; “open sources” it for commercial use
LawrenceC | Jul 18, 2023, 7:28 PM | 46 points | 12 comments | 1 min read | (about.fb.com)

The Rope Management Theory: A Comprehensive Approach to Modulating Reward Perception and Mitigating Hedonic Adaptation
Eris Discordia | Jul 18, 2023, 5:45 PM | −23 points | 2 comments | 3 min read

AI Impacts Quarterly Newsletter, Apr-Jun 2023
Harlan and Richard Korzekwa | Jul 18, 2023, 5:14 PM | 6 points | 0 comments | 3 min read | (blog.aiimpacts.org)

Clever arguers give weak evidence, not zero
dkl9 | Jul 18, 2023, 5:07 PM | 7 points | 2 comments | 1 min read | (dkl9.net)

Measuring and Improving the Faithfulness of Model-Generated Reasoning
Ansh Radhakrishnan, tamera, karinanguyen, Sam Bowman and Ethan Perez | Jul 18, 2023, 4:36 PM | 111 points | 15 comments | 6 min read | 1 review

[Question] Least-problematic Resource for learning RL?
Dalcy | Jul 18, 2023, 4:30 PM | 9 points | 7 comments | 1 min read

Charter Cities: why they’re exciting & how they might work
Jackson Wagner | Jul 18, 2023, 1:57 PM | 21 points | 7 comments

Narrative Theory. Part 6. Artificial Neural Networks
Eris | Jul 18, 2023, 9:22 AM | 3 points | 0 comments | 2 min read

Train for incorrigibility, then reverse it (Shutdown Problem Contest Submission)
Daniel_Eth | Jul 18, 2023, 8:26 AM | 9 points | 1 comment

The shape of AGI: Cartoons and back of envelope
boazbarak | Jul 17, 2023, 8:57 PM | 33 points | 19 comments | 6 min read | 1 review

Predictive history classes
dkl9 | Jul 17, 2023, 8:48 PM | 68 points | 17 comments | 2 min read | (dkl9.net)

Highlights from The Industrial Revolution, by T. S. Ashton
jasoncrawford | Jul 17, 2023, 7:02 PM | 17 points | 0 comments | 10 min read | (rootsofprogress.org)

Existential Risk Persuasion Tournament
PeterMcCluskey | Jul 17, 2023, 6:04 PM | 73 points | 1 comment | 8 min read | (bayesianinvestor.com)

[Interview w/ Rob Miles] The case for taking AI Safety seriously
fowlertm | Jul 17, 2023, 5:08 PM | 17 points | 1 comment | 1 min read

Announcing the Existential InfoSec Forum
calebp99 | Jul 17, 2023, 5:05 PM | 10 points | 0 comments | 2 min read

Narrative Theory. Part 4. Neural Darwinism
Eris | Jul 17, 2023, 4:45 PM | 3 points | 0 comments | 2 min read

Sapient Algorithms
Valentine | Jul 17, 2023, 4:30 PM | 83 points | 15 comments | 5 min read

AI safety technical research—Career review
Benjamin Hilton | Jul 17, 2023, 3:34 PM | 14 points | 0 comments

[Question] Conditional on living in a AI safety/alignment by default universe, what are the implications of this assumption being true?
Noosphere89 | Jul 17, 2023, 2:44 PM | 26 points | 10 comments | 1 min read

Thoughts on “Process-Based Supervision”
Steven Byrnes | Jul 17, 2023, 2:08 PM | 74 points | 4 comments | 23 min read

Proof of posteriority: a defense against AI-generated misinformation
jchan | Jul 17, 2023, 12:04 PM | 33 points | 3 comments | 5 min read

An Overview of AI risks—the Flyer
Charbel-Raphaël, Jonathan Claybrough and tchauvin | Jul 17, 2023, 12:03 PM | 20 points | 0 comments | 1 min read | (docs.google.com)

[Question] Build knowledge base first, or backchain?
Nicholas / Heather Kross | Jul 17, 2023, 3:44 AM | 11 points | 5 comments | 1 min read

A fictional AI law laced w/ alignment theory
MiguelDev | Jul 17, 2023, 1:42 AM | 6 points | 0 comments | 2 min read

AutoInterpretation Finds Sparse Coding Beats Alternatives
Hoagy | Jul 17, 2023, 1:41 AM | 57 points | 1 comment | 7 min read

An upcoming US Supreme Court case may impede AI governance efforts
NickGabs | Jul 16, 2023, 11:51 PM | 57 points | 17 comments | 2 min read

Weak Evidence is Common
dkl9 | Jul 16, 2023, 11:37 PM | 7 points | 5 comments | 1 min read | (dkl9.net)

Even briefer summary of ai-plans.com
Iknownothing | Jul 16, 2023, 11:25 PM | 10 points | 6 comments | 2 min read | (www.ai-plans.com)

Mech Interp Puzzle 1: Suspiciously Similar Embeddings in GPT-Neo
Neel Nanda | Jul 16, 2023, 10:02 PM | 67 points | 15 comments | 1 min read

A Technology of Everything – Part 1: A Magical Science Experiment
aiuisensei | Jul 16, 2023, 10:01 PM | −3 points | 0 comments | 7 min read | (www.aiui.cloud)

AI, Consciousness, and the problem of Moral Considerability
stultus | Jul 16, 2023, 7:56 PM | 1 point | 0 comments | 2 min read

Narrative Theory. Part 3. Simplest to succeed
Eris | Jul 16, 2023, 2:41 PM | 4 points | 0 comments | 1 min read

Runaway Optimizers in Mind Space
silentbob | Jul 16, 2023, 2:26 PM | 16 points | 0 comments | 12 min read

[Question] Is Adam Elga’s proof for thirdism in Sleeping Beauty still considered to be sound?
Ape in the coat | Jul 16, 2023, 2:11 PM | 8 points | 25 comments | 1 min read

A simple way of exploiting AI’s coming economic impact may be highly-impactful
kuira | Jul 16, 2023, 9:33 AM | 11 points | 2 comments | 2 min read

Activation adding experiments with llama-7b
Nina Panickssery | Jul 16, 2023, 4:17 AM | 51 points | 1 comment | 3 min read

Introducción al Riesgo Existencial de Inteligencia Artificial
david.friva | Jul 15, 2023, 8:37 PM | 4 points | 2 comments | 4 min read | (youtu.be)

The housing crisis, explained using game theory
Johnstone | Jul 15, 2023, 8:27 PM | 4 points | 2 comments | 8 min read

Only a hack can solve the shutdown problem
dp | Jul 15, 2023, 8:26 PM | 5 points | 0 comments | 8 min read

Robustness of Model-Graded Evaluations and Automated Interpretability
Simon Lermen and viluon | Jul 15, 2023, 7:12 PM | 47 points | 5 comments | 9 min read

[Question] How to deal with fear of failure?
TeaTieAndHat | Jul 15, 2023, 6:57 PM | 1 point | 2 comments | 1 min read

Simplified bio-anchors for upper bounds on AI timelines
Fabien Roger | Jul 15, 2023, 6:15 PM | 21 points | 4 comments | 5 min read

A Hill of Validity in Defense of Meaning
Zack_M_Davis | Jul 15, 2023, 5:57 PM | 25 points | 120 comments | 73 min read | 1 review | (unremediatedgender.space)

What is a cognitive bias?
Lionel | Jul 15, 2023, 1:01 PM | 1 point | 0 comments | 2 min read | (lionelpage.substack.com)

[Question] When people say robots will steal jobs, what kinds of jobs are never implied?
Mary Chernyshenko | Jul 15, 2023, 10:50 AM | 5 points | 12 comments | 1 min read