Are AIs like Animals? Perspectives and Strategies from Biology · Jackson Emanuel · May 16, 2023, 11:39 PM · 1 point · 0 comments · 21 min read
A Mechanistic Interpretability Analysis of a GridWorld Agent-Simulator (Part 1 of N) · Joseph Bloom · May 16, 2023, 10:59 PM · 36 points · 2 comments · 16 min read
A TAI which kills all humans might also doom itself · Jeffrey Heninger · May 16, 2023, 10:36 PM · 7 points · 3 comments · 3 min read
Brief notes on the Senate hearing on AI oversight · Diziet · May 16, 2023, 10:29 PM · 77 points · 2 comments · 2 min read
$500 Bounty/Prize Problem: Channel Capacity Using “Insensitive” Functions · johnswentworth · May 16, 2023, 9:31 PM · 40 points · 11 comments · 2 min read
Progress links and tweets, 2023-05-16 · jasoncrawford · May 16, 2023, 8:54 PM · 14 points · 0 comments · 1 min read · (rootsofprogress.org)
AI Will Not Want to Self-Improve · petersalib · May 16, 2023, 8:53 PM · 28 points · 24 comments · 20 min read
Nice intro video to RSI · Nathan Helm-Burger · May 16, 2023, 6:48 PM · 12 points · 0 comments · 1 min read · (youtu.be)
[Interview w/ Zvi Mowshowitz] Should we halt progress in AI? · fowlertm · May 16, 2023, 6:12 PM · 18 points · 2 comments · 3 min read
AI Risk & Policy Forecasts from Metaculus & FLI’s AI Pathways Workshop · _will_ · May 16, 2023, 6:06 PM · 11 points · 4 comments · 8 min read
[Question] Why doesn’t the presence of log-loss for probabilistic models (e.g. sequence prediction) imply that any utility function capable of producing a “fairly capable” agent will have at least some non-negligible fraction of overlap with human values? · Thoth Hermes · May 16, 2023, 6:02 PM · 2 points · 0 comments · 1 min read
Decision Theory with the Magic Parts Highlighted · moridinamael · May 16, 2023, 5:39 PM · 175 points · 24 comments · 5 min read
We learn long-lasting strategies to protect ourselves from danger and rejection · Richard_Ngo · May 16, 2023, 4:36 PM · 86 points · 5 comments · 5 min read
Proposal: Align Systems Earlier In Training · OneManyNone · May 16, 2023, 4:24 PM · 18 points · 0 comments · 11 min read
Procedural Executive Function, Part 2 · DaystarEld · May 16, 2023, 4:22 PM · 24 points · 0 comments · 18 min read · (daystareld.com)
My current workflow to study the internal mechanisms of LLM · Yulu Pi · May 16, 2023, 3:27 PM · 4 points · 0 comments · 1 min read
Proposal: we should start referring to the risk from unaligned AI as a type of *accident risk* · Christopher King · May 16, 2023, 3:18 PM · 22 points · 6 comments · 2 min read
AI Safety Newsletter #6: Examples of AI safety progress, Yoshua Bengio proposes a ban on AI agents, and lessons from nuclear arms control · Dan H and Orpheus16 · May 16, 2023, 3:14 PM · 31 points · 0 comments · 6 min read · (newsletter.safe.ai)
Lazy Baked Mac and Cheese · jefftk · May 16, 2023, 2:40 PM · 18 points · 2 comments · 1 min read · (www.jefftk.com)
Tyler Cowen’s challenge to develop an ‘actual mathematical model’ for AI X-Risk · Joe Brenton · May 16, 2023, 11:57 AM · 6 points · 4 comments · 1 min read
Evaluating Language Model Behaviours for Shutdown Avoidance in Textual Scenarios · Simon Lermen, Teun van der Weij and Leon Lang · May 16, 2023, 10:53 AM · 26 points · 0 comments · 13 min read
[Review] Two People Smoking Behind the Supermarket · lsusr · May 16, 2023, 7:25 AM · 32 points · 1 comment · 1 min read
Superposition and Dropout · Edoardo Pona · May 16, 2023, 7:24 AM · 21 points · 5 comments · 6 min read
[Question] What is the literature on long term water fasts? · lc · May 16, 2023, 3:23 AM · 16 points · 4 comments · 1 min read
Lessons learned from offering in-office nutritional testing · Elizabeth · May 15, 2023, 11:20 PM · 80 points · 11 comments · 14 min read · (acesounderglass.com)
Judgments often smuggle in implicit standards · Richard_Ngo · May 15, 2023, 6:50 PM · 95 points · 4 comments · 3 min read
Rational retirement plans · Ik · May 15, 2023, 5:49 PM · 5 points · 17 comments · 1 min read
[Question] (Crosspost) Asking for online calls on AI s-risks discussions · jackchang110 · May 15, 2023, 5:42 PM · 1 point · 0 comments · 1 min read · (forum.effectivealtruism.org)
Simple experiments with deceptive alignment · Andreas_Moe · May 15, 2023, 5:41 PM · 7 points · 0 comments · 4 min read
Some Summaries of Agent Foundations Work · mattmacdermott · May 15, 2023, 4:09 PM · 62 points · 1 comment · 13 min read
Facebook Increased Visibility · jefftk · May 15, 2023, 3:40 PM · 15 points · 1 comment · 1 min read · (www.jefftk.com)
Un-unpluggability—can’t we just unplug it? · Oliver Sourbut · May 15, 2023, 1:23 PM · 26 points · 10 comments · 12 min read · (www.oliversourbut.net)
[Question] Can we learn much by studying the behaviour of RL policies? · AidanGoth · May 15, 2023, 12:56 PM · 1 point · 0 comments · 1 min read
How I apply (so-called) Non-Violent Communication · Kaj_Sotala · May 15, 2023, 9:56 AM · 86 points · 28 comments · 3 min read
Let’s build a fire alarm for AGI · chaosmage · May 15, 2023, 9:16 AM · −1 points · 0 comments · 2 min read
From fear to excitement · Richard_Ngo · May 15, 2023, 6:23 AM · 132 points · 9 comments · 3 min read
Reward is the optimization target (of capabilities researchers) · Max H · May 15, 2023, 3:22 AM · 32 points · 4 comments · 5 min read
The Lightcone Theorem: A Better Foundation For Natural Abstraction? · johnswentworth · May 15, 2023, 2:24 AM · 69 points · 25 comments · 6 min read
GovAI: Towards best practices in AGI safety and governance: A survey of expert opinion · Zach Stein-Perlman · May 15, 2023, 1:42 AM · 28 points · 11 comments · 1 min read · (arxiv.org)
[Question] Why don’t quantilizers also cut off the upper end of the distribution? · Alex_Altair · May 15, 2023, 1:40 AM · 25 points · 2 comments · 1 min read
Support Structures for Naturalist Study · LoganStrohl · May 15, 2023, 12:25 AM · 47 points · 6 comments · 10 min read
Catastrophic Regressional Goodhart: Appendix · Thomas Kwa and Drake Thomas · May 15, 2023, 12:10 AM · 25 points · 1 comment · 9 min read
Helping your Senator Prepare for the Upcoming Sam Altman Hearing · Tiago de Vassal · May 14, 2023, 10:45 PM · 69 points · 2 comments · 1 min read · (aisafetytour.com)
Difficulties in making powerful aligned AI · DanielFilan · May 14, 2023, 8:50 PM · 41 points · 1 comment · 10 min read · (danielfilan.com)
How much do markets value Open AI? · Xodarap · May 14, 2023, 7:28 PM · 21 points · 5 comments
Misaligned AGI Death Match · Nate Reinar Windwood · May 14, 2023, 6:00 PM · 1 point · 0 comments · 1 min read
[Question] What new technology, for what institutions? · bhauth · May 14, 2023, 5:33 PM UTC · 29 points · 6 comments · 3 min read
A strong mind continues its trajectory of creativity · TsviBT · May 14, 2023, 5:24 PM UTC · 22 points · 8 comments · 6 min read
Ontologies Should Be Backwards-Compatible · Thoth Hermes · May 14, 2023, 5:21 PM UTC · 3 points · 3 comments · 4 min read · (thothhermes.substack.com)
Jaan Tallinn’s 2022 Philanthropy Overview · jaan · May 14, 2023, 3:35 PM UTC · 64 points · 2 comments · 1 min read · (jaan.online)