Foresight Vision Weekend 2024
Allison Duettmann
Oct 1, 2024, 9:59 PM
8
points
0
comments
1
min read
LW
link
Happy simulations
FateGrinder
Oct 1, 2024, 9:05 PM
−5
points
0
comments
2
min read
LW
link
Three Subtle Examples of Data Leakage
abstractapplic
Oct 1, 2024, 8:45 PM
172
points
16
comments
4
min read
LW
link
AI Safety Newsletter #42: Newsom Vetoes SB 1047 Plus, OpenAI’s o1, and AI Governance Summary
Corin Katzke
,
Julius
,
Alexa Pan
,
andrewz
and
Dan H
Oct 1, 2024, 8:35 PM
8
points
0
comments
6
min read
LW
link
(newsletter.safe.ai)
Retrieval Augmented Genesis
João Ribeiro Medeiros
Oct 1, 2024, 8:18 PM
6
points
0
comments
29
min read
LW
link
Likelihood calculation with duobels
Martin Gerdes
Oct 1, 2024, 4:21 PM
4
points
0
comments
6
min read
LW
link
Is Text Watermarking a lost cause?
egor.timatkov
Oct 1, 2024, 4:20 PM
17
points
13
comments
10
min read
LW
link
Information dark matter
Logan Kieller
Oct 1, 2024, 3:05 PM
33
points
4
comments
28
min read
LW
link
(logankieller.substack.com)
Conventional footnotes considered harmful
dkl9
Oct 1, 2024, 2:54 PM
25
points
16
comments
1
min read
LW
link
(dkl9.net)
Newsom Vetoes SB 1047
Zvi
Oct 1, 2024, 12:20 PM
84
points
6
comments
32
min read
LW
link
(thezvi.wordpress.com)
Will AI and Humanity Go to War?
Simon Goldstein
Oct 1, 2024, 6:35 AM
9
points
4
comments
6
min read
LW
link
[Question]
AMA: International School Student in China
Novice
Oct 1, 2024, 6:00 AM
5
points
0
comments
1
min read
LW
link
AGI Farm
Rahul Chand
Oct 1, 2024, 4:29 AM
1
point
0
comments
8
min read
LW
link
Why comparative advantage does not help horses
Sherrinford
Sep 30, 2024, 10:27 PM
105
points
15
comments
3
min read
LW
link
Intelligence explosion: a rational assessment.
p4rziv4l
Sep 30, 2024, 9:17 PM
1
point
0
comments
1
min read
LW
link
(docs.google.com)
Peak Human Capital
PeterMcCluskey
Sep 30, 2024, 9:13 PM
70
points
3
comments
5
min read
LW
link
(bayesianinvestor.com)
Sam Altman’s Business Negging
Julian Bradshaw
Sep 30, 2024, 9:06 PM
13
points
0
comments
1
min read
LW
link
(www.bloomberg.com)
In-Context Learning: An Alignment Survey
alamerton
Sep 30, 2024, 6:44 PM
8
points
0
comments
20
min read
LW
link
(docs.google.com)
Not Just For Therapy Chatbots: The Case For Compassion In AI Moral Alignment Research
kenneth_diao
Sep 30, 2024, 6:37 PM
2
points
0
comments
12
min read
LW
link
Exploring Decomposability of SAE Features
Vikram_N
Sep 30, 2024, 6:28 PM
1
point
0
comments
3
min read
LW
link
Knowledge Base 1: Could it increase intelligence and make it safer?
iwis
Sep 30, 2024, 4:00 PM
−4
points
0
comments
4
min read
LW
link
Point of Failure: Semiconductor-Grade Quartz
Annapurna
Sep 30, 2024, 3:57 PM
41
points
8
comments
2
min read
LW
link
(jorgevelez.substack.com)
on bacteria, on teeth
bhauth
Sep 30, 2024, 3:56 PM
62
points
9
comments
6
min read
LW
link
(bhauth.com)
SB 1047 gets vetoed
ryan_b
Sep 30, 2024, 3:49 PM
25
points
1
comment
1
min read
LW
link
(www.reuters.com)
Of Birds and Bees
RussellThor
Sep 30, 2024, 10:52 AM
7
points
9
comments
2
min read
LW
link
A new process for mapping discussions
Nathan Young
Sep 30, 2024, 8:57 AM
29
points
8
comments
6
min read
LW
link
(open.substack.com)
MATS Alumni Impact Analysis
utilistrutil
,
Juan Gil
,
yams
,
LauraVaughan
,
K Richards
and
Ryan Kidd
Sep 30, 2024, 2:35 AM
62
points
7
comments
11
min read
LW
link
[Question]
Most capable publicly available agents?
Gabe
Sep 30, 2024, 12:04 AM
2
points
0
comments
1
min read
LW
link
the case for CoT unfaithfulness is overstated
nostalgebraist
Sep 29, 2024, 10:07 PM
259
points
43
comments
11
min read
LW
link
Pomodoro Method Randomized Self Experiment
niplav
Sep 29, 2024, 9:55 PM
14
points
2
comments
1
min read
LW
link
Toy Models of Superposition: Simplified by Hand
Axel Sorensen
Sep 29, 2024, 9:19 PM
9
points
3
comments
8
min read
LW
link
LLMs are likely not conscious
research_prime_space
Sep 29, 2024, 8:57 PM
6
points
9
comments
1
min read
LW
link
A Policy Proposal
phdead
Sep 29, 2024, 8:45 PM
10
points
4
comments
4
min read
LW
link
Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?
Taras Kutsyk
,
Tommaso Mencattini
and
Ciprian Florea
Sep 29, 2024, 7:37 PM
26
points
8
comments
25
min read
LW
link
Models of life
Abhishaike Mahajan
Sep 29, 2024, 7:24 PM
8
points
0
comments
16
min read
LW
link
(www.asimov.press)
Interpreting the effects of Jailbreak Prompts in LLMs
Harsh Raj
Sep 29, 2024, 7:01 PM
8
points
0
comments
5
min read
LW
link
New Capabilities, New Risks? - Evaluating Agentic General Assistants using Elements of GAIA & METR Frameworks
Tej Lander
Sep 29, 2024, 6:58 PM
5
points
0
comments
29
min read
LW
link
Developmental Stages in Multi-Problem Grokking
James Sullivan
Sep 29, 2024, 6:58 PM
4
points
0
comments
6
min read
LW
link
A Psychoanalytic Explanation of Sam Altman’s Irrational Actions
Gabe
Sep 29, 2024, 6:58 PM
1
point
3
comments
3
min read
LW
link
Building Safer AI from the Ground Up: Steering Model Behavior via Pre-Training Data Curation
Antonio Clarke
Sep 29, 2024, 6:48 PM
6
points
0
comments
23
min read
LW
link
Cryonics is free
Mati_Roy
Sep 29, 2024, 5:58 PM
198
points
43
comments
2
min read
LW
link
Runner’s High On Demand: A Story of Luck & Persistence
Shoshannah Tekofsky
Sep 29, 2024, 5:15 PM
14
points
6
comments
5
min read
LW
link
(shoshanigans.substack.com)
You can, in fact, bamboozle an unaligned AI into sparing your life
David Matolcsi
Sep 29, 2024, 4:59 PM
112
points
173
comments
27
min read
LW
link
Base LLMs refuse too
Connor Kissane
,
robertzk
,
Arthur Conmy
and
Neel Nanda
29 Sep 2024 16:04 UTC
60
points
20
comments
10
min read
LW
link
My Methodological Turn
adamShimi
29 Sep 2024 15:01 UTC
29
points
0
comments
1
min read
LW
link
(formethods.substack.com)
Linkpost: Hypocrisy standoff
Chris_Leong
29 Sep 2024 14:27 UTC
5
points
1
comment
1
min read
LW
link
(x.com)
[Question]
Any real toeholds for making practical decisions regarding AI safety?
lemonhope
29 Sep 2024 12:03 UTC
27
points
6
comments
1
min read
LW
link
Review: Dr Stone
ProgramCrafter
29 Sep 2024 10:35 UTC
18
points
9
comments
4
min read
LW
link
AXRP Episode 36 - Adam Shai and Paul Riechers on Computational Mechanics
DanielFilan
29 Sep 2024 5:50 UTC
25
points
0
comments
55
min read
LW
link
DunCon @Lighthaven
Duncan Sabien (Inactive)
29 Sep 2024 4:56 UTC
45
points
2
comments
1
min read
LW
link