AXRP Episode 35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization · DanielFilan · Aug 24, 2024, 10:30 PM · 21 points · 0 comments · 74 min read · LW link
The top 30 books to expand the capabilities of AI: a biased reading list · Jonathan Mugan · Aug 24, 2024, 9:48 PM · −6 points · 0 comments · 16 min read · LW link
The Ap Distribution · criticalpoints · Aug 24, 2024, 9:45 PM · 22 points · 8 comments · 3 min read · LW link (eregis.github.io)
What is it to solve the alignment problem? (Notes) · Joe Carlsmith · Aug 24, 2024, 9:19 PM · 69 points · 18 comments · 53 min read · LW link
Examine self modification as an intuition provider for the concept of consciousness · Canaletto · Aug 24, 2024, 8:48 PM · −4 points · 2 comments · 10 min read · LW link
[Question] Looking to interview AI Safety researchers for a book · jeffreycaruso · Aug 24, 2024, 7:57 PM · 14 points · 0 comments · 1 min read · LW link
Perplexity wins my AI race · Elizabeth · Aug 24, 2024, 7:20 PM · 107 points · 12 comments · 10 min read · LW link (acesounderglass.com)
Why should anyone boot *you* up? · onur · Aug 24, 2024, 5:51 PM · −1 points · 5 comments · 3 min read · LW link (solmaz.io)
Understanding Hidden Computations in Chain-of-Thought Reasoning · rokosbasilisk · Aug 24, 2024, 4:35 PM · 6 points · 1 comment · 1 min read · LW link
August 2024 Time Tracking · jefftk · Aug 24, 2024, 1:50 PM · 22 points · 0 comments · 3 min read · LW link (www.jefftk.com)
Training a Sparse Autoencoder in < 30 minutes on 16GB of VRAM using an S3 cache · Louka Ewington-Pitsos · Aug 24, 2024, 7:39 AM · 17 points · 0 comments · 5 min read · LW link
[Question] Looking for intuitions to extend bargaining notions · ProgramCrafter · Aug 24, 2024, 5:00 AM · 13 points · 0 comments · 1 min read · LW link
Owain Evans on Situational Awareness and Out-of-Context Reasoning in LLMs · Michaël Trazzi · Aug 24, 2024, 4:30 AM · 55 points · 0 comments · 5 min read · LW link
[Question] Developing Positive Habits through Video Games · pzas · Aug 24, 2024, 3:47 AM · 1 point · 5 comments · 1 min read · LW link
“Can AI Scaling Continue Through 2030?”, Epoch AI (yes) · gwern · Aug 24, 2024, 1:40 AM · 135 points · 4 comments · 3 min read · LW link (epochai.org)
What’s important in “AI for epistemics”? · Lukas Finnveden · Aug 24, 2024, 1:27 AM · 48 points · 0 comments · 28 min read · LW link (www.forethought.org)
Showing SAE Latents Are Not Atomic Using Meta-SAEs · Bart Bussmann, Michael Pearce, Patrick Leask, Joseph Bloom, Lee Sharkey and Neel Nanda · Aug 24, 2024, 12:56 AM · 68 points · 10 comments · 20 min read · LW link
Using ideologically-charged language to get gpt-3.5-turbo to disobey its system prompt: a demo · Milan W · Aug 24, 2024, 12:13 AM · 3 points · 0 comments · 6 min read · LW link
Crafting Polysemantic Transformer Benchmarks with Known Circuits · Evan Anders and Adrià Garriga-alonso · Aug 23, 2024, 10:03 PM · 17 points · 0 comments · 25 min read · LW link
[Question] What is an appropriate sample size when surveying billions of data points? · Blake · Aug 23, 2024, 9:54 PM · 1 point · 2 comments · 1 min read · LW link
Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs · Kola Ayonrinde, Michael Pearce and Lee Sharkey · Aug 23, 2024, 6:52 PM · 42 points · 8 comments · 16 min read · LW link
How I started believing religion might actually matter for rationality and moral philosophy · zhukeepa · Aug 23, 2024, 5:40 PM · 129 points · 41 comments · 7 min read · LW link
[Question] What do you expect AI capabilities may look like in 2028? · nonzerosum · Aug 23, 2024, 4:59 PM · 9 points · 5 comments · 1 min read · LW link
Invitation to lead a project at AI Safety Camp (Virtual Edition, 2025) · Linda Linsefors, Remmelt Ellen and Robert Kralisch · Aug 23, 2024, 2:18 PM · 17 points · 2 comments · 4 min read · LW link
If we solve alignment, do we die anyway? · Seth Herd · Aug 23, 2024, 1:13 PM · 84 points · 130 comments · 4 min read · LW link
What’s going on with Per-Component Weight Updates? · 4gate · Aug 22, 2024, 9:22 PM · 1 point · 0 comments · 6 min read · LW link
Interoperable High Level Structures: Early Thoughts on Adjectives · johnswentworth and David Lorell · Aug 22, 2024, 9:12 PM · 49 points · 1 comment · 7 min read · LW link
Interest poll: A time-waster blocker for desktop Linux programs · nahoj · Aug 22, 2024, 8:44 PM · 4 points · 5 comments · 1 min read · LW link
Turning 22 in the Pre-Apocalypse · testingthewaters · Aug 22, 2024, 8:28 PM · 38 points · 14 comments · 24 min read · LW link (utilityhotbar.github.io)
A Robust Natural Latent Over A Mixed Distribution Is Natural Over The Distributions Which Were Mixed · johnswentworth and David Lorell · Aug 22, 2024, 7:19 PM · 42 points · 4 comments · 4 min read · LW link
what becoming more secure did for me · Chris Lakin · Aug 22, 2024, 5:44 PM · 26 points · 5 comments · 2 min read · LW link (chrislakin.blog)
A primer on the current state of longevity research · Abhishaike Mahajan · Aug 22, 2024, 5:14 PM · 109 points · 6 comments · 14 min read · LW link (www.owlposting.com)
Some reasons to start a project to stop harmful AI · Remmelt · Aug 22, 2024, 4:23 PM · 5 points · 0 comments · 2 min read · LW link
The economics of space tethers · harsimony · Aug 22, 2024, 4:15 PM · 67 points · 22 comments · 7 min read · LW link (splittinginfinity.substack.com)
Dima’s Shortform · Dmitrii Krasheninnikov · Aug 22, 2024, 2:49 PM · 3 points · 0 comments · 1 min read · LW link
AI #78: Some Welcome Calm · Zvi · Aug 22, 2024, 2:20 PM · 61 points · 15 comments · 33 min read · LW link (thezvi.wordpress.com)
[Question] How do we know dreams aren’t real? · Logan Zoellner · Aug 22, 2024, 12:41 PM · 5 points · 31 comments · 1 min read · LW link
Measuring Structure Development in Algorithmic Transformers · Micurie and Einar Urdshals · Aug 22, 2024, 8:38 AM · 56 points · 4 comments · 11 min read · LW link
Deception and Jailbreak Sequence: 1. Iterative Refinement Stages of Deception in LLMs · Winnie Yang and Jojo Yang · Aug 22, 2024, 7:32 AM · 23 points · 1 comment · 21 min read · LW link
Just because an LLM said it doesn’t mean it’s true: an illustrative example · dirk · Aug 21, 2024, 9:05 PM · 26 points · 12 comments · 3 min read · LW link
[Question] How do you finish your tasks faster? · Cipolla · Aug 21, 2024, 8:01 PM · 4 points · 2 comments · 1 min read · LW link
AI Safety Newsletter #40: California AI Legislation Plus, NVIDIA Delays Chip Production, and Do AI Safety Benchmarks Actually Measure Safety? · Corin Katzke, Julius, Alexa Pan and Dan H · Aug 21, 2024, 6:09 PM · 11 points · 0 comments · 6 min read · LW link (newsletter.safe.ai)
[Question] Should LW suggest standard metaprompts? · Dagon · Aug 21, 2024, 4:41 PM · 3 points · 6 comments · 1 min read · LW link
Eternal Existence and Eternal Boredom: The Case for AI and Immortal Humans · Tuan Tu Nguyen · Aug 21, 2024, 9:58 AM · −12 points · 2 comments · 5 min read · LW link
Please do not use AI to write for you · Richard_Kennaway · Aug 21, 2024, 9:53 AM · 69 points · 34 comments · 4 min read · LW link
Apply to Aether—Independent LLM Agent Safety Research Group · RohanS · Aug 21, 2024, 9:47 AM · 10 points · 0 comments · 7 min read · LW link (forum.effectivealtruism.org)
the Giga Press was a mistake · bhauth · Aug 21, 2024, 4:51 AM · 99 points · 26 comments · 5 min read · LW link (bhauth.com)
Exploring the Boundaries of Cognitohazards and the Nature of Reality · Victor Novikov · Aug 21, 2024, 3:42 AM UTC · −2 points · 2 comments · 1 min read · LW link
[Question] What is the point of 2v2 debates? · Axel Ahlqvist · Aug 20, 2024, 9:59 PM UTC · 2 points · 1 comment · 1 min read · LW link
[Question] Where should I look for information on gut health? · FinalFormal2 · Aug 20, 2024, 7:44 PM UTC · 10 points · 10 comments · 1 min read · LW link