Sam Altman’s Business Negging
Julian Bradshaw
Sep 30, 2024, 9:06 PM
13
points
0
comments
1
min read
LW
link
(www.bloomberg.com)
In-Context Learning: An Alignment Survey
alamerton
Sep 30, 2024, 6:44 PM
8
points
0
comments
20
min read
LW
link
(docs.google.com)
Not Just For Therapy Chatbots: The Case For Compassion In AI Moral Alignment Research
kenneth_diao
Sep 30, 2024, 6:37 PM
2
points
0
comments
12
min read
LW
link
Exploring Decomposability of SAE Features
Vikram_N
Sep 30, 2024, 6:28 PM
1
point
0
comments
3
min read
LW
link
Knowledge Base 1: Could it increase intelligence and make it safer?
iwis
Sep 30, 2024, 4:00 PM
−4
points
0
comments
4
min read
LW
link
Point of Failure: Semiconductor-Grade Quartz
Annapurna
Sep 30, 2024, 3:57 PM
41
points
8
comments
2
min read
LW
link
(jorgevelez.substack.com)
on bacteria, on teeth
bhauth
Sep 30, 2024, 3:56 PM
62
points
9
comments
6
min read
LW
link
(bhauth.com)
SB 1047 gets vetoed
ryan_b
Sep 30, 2024, 3:49 PM
25
points
1
comment
1
min read
LW
link
(www.reuters.com)
Of Birds and Bees
RussellThor
Sep 30, 2024, 10:52 AM
7
points
9
comments
2
min read
LW
link
A new process for mapping discussions
Nathan Young
Sep 30, 2024, 8:57 AM
29
points
8
comments
6
min read
LW
link
(open.substack.com)
MATS Alumni Impact Analysis
utilistrutil, Juan Gil, yams, LauraVaughan, K Richards and Ryan Kidd
Sep 30, 2024, 2:35 AM
62
points
7
comments
11
min read
LW
link
[Question]
Most capable publicly available agents?
Gabe
Sep 30, 2024, 12:04 AM
2
points
0
comments
1
min read
LW
link
the case for CoT unfaithfulness is overstated
nostalgebraist
Sep 29, 2024, 10:07 PM
260
points
43
comments
11
min read
LW
link
Pomodoro Method Randomized Self Experiment
niplav
Sep 29, 2024, 9:55 PM
14
points
2
comments
1
min read
LW
link
Toy Models of Superposition: Simplified by Hand
Axel Sorensen
Sep 29, 2024, 9:19 PM
9
points
3
comments
8
min read
LW
link
LLMs are likely not conscious
research_prime_space
Sep 29, 2024, 8:57 PM
6
points
9
comments
1
min read
LW
link
A Policy Proposal
phdead
Sep 29, 2024, 8:45 PM
10
points
4
comments
4
min read
LW
link
Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?
Taras Kutsyk, Tommaso Mencattini and Ciprian Florea
Sep 29, 2024, 7:37 PM
26
points
8
comments
25
min read
LW
link
Models of life
Abhishaike Mahajan
Sep 29, 2024, 7:24 PM
8
points
0
comments
16
min read
LW
link
(www.asimov.press)
Interpreting the effects of Jailbreak Prompts in LLMs
Harsh Raj
Sep 29, 2024, 7:01 PM
8
points
0
comments
5
min read
LW
link
New Capabilities, New Risks? - Evaluating Agentic General Assistants using Elements of GAIA & METR Frameworks
Tej Lander
Sep 29, 2024, 6:58 PM
5
points
0
comments
29
min read
LW
link
Developmental Stages in Multi-Problem Grokking
James Sullivan
Sep 29, 2024, 6:58 PM
4
points
0
comments
6
min read
LW
link
A Psychoanalytic Explanation of Sam Altman’s Irrational Actions
Gabe
Sep 29, 2024, 6:58 PM
1
point
3
comments
3
min read
LW
link
Building Safer AI from the Ground Up: Steering Model Behavior via Pre-Training Data Curation
Antonio Clarke
Sep 29, 2024, 6:48 PM
6
points
0
comments
23
min read
LW
link
Cryonics is free
Mati_Roy
Sep 29, 2024, 5:58 PM
198
points
43
comments
2
min read
LW
link
Runner’s High On Demand: A Story of Luck & Persistence
Shoshannah Tekofsky
Sep 29, 2024, 5:15 PM
14
points
6
comments
5
min read
LW
link
(shoshanigans.substack.com)
You can, in fact, bamboozle an unaligned AI into sparing your life
David Matolcsi
Sep 29, 2024, 4:59 PM
112
points
173
comments
27
min read
LW
link
Base LLMs refuse too
Connor Kissane, robertzk, Arthur Conmy and Neel Nanda
Sep 29, 2024, 4:04 PM
60
points
20
comments
10
min read
LW
link
My Methodological Turn
adamShimi
Sep 29, 2024, 3:01 PM
29
points
0
comments
1
min read
LW
link
(formethods.substack.com)
Linkpost: Hypocrisy standoff
Chris_Leong
Sep 29, 2024, 2:27 PM
5
points
1
comment
1
min read
LW
link
(x.com)
[Question]
Any real toeholds for making practical decisions regarding AI safety?
lemonhope
Sep 29, 2024, 12:03 PM
27
points
6
comments
1
min read
LW
link
Review: Dr Stone
ProgramCrafter
Sep 29, 2024, 10:35 AM
18
points
9
comments
4
min read
LW
link
AXRP Episode 36 - Adam Shai and Paul Riechers on Computational Mechanics
DanielFilan
Sep 29, 2024, 5:50 AM
25
points
0
comments
55
min read
LW
link
DunCon @Lighthaven
Duncan Sabien (Inactive)
Sep 29, 2024, 4:56 AM
45
points
2
comments
1
min read
LW
link
Exploring Shard-like Behavior: Empirical Insights into Contextual Decision-Making in RL Agents
Alejandro Aristizabal
Sep 29, 2024, 12:32 AM
6
points
0
comments
15
min read
LW
link
Jailbreaking language models with user roleplay
loops
Sep 28, 2024, 11:43 PM
8
points
0
comments
3
min read
LW
link
(iter.ca)
“Slow” takeoff is a terrible term for “maybe even faster takeoff, actually”
Raemon
Sep 28, 2024, 11:38 PM
217
points
69
comments
1
min read
LW
link
Contextual Constitutional AI
aksh-n
Sep 28, 2024, 11:24 PM
14
points
2
comments
12
min read
LW
link
Explore More: A Bag of Tricks to Keep Your Life on the Rails
Shoshannah Tekofsky
Sep 28, 2024, 9:38 PM
236
points
19
comments
11
min read
LW
link
(shoshanigans.substack.com)
2024 Petrov Day Retrospective
Ben Pace and Raemon
Sep 28, 2024, 9:30 PM
93
points
25
comments
10
min read
LW
link
[Question]
Any Trump Supporters Want to Dialogue?
k64
Sep 28, 2024, 7:41 PM
15
points
83
comments
1
min read
LW
link
Evaluating LLaMA 3 for political sycophancy
alma.liezenga
Sep 28, 2024, 7:02 PM
2
points
2
comments
6
min read
LW
link
Two new datasets for evaluating political sycophancy in LLMs
alma.liezenga
Sep 28, 2024, 6:29 PM
9
points
0
comments
9
min read
LW
link
COT Scaling implies slower takeoff speeds
Logan Zoellner
Sep 28, 2024, 4:20 PM
36
points
56
comments
1
min read
LW
link
Thoughts on Evo-Bio Math and Mesa-Optimization: Maybe We Need To Think Harder About “Relative” Fitness?
Lorec
Sep 28, 2024, 2:07 PM
6
points
6
comments
1
min read
LW
link
Steering LLMs’ Behavior with Concept Activation Vectors
Ruixuan Huang
Sep 28, 2024, 9:53 AM
8
points
0
comments
10
min read
LW
link
An Interactive Shapley Value Explainer
James Stephen Brown
Sep 28, 2024, 5:01 AM
42
points
9
comments
1
min read
LW
link
(nonzerosum.games)
[Question]
Implications of China’s recession on AGI development?
Eric Neyman
Sep 28, 2024, 1:12 AM
41
points
3
comments
1
min read
LW
link
The Compute Conundrum: AI Governance in a Shifting Geopolitical Era
octavo
Sep 28, 2024, 1:05 AM
−3
points
1
comment
17
min read
LW
link
‘Chat with impactful research & evaluations’ (Unjournal NotebookLMs)
david reinstein
Sep 28, 2024, 12:32 AM
6
points
0
comments
2
min read
LW
link