What are the flaws in this AGI argument?
William the Kiwi · Aug 11, 2023, 11:31 AM · 5 points · 14 comments · 1 min read
Google DeepMind’s RT-2
SandXbox · Aug 11, 2023, 11:26 AM · 9 points · 1 comment · 1 min read · (robotics-transformer2.github.io)
Linkpost: We need another Expert Survey on Progress in AI, urgently
David Mears · Aug 11, 2023, 8:22 AM · 25 points · 2 comments · 2 min read · (open.substack.com)
What Does a Marginal Grant at LTFF Look Like? Funding Priorities and Grantmaking Thresholds at the Long-Term Future Fund
Linch, calebp99 and Daniel_Eth · Aug 11, 2023, 3:59 AM · 64 points · 0 comments · 1 min read · (forum.effectivealtruism.org)
[Question] Will posting any thread on LW guarantee that a LLM will index all my content, and if questions people ask to the LLM after my name will surface up all my LW content?
Alex K. Chen (parrot) · Aug 11, 2023, 1:40 AM · 0 points · 0 comments · 1 min read
AI Safety Concepts Writeup: WebGPT
JustisMills · Aug 11, 2023, 1:35 AM · 9 points · 1 comment · 7 min read
[Question] What is science?
Adam Zerner · Aug 11, 2023, 12:00 AM · 6 points · 4 comments · 1 min read
Three configurable prettyprinters
philh · Aug 10, 2023, 11:10 PM · 9 points · 0 comments · 22 min read · (reasonableapproximation.net)
Ilya Sutskever’s thoughts on AI safety (July 2023): a transcript with my comments
mishka · Aug 10, 2023, 7:07 PM · 21 points · 3 comments · 5 min read
Seeking Input to AI Safety Book for non-technical audience
Darren McKee · Aug 10, 2023, 5:58 PM · 10 points · 4 comments · 1 min read
Evaluating GPT-4 Theory of Mind Capabilities
gcmac and Nathan · Aug 10, 2023, 5:57 PM · 15 points · 2 comments · 14 min read
Some alignment ideas
SelonNerias · Aug 10, 2023, 5:51 PM · 1 point · 0 comments · 11 min read
Self Supervised Learning (SSL)
Varshul Gupta · Aug 10, 2023, 5:43 PM · 5 points · 1 comment · 2 min read · (dubverseblack.substack.com)
Predicting Virus Relative Abundance in Wastewater
jefftk · Aug 10, 2023, 3:46 PM · 33 points · 2 comments · 1 min read · (naobservatory.org)
AI #24: Week of the Podcast
Zvi · Aug 10, 2023, 3:00 PM · 49 points · 5 comments · 44 min read · (thezvi.wordpress.com)
Could We Automate AI Alignment Research?
Stephen McAleese · Aug 10, 2023, 12:17 PM · 34 points · 10 comments · 21 min read
The positional embedding matrix and previous-token heads: how do they actually work?
AdamYedidia · Aug 10, 2023, 1:58 AM · 27 points · 4 comments · 13 min read
LLMs are (mostly) not helped by filler tokens
Kshitij Sachan · Aug 10, 2023, 12:48 AM · 66 points · 36 comments · 6 min read
2023 ACX Meetups Everywhere—Newton, MA
duck_master · Aug 9, 2023, 10:47 PM · 6 points · 2 comments · 1 min read
Progress links digest, 2023-08-09: US adds new nuclear, Katalin Karikó interview, and more
jasoncrawford · Aug 9, 2023, 7:22 PM · 18 points · 0 comments · 3 min read · (rootsofprogress.org)
Mech Interp Challenge: August—Deciphering the First Unique Character Model
CallumMcDougall · Aug 9, 2023, 7:14 PM · 36 points · 1 comment · 3 min read
Real Meaning of life has been found. Eliezer discovered it in 2000′s.
Jorterder · Aug 9, 2023, 6:13 PM · −15 points · 1 comment · 1 min read · (docs.google.com)
Marginal Revolution unofficial birthday party
Derek M. Jones · Aug 9, 2023, 2:35 PM · 4 points · 0 comments · 1 min read
A content analysis of the SQ-R questionnaire and a proposal for testing EQ-SQ theory
tailcalled · Aug 9, 2023, 1:51 PM · 10 points · 2 comments · 13 min read
[Question] Does LessWrong allow exempting posts from being scraped by GPTBot?
mic · Aug 9, 2023, 1:02 PM · 29 points · 3 comments · 1 min read
If I Was An Eccentric Trillionaire
niplav · Aug 9, 2023, 7:56 AM · 9 points · 8 comments · 26 min read
Modulating sycophancy in an RLHF model via activation steering
Nina Panickssery · Aug 9, 2023, 7:06 AM · 69 points · 20 comments · 12 min read
Open Thread—August 2023
habryka · Aug 9, 2023, 3:52 AM · 18 points · 49 comments · 1 min read
marine cloud brightening
bhauth · Aug 9, 2023, 2:50 AM · 40 points · 14 comments · 3 min read · (www.bhauth.com)
Inflection.ai is a major AGI lab
Nikola Jurkovic · Aug 9, 2023, 1:05 AM · 137 points · 13 comments · 2 min read
Acausal Now: We could totally acausally bargain with aliens at our current tech level if desired
Christopher King · Aug 9, 2023, 12:50 AM · 1 point · 5 comments · 4 min read
Necromancy’s unintended consequences.
Christopher King · Aug 9, 2023, 12:08 AM · −6 points · 2 comments · 2 min read
What’s A “Market”?
johnswentworth · Aug 8, 2023, 11:29 PM · 94 points · 16 comments · 10 min read
Podcast (+transcript): Nathan Barnard on how US financial regulation can inform AI governance
Aaron Bergman · Aug 8, 2023, 9:46 PM · 8 points · 0 comments · 23 min read · (www.aaronbergman.net)
What are the flaws in this argument about p(Doom)?
William the Kiwi · Aug 8, 2023, 8:34 PM · −2 points · 26 comments · 1 min read
A Simple Theory Of Consciousness
SherlockHolmes · Aug 8, 2023, 6:05 PM · 2 points · 5 comments · 1 min read · (peterholmes.medium.com)
[Linkpost] Rationally awake
jpc · Aug 8, 2023, 5:59 PM · −1 points · 0 comments · 4 min read · (jpc.dev)
Yet more UFO Betting: Put Up or Shut Up
MoreRatsWrongReUAP · Aug 8, 2023, 5:50 PM · 10 points · 18 comments · 1 min read
AISN #18: Challenges of Reinforcement Learning from Human Feedback, Microsoft’s Security Breach, and Conceptual Research on AI Safety
Dan H · Aug 8, 2023, 3:52 PM · 13 points · 0 comments · 5 min read · (newsletter.safe.ai)
[Question] Beginner’s question about RLHF
FTPickle · Aug 8, 2023, 3:48 PM · 1 point · 3 comments · 1 min read
My Trial Period as an Independent Alignment Researcher
Bart Bussmann · Aug 8, 2023, 2:16 PM · 34 points · 1 comment · 3 min read
4 types of AGI selection, and how to constrain them
Remmelt · Aug 8, 2023, 10:02 AM · −4 points · 3 comments · 3 min read
Notice your everything
metachirality · Aug 8, 2023, 2:38 AM · 15 points · 1 comment · 2 min read
Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research
evhub, Nicholas Schiefer, Carson Denison and Ethan Perez · Aug 8, 2023, 1:30 AM · 319 points · 30 comments · 18 min read · 1 review
Perpetually Declining Population?
jefftk · Aug 8, 2023, 1:30 AM · 48 points · 29 comments · 3 min read · (www.jefftk.com)
[Question] How do I find all the items on LW that I’ve *favorited* or upvoted?
Alex K. Chen (parrot) · Aug 7, 2023, 11:51 PM · 14 points · 3 comments · 1 min read
A plea for more funding shortfall transparency
porby · Aug 7, 2023, 9:33 PM · 73 points · 4 comments · 2 min read
[Question] Tips for reducing thinking branching factor
Simon Berens · Aug 7, 2023, 8:21 PM · 4 points · 6 comments · 1 min read
An interactive introduction to grokking and mechanistic interpretability
Adam Pearce and Asma Ghandeharioun · Aug 7, 2023, 7:09 PM · 23 points · 3 comments · 1 min read · (pair.withgoogle.com)
Feedbackloop-first Rationality
Raemon · Aug 7, 2023, 5:58 PM · 205 points · 69 comments · 8 min read · 2 reviews