Anthropic (org)
Tag · Last edit: 25 Dec 2021 4:12 UTC by Multicore
Anthropic is an AI organization. Not to be confused with anthropics.
Anthropic AI made the right call
bhauth · 15 Apr 2024 0:39 UTC · 30 points · 19 comments · 1 min read
OMMC Announces RIP
Adam Scholl and aysja · 1 Apr 2024 23:20 UTC · 178 points · 5 comments · 2 min read
On Anthropic’s Sleeper Agents Paper
Zvi · 17 Jan 2024 16:10 UTC · 54 points · 5 comments · 36 min read · (thezvi.wordpress.com)
Introducing Alignment Stress-Testing at Anthropic
evhub · 12 Jan 2024 23:51 UTC · 179 points · 23 comments · 2 min read
Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation
Soroush Pour, rusheb, Quentin FEUILLADE--MONTIXI, Arush and scasper · 7 Nov 2023 17:59 UTC · 36 points · 2 comments · 2 min read · (arxiv.org)
Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy
Zac Hatfield-Dodds · 1 Nov 2023 18:10 UTC · 85 points · 1 comment · 4 min read · (www.anthropic.com)
Vaniver’s thoughts on Anthropic’s RSP
Vaniver · 28 Oct 2023 21:06 UTC · 46 points · 4 comments · 3 min read
Comparing Anthropic’s Dictionary Learning to Ours
Robert_AIZI · 7 Oct 2023 23:30 UTC · 136 points · 8 comments · 4 min read
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
Zac Hatfield-Dodds · 5 Oct 2023 21:01 UTC · 286 points · 19 comments · 2 min read · (transformer-circuits.pub)
Amazon to invest up to $4 billion in Anthropic
Davis_Kingsley · 25 Sep 2023 14:55 UTC · 44 points · 8 comments · 1 min read · (twitter.com)
Anthropic’s Responsible Scaling Policy & Long-Term Benefit Trust
Zac Hatfield-Dodds · 19 Sep 2023 15:09 UTC · 90 points · 23 comments · 3 min read · (www.anthropic.com)
AI Awareness through Interaction with Blatantly Alien Models
VojtaKovarik · 28 Jul 2023 8:41 UTC · 7 points · 5 comments · 3 min read
Frontier Model Forum
Zach Stein-Perlman · 26 Jul 2023 14:30 UTC · 27 points · 0 comments · 4 min read · (blog.google)
Frontier Model Security
Vaniver · 26 Jul 2023 4:48 UTC · 31 points · 1 comment · 3 min read · (www.anthropic.com)
Anthropic Observations
Zvi · 25 Jul 2023 12:50 UTC · 104 points · 1 comment · 10 min read · (thezvi.wordpress.com)
Measuring and Improving the Faithfulness of Model-Generated Reasoning
Ansh Radhakrishnan, tamera, karinanguyen, Sam Bowman and Ethan Perez · 18 Jul 2023 16:36 UTC · 109 points · 13 comments · 6 min read
Anthropic | Charting a Path to AI Accountability
Gabriel Mukobi · 14 Jun 2023 4:43 UTC · 34 points · 2 comments · 3 min read · (www.anthropic.com)
Rishi Sunak mentions “existential threats” in talk with OpenAI, DeepMind, Anthropic CEOs
Arjun Panickssery, Baldassare Castiglione and Cleo Nardo · 24 May 2023 21:06 UTC · 34 points · 1 comment · 1 min read · (www.gov.uk)
Request to AGI organizations: Share your views on pausing AI progress
Akash and simeon_c · 11 Apr 2023 17:30 UTC · 141 points · 11 comments · 1 min read
Anthropic is further accelerating the Arms Race?
sapphire · 6 Apr 2023 23:29 UTC · 82 points · 22 comments · 1 min read · (techcrunch.com)
Anthropic: Core Views on AI Safety: When, Why, What, and How
jonmenaster · 9 Mar 2023 17:34 UTC · 17 points · 1 comment · 22 min read · (www.anthropic.com)
Anthropic’s Core Views on AI Safety
Zac Hatfield-Dodds · 9 Mar 2023 16:55 UTC · 181 points · 39 comments · 2 min read · (www.anthropic.com)
Podcast Transcript: Daniela and Dario Amodei on Anthropic
remember · 7 Mar 2023 16:47 UTC · 46 points · 2 comments · 79 min read · (futureoflife.org)
[Preprint] Pretraining Language Models with Human Preferences
Giulio · 21 Feb 2023 11:44 UTC · 12 points · 0 comments · 1 min read · (arxiv.org)
Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic)
LawrenceC · 16 Feb 2023 19:47 UTC · 65 points · 9 comments · 1 min read · (arxiv.org)
My understanding of Anthropic strategy
Swimmer963 (Miranda Dixon-Luinenburg) · 15 Feb 2023 1:56 UTC · 165 points · 31 comments · 4 min read
[Linkpost] Google invested $300M in Anthropic in late 2022
Akash · 3 Feb 2023 19:13 UTC · 73 points · 14 comments · 1 min read · (www.ft.com)
Concrete Reasons for Hope about AI
Zac Hatfield-Dodds · 14 Jan 2023 1:22 UTC · 101 points · 13 comments · 1 min read
Why I’m joining Anthropic
evhub · 5 Jan 2023 1:12 UTC · 121 points · 4 comments · 1 min read
[Question] Will research in AI risk jinx it? Consequences of training AI on AI risk arguments
Yann Dubois · 19 Dec 2022 22:42 UTC · 5 points · 6 comments · 1 min read
A challenge for AGI organizations, and a challenge for readers
Rob Bensinger and Eliezer Yudkowsky · 1 Dec 2022 23:11 UTC · 301 points · 33 comments · 2 min read
The limited upside of interpretability
Peter S. Park · 15 Nov 2022 18:46 UTC · 13 points · 11 comments · 1 min read
Toy Models of Superposition
evhub · 21 Sep 2022 23:48 UTC · 68 points · 4 comments · 5 min read · 1 review · (transformer-circuits.pub)
Anthropic’s SoLU (Softmax Linear Unit)
Joel Burget · 4 Jul 2022 18:38 UTC · 21 points · 1 comment · 4 min read · (transformer-circuits.pub)
How do new models from OpenAI, DeepMind and Anthropic perform on TruthfulQA?
Owain_Evans · 26 Feb 2022 12:46 UTC · 44 points · 3 comments · 11 min read
A Summary Of Anthropic’s First Paper
Sam Ringer · 30 Dec 2021 0:48 UTC · 82 points · 1 comment · 8 min read
Mechanistic Interpretability for the MLP Layers (rough early thoughts)
MadHatter · 24 Dec 2021 7:24 UTC · 11 points · 2 comments · 1 min read · (www.youtube.com)
Transformer Circuits
evhub · 22 Dec 2021 21:09 UTC · 144 points · 4 comments · 3 min read · (transformer-circuits.pub)