Anthropic (org)
Tag. Last edit: 15 Feb 2023 14:12 UTC by Yoav Ravid

Anthropic is an AI organization. Not to be confused with anthropics.
Anthropic’s Core Views on AI Safety
Zac Hatfield-Dodds, 9 Mar 2023 16:55 UTC. 178 points, 38 comments, 2 min read. LW link (www.anthropic.com)

Toy Models of Superposition
evhub, 21 Sep 2022 23:48 UTC. 64 points, 2 comments, 5 min read. LW link (transformer-circuits.pub)

Why I’m joining Anthropic
evhub, 5 Jan 2023 1:12 UTC. 121 points, 4 comments, 1 min read. LW link

Concrete Reasons for Hope about AI
Zac Hatfield-Dodds, 14 Jan 2023 1:22 UTC. 107 points, 13 comments, 1 min read. LW link

[Linkpost] Google invested $300M in Anthropic in late 2022
Akash, 3 Feb 2023 19:13 UTC. 73 points, 14 comments, 1 min read. LW link (www.ft.com)

Anthropic’s SoLU (Softmax Linear Unit)
Joel Burget, 4 Jul 2022 18:38 UTC. 15 points, 1 comment, 4 min read. LW link (transformer-circuits.pub)

Transformer Circuits
evhub, 22 Dec 2021 21:09 UTC. 143 points, 4 comments, 3 min read. LW link (transformer-circuits.pub)

My understanding of Anthropic strategy
Swimmer963 (Miranda Dixon-Luinenburg), 15 Feb 2023 1:56 UTC. 161 points, 28 comments, 4 min read. LW link

Mechanistic Interpretability for the MLP Layers (rough early thoughts)
MadHatter, 24 Dec 2021 7:24 UTC. 11 points, 2 comments, 1 min read. LW link (www.youtube.com)

A Summary Of Anthropic’s First Paper
Sam Ringer, 30 Dec 2021 0:48 UTC. 82 points, 1 comment, 8 min read. LW link

How do new models from OpenAI, DeepMind and Anthropic perform on TruthfulQA?
Owain_Evans, 26 Feb 2022 12:46 UTC. 42 points, 3 comments, 11 min read. LW link

Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic)
LawrenceC, 16 Feb 2023 19:47 UTC. 65 points, 9 comments, 1 min read. LW link (arxiv.org)

Podcast Transcript: Daniela and Dario Amodei on Anthropic
remember, 7 Mar 2023 16:47 UTC. 46 points, 2 comments, 79 min read. LW link (futureoflife.org)

Anthropic: Core Views on AI Safety: When, Why, What, and How
jonmenaster, 9 Mar 2023 17:34 UTC. 17 points, 1 comment, 22 min read. LW link (www.anthropic.com)

The limited upside of interpretability
Peter S. Park, 15 Nov 2022 18:46 UTC. 13 points, 11 comments, 1 min read. LW link

A challenge for AGI organizations, and a challenge for readers
Rob Bensinger and Eliezer Yudkowsky, 1 Dec 2022 23:11 UTC. 294 points, 32 comments, 2 min read. LW link

[Question] Will research in AI risk jinx it? Consequences of training AI on AI risk arguments
Yann Dubois, 19 Dec 2022 22:42 UTC. 5 points, 6 comments, 1 min read. LW link

[Preprint] Pretraining Language Models with Human Preferences
thesofakillers, 21 Feb 2023 11:44 UTC. 12 points, 0 comments, 1 min read. LW link (arxiv.org)