
Anthropic (org)

Last edit: 15 Feb 2023 14:12 UTC by Yoav Ravid

Anthropic is an AI safety and research company.

Not to be confused with anthropics.

Anthropic’s Core Views on AI Safety

Zac Hatfield-Dodds · 9 Mar 2023 16:55 UTC
178 points
38 comments · 2 min read · LW link
(www.anthropic.com)

Toy Models of Superposition

evhub · 21 Sep 2022 23:48 UTC
64 points
2 comments · 5 min read · LW link
(transformer-circuits.pub)

Why I’m joining Anthropic

evhub · 5 Jan 2023 1:12 UTC
121 points
4 comments · 1 min read · LW link

Concrete Reasons for Hope about AI

Zac Hatfield-Dodds · 14 Jan 2023 1:22 UTC
107 points
13 comments · 1 min read · LW link

[Linkpost] Google invested $300M in Anthropic in late 2022

Akash · 3 Feb 2023 19:13 UTC
73 points
14 comments · 1 min read · LW link
(www.ft.com)

Anthropic’s SoLU (Softmax Linear Unit)

Joel Burget · 4 Jul 2022 18:38 UTC
15 points
1 comment · 4 min read · LW link
(transformer-circuits.pub)

Transformer Circuits

evhub · 22 Dec 2021 21:09 UTC
143 points
4 comments · 3 min read · LW link
(transformer-circuits.pub)

My understanding of Anthropic strategy

Swimmer963 (Miranda Dixon-Luinenburg) · 15 Feb 2023 1:56 UTC
161 points
28 comments · 4 min read · LW link

Mechanistic Interpretability for the MLP Layers (rough early thoughts)

MadHatter · 24 Dec 2021 7:24 UTC
11 points
2 comments · 1 min read · LW link
(www.youtube.com)

A Summary Of Anthropic’s First Paper

Sam Ringer · 30 Dec 2021 0:48 UTC
82 points
1 comment · 8 min read · LW link

How do new models from OpenAI, DeepMind and Anthropic perform on TruthfulQA?

Owain_Evans · 26 Feb 2022 12:46 UTC
42 points
3 comments · 11 min read · LW link

Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic)

LawrenceC · 16 Feb 2023 19:47 UTC
65 points
9 comments · 1 min read · LW link
(arxiv.org)

Podcast Transcript: Daniela and Dario Amodei on Anthropic

remember · 7 Mar 2023 16:47 UTC
46 points
2 comments · 79 min read · LW link
(futureoflife.org)

Anthropic: Core Views on AI Safety: When, Why, What, and How

jonmenaster · 9 Mar 2023 17:34 UTC
17 points
1 comment · 22 min read · LW link
(www.anthropic.com)

The limited upside of interpretability

Peter S. Park · 15 Nov 2022 18:46 UTC
13 points
11 comments · 1 min read · LW link

A challenge for AGI organizations, and a challenge for readers

1 Dec 2022 23:11 UTC
294 points
32 comments · 2 min read · LW link

[Question] Will research in AI risk jinx it? Consequences of training AI on AI risk arguments

Yann Dubois · 19 Dec 2022 22:42 UTC
5 points
6 comments · 1 min read · LW link

[Preprint] Pretraining Language Models with Human Preferences

thesofakillers · 21 Feb 2023 11:44 UTC
12 points
0 comments · 1 min read · LW link
(arxiv.org)