
Anthropic (org)

Last edit: 25 Dec 2021 4:12 UTC by Multicore

Anthropic is an AI safety and research organization.

Not to be confused with anthropics.

Anthropic’s Core Views on AI Safety

Zac Hatfield-Dodds · 9 Mar 2023 16:55 UTC
181 points
39 comments · 2 min read · LW link
(www.anthropic.com)

My understanding of Anthropic strategy

Swimmer963 (Miranda Dixon-Luinenburg) · 15 Feb 2023 1:56 UTC
166 points
31 comments · 4 min read · LW link

Why I’m joining Anthropic

evhub · 5 Jan 2023 1:12 UTC
121 points
4 comments · 1 min read · LW link

Toy Models of Superposition

evhub · 21 Sep 2022 23:48 UTC
68 points
4 comments · 5 min read · LW link · 1 review
(transformer-circuits.pub)

Concrete Reasons for Hope about AI

Zac Hatfield-Dodds · 14 Jan 2023 1:22 UTC
101 points
13 comments · 1 min read · LW link

[Linkpost] Google invested $300M in Anthropic in late 2022

Akash · 3 Feb 2023 19:13 UTC
73 points
14 comments · 1 min read · LW link
(www.ft.com)

Transformer Circuits

evhub · 22 Dec 2021 21:09 UTC
144 points
4 comments · 3 min read · LW link
(transformer-circuits.pub)

Anthropic’s SoLU (Softmax Linear Unit)

Joel Burget · 4 Jul 2022 18:38 UTC
21 points
1 comment · 4 min read · LW link
(transformer-circuits.pub)

OMMC Announces RIP

1 Apr 2024 23:20 UTC
178 points
5 comments · 2 min read · LW link

Mechanistic Interpretability for the MLP Layers (rough early thoughts)

MadHatter · 24 Dec 2021 7:24 UTC
11 points
2 comments · 1 min read · LW link
(www.youtube.com)
(www.youtube.com)

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

Zac Hatfield-Dodds · 5 Oct 2023 21:01 UTC
286 points
21 comments · 2 min read · LW link
(transformer-circuits.pub)

Anthropic is further accelerating the Arms Race?

sapphire · 6 Apr 2023 23:29 UTC
82 points
22 comments · 1 min read · LW link
(techcrunch.com)

Anthropic’s Responsible Scaling Policy & Long-Term Benefit Trust

Zac Hatfield-Dodds · 19 Sep 2023 15:09 UTC
90 points
23 comments · 3 min read · LW link
(www.anthropic.com)

Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic)

LawrenceC · 16 Feb 2023 19:47 UTC
65 points
9 comments · 1 min read · LW link
(arxiv.org)

EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024

scasper · 21 May 2024 20:15 UTC
120 points
12 comments · 3 min read · LW link

On Anthropic’s Sleeper Agents Paper

Zvi · 17 Jan 2024 16:10 UTC
54 points
5 comments · 36 min read · LW link
(thezvi.wordpress.com)

Introducing Alignment Stress-Testing at Anthropic

evhub · 12 Jan 2024 23:51 UTC
179 points
23 comments · 2 min read · LW link

Vaniver’s thoughts on Anthropic’s RSP

Vaniver · 28 Oct 2023 21:06 UTC
46 points
4 comments · 3 min read · LW link

Anthropic AI made the right call

bhauth · 15 Apr 2024 0:39 UTC
22 points
19 comments · 1 min read · LW link

Request to AGI organizations: Share your views on pausing AI progress

11 Apr 2023 17:30 UTC
141 points
11 comments · 1 min read · LW link

Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy

Zac Hatfield-Dodds · 1 Nov 2023 18:10 UTC
85 points
1 comment · 4 min read · LW link
(www.anthropic.com)

Anthropic Observations

Zvi · 25 Jul 2023 12:50 UTC
104 points
1 comment · 10 min read · LW link
(thezvi.wordpress.com)

Frontier Model Security

Vaniver · 26 Jul 2023 4:48 UTC
31 points
1 comment · 3 min read · LW link
(www.anthropic.com)

Frontier Model Forum

Zach Stein-Perlman · 26 Jul 2023 14:30 UTC
27 points
0 comments · 4 min read · LW link
(blog.google)

Amazon to invest up to $4 billion in Anthropic

Davis_Kingsley · 25 Sep 2023 14:55 UTC
44 points
8 comments · 1 min read · LW link
(twitter.com)

A Summary Of Anthropic’s First Paper

Sam Ringer · 30 Dec 2021 0:48 UTC
82 points
1 comment · 8 min read · LW link

How do new models from OpenAI, DeepMind and Anthropic perform on TruthfulQA?

Owain_Evans · 26 Feb 2022 12:46 UTC
44 points
3 comments · 11 min read · LW link

Anthropic: Reflections on our Responsible Scaling Policy

Zac Hatfield-Dodds · 20 May 2024 4:14 UTC
40 points
21 comments · 10 min read · LW link
(www.anthropic.com)

Podcast Transcript: Daniela and Dario Amodei on Anthropic

remember · 7 Mar 2023 16:47 UTC
46 points
2 comments · 79 min read · LW link
(futureoflife.org)

Anthropic: Core Views on AI Safety: When, Why, What, and How

jonmenaster · 9 Mar 2023 17:34 UTC
17 points
1 comment · 22 min read · LW link
(www.anthropic.com)

Anthropic | Charting a Path to AI Accountability

Gabe M · 14 Jun 2023 4:43 UTC
34 points
2 comments · 3 min read · LW link
(www.anthropic.com)

Rishi Sunak mentions “existential threats” in talk with OpenAI, DeepMind, Anthropic CEOs

24 May 2023 21:06 UTC
34 points
1 comment · 1 min read · LW link
(www.gov.uk)

The limited upside of interpretability

Peter S. Park · 15 Nov 2022 18:46 UTC
13 points
11 comments · 1 min read · LW link

A challenge for AGI organizations, and a challenge for readers

1 Dec 2022 23:11 UTC
301 points
33 comments · 2 min read · LW link

[Question] Will research in AI risk jinx it? Consequences of training AI on AI risk arguments

Yann Dubois · 19 Dec 2022 22:42 UTC
5 points
6 comments · 1 min read · LW link

Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation

7 Nov 2023 17:59 UTC
36 points
2 comments · 2 min read · LW link
(arxiv.org)

Quick Thoughts on Scaling Monosemanticity

Joel Burget · 23 May 2024 16:22 UTC
22 points
1 comment · 4 min read · LW link
(transformer-circuits.pub)

Comparing Anthropic’s Dictionary Learning to Ours

Robert_AIZI · 7 Oct 2023 23:30 UTC
136 points
8 comments · 4 min read · LW link

Measuring and Improving the Faithfulness of Model-Generated Reasoning

18 Jul 2023 16:36 UTC
109 points
13 comments · 6 min read · LW link

[Preprint] Pretraining Language Models with Human Preferences

Giulio · 21 Feb 2023 11:44 UTC
12 points
0 comments · 1 min read · LW link
(arxiv.org)

Cicadas, Anthropic, and the bilateral alignment problem

kromem · 22 May 2024 11:09 UTC
24 points
4 comments · 5 min read · LW link

AI Awareness through Interaction with Blatantly Alien Models

VojtaKovarik · 28 Jul 2023 8:41 UTC
7 points
5 comments · 3 min read · LW link