Anthropic (org)

Tag

Last edit: 25 Dec 2021 4:12 UTC by Multicore

Anthropic is an AI organization.

Not to be confused with anthropics.

Anthropic AI made the right call

bhauth

15 Apr 2024 0:39 UTC

30 points

19 comments, 1 min read

OMMC Announces RIP

Adam Scholl and aysja

1 Apr 2024 23:20 UTC

178 points

5 comments, 2 min read

On Anthropic’s Sleeper Agents Paper

Zvi

17 Jan 2024 16:10 UTC

54 points

5 comments, 36 min read

(thezvi.wordpress.com)

Introducing Alignment Stress-Testing at Anthropic

evhub

12 Jan 2024 23:51 UTC

179 points

23 comments, 2 min read

Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation

Soroush Pour, rusheb, Quentin FEUILLADE--MONTIXI, Arush and scasper

7 Nov 2023 17:59 UTC

36 points

2 comments, 2 min read

(arxiv.org)

Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy

Zac Hatfield-Dodds

1 Nov 2023 18:10 UTC

85 points

1 comment, 4 min read

(www.anthropic.com)

Vaniver’s thoughts on Anthropic’s RSP

Vaniver

28 Oct 2023 21:06 UTC

46 points

4 comments, 3 min read

Comparing Anthropic’s Dictionary Learning to Ours

Robert_AIZI

7 Oct 2023 23:30 UTC

136 points

8 comments, 4 min read

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

Zac Hatfield-Dodds

5 Oct 2023 21:01 UTC

286 points

19 comments, 2 min read

(transformer-circuits.pub)

Amazon to invest up to $4 billion in Anthropic

Davis_Kingsley

25 Sep 2023 14:55 UTC

44 points

8 comments, 1 min read

(twitter.com)

Anthropic’s Responsible Scaling Policy & Long-Term Benefit Trust

Zac Hatfield-Dodds

19 Sep 2023 15:09 UTC

90 points

23 comments, 3 min read

(www.anthropic.com)

AI Awareness through Interaction with Blatantly Alien Models

VojtaKovarik

28 Jul 2023 8:41 UTC

7 points

5 comments, 3 min read

Frontier Model Forum

Zach Stein-Perlman

26 Jul 2023 14:30 UTC

27 points

0 comments, 4 min read

(blog.google)

Frontier Model Security

Vaniver

26 Jul 2023 4:48 UTC

31 points

1 comment, 3 min read

(www.anthropic.com)

Anthropic Observations

Zvi

25 Jul 2023 12:50 UTC

104 points

1 comment, 10 min read

(thezvi.wordpress.com)

Measuring and Improving the Faithfulness of Model-Generated Reasoning

Ansh Radhakrishnan, tamera, karinanguyen, Sam Bowman and Ethan Perez

18 Jul 2023 16:36 UTC

109 points

13 comments, 6 min read

Anthropic | Charting a Path to AI Accountability

Gabriel Mukobi

14 Jun 2023 4:43 UTC

34 points

2 comments, 3 min read

(www.anthropic.com)

Rishi Sunak mentions “existential threats” in talk with OpenAI, DeepMind, Anthropic CEOs

Arjun Panickssery, Baldassare Castiglione and Cleo Nardo

24 May 2023 21:06 UTC

34 points

1 comment, 1 min read

(www.gov.uk)

Request to AGI organizations: Share your views on pausing AI progress

Akash and simeon_c

11 Apr 2023 17:30 UTC

141 points

11 comments, 1 min read

Anthropic is further accelerating the Arms Race?

sapphire

6 Apr 2023 23:29 UTC

82 points

22 comments, 1 min read

(techcrunch.com)

Anthropic: Core Views on AI Safety: When, Why, What, and How

jonmenaster

9 Mar 2023 17:34 UTC

17 points

1 comment, 22 min read

(www.anthropic.com)

Anthropic’s Core Views on AI Safety

Zac Hatfield-Dodds

9 Mar 2023 16:55 UTC

181 points

39 comments, 2 min read

(www.anthropic.com)

Podcast Transcript: Daniela and Dario Amodei on Anthropic

remember

7 Mar 2023 16:47 UTC

46 points

2 comments, 79 min read

(futureoflife.org)

[Preprint] Pretraining Language Models with Human Preferences

Giulio

21 Feb 2023 11:44 UTC

12 points

0 comments, 1 min read

(arxiv.org)

Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic)

LawrenceC

16 Feb 2023 19:47 UTC

65 points

9 comments, 1 min read

(arxiv.org)

My understanding of Anthropic strategy

Swimmer963 (Miranda Dixon-Luinenburg)

15 Feb 2023 1:56 UTC

165 points

31 comments, 4 min read

[Linkpost] Google invested $300M in Anthropic in late 2022

Akash

3 Feb 2023 19:13 UTC

73 points

14 comments, 1 min read

(www.ft.com)

Concrete Reasons for Hope about AI

Zac Hatfield-Dodds

14 Jan 2023 1:22 UTC

101 points

13 comments, 1 min read

Why I’m joining Anthropic

evhub

5 Jan 2023 1:12 UTC

121 points

4 comments, 1 min read

[Question] Will research in AI risk jinx it? Consequences of training AI on AI risk arguments

Yann Dubois

19 Dec 2022 22:42 UTC

5 points

6 comments, 1 min read

A challenge for AGI organizations, and a challenge for readers

Rob Bensinger and Eliezer Yudkowsky

1 Dec 2022 23:11 UTC

301 points

33 comments, 2 min read

The limited upside of interpretability

Peter S. Park

15 Nov 2022 18:46 UTC

13 points

11 comments, 1 min read

Toy Models of Superposition

evhub

21 Sep 2022 23:48 UTC

68 points

4 comments, 5 min read, 1 review

(transformer-circuits.pub)

Anthropic’s SoLU (Softmax Linear Unit)

Joel Burget

4 Jul 2022 18:38 UTC

21 points

1 comment, 4 min read

(transformer-circuits.pub)

How do new models from OpenAI, DeepMind and Anthropic perform on TruthfulQA?

Owain_Evans

26 Feb 2022 12:46 UTC

44 points

3 comments, 11 min read

A Summary Of Anthropic’s First Paper

Sam Ringer

30 Dec 2021 0:48 UTC

82 points

1 comment, 8 min read

Mechanistic Interpretability for the MLP Layers (rough early thoughts)

MadHatter

24 Dec 2021 7:24 UTC

11 points

2 comments, 1 min read

(www.youtube.com)

Transformer Circuits

evhub

22 Dec 2021 21:09 UTC

144 points

4 comments, 3 min read

(transformer-circuits.pub)
