
Anthropic (org)

Last edit: 25 Dec 2021 4:12 UTC by Multicore

Anthropic is an AI safety and research organization.

Not to be confused with anthropics.

Anthropic’s Core Views on AI Safety

Zac Hatfield-Dodds · 9 Mar 2023 16:55 UTC
181 points
39 comments · 2 min read · LW link
(www.anthropic.com)

My understanding of Anthropic strategy

Swimmer963 (Miranda Dixon-Luinenburg) · 15 Feb 2023 1:56 UTC
166 points
31 comments · 4 min read · LW link

Why I’m joining Anthropic

evhub · 5 Jan 2023 1:12 UTC
121 points
4 comments · 1 min read · LW link

Toy Models of Superposition

evhub · 21 Sep 2022 23:48 UTC
68 points
4 comments · 5 min read · LW link · 1 review
(transformer-circuits.pub)

Concrete Reasons for Hope about AI

Zac Hatfield-Dodds · 14 Jan 2023 1:22 UTC
101 points
13 comments · 1 min read · LW link

[Linkpost] Google invested $300M in Anthropic in late 2022

Akash · 3 Feb 2023 19:13 UTC
73 points
14 comments · 1 min read · LW link
(www.ft.com)

Transformer Circuits

evhub · 22 Dec 2021 21:09 UTC
144 points
4 comments · 3 min read · LW link
(transformer-circuits.pub)

Anthropic’s SoLU (Softmax Linear Unit)

Joel Burget · 4 Jul 2022 18:38 UTC
21 points
1 comment · 4 min read · LW link
(transformer-circuits.pub)

OMMC Announces RIP

1 Apr 2024 23:20 UTC
178 points
5 comments · 2 min read · LW link

Mechanistic Interpretability for the MLP Layers (rough early thoughts)

MadHatter · 24 Dec 2021 7:24 UTC
11 points
2 comments · 1 min read · LW link
(www.youtube.com)
(www.youtube.com)

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

Zac Hatfield-Dodds · 5 Oct 2023 21:01 UTC
286 points
21 comments · 2 min read · LW link
(transformer-circuits.pub)

Anthropic is further accelerating the Arms Race?

sapphire · 6 Apr 2023 23:29 UTC
82 points
22 comments · 1 min read · LW link
(techcrunch.com)

Anthropic’s Responsible Scaling Policy & Long-Term Benefit Trust

Zac Hatfield-Dodds · 19 Sep 2023 15:09 UTC
90 points
23 comments · 3 min read · LW link
(www.anthropic.com)

Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic)

LawrenceC · 16 Feb 2023 19:47 UTC
65 points
9 comments · 1 min read · LW link
(arxiv.org)

EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024

scasper · 21 May 2024 20:15 UTC
120 points
12 comments · 3 min read · LW link

On Anthropic’s Sleeper Agents Paper

Zvi · 17 Jan 2024 16:10 UTC
54 points
5 comments · 36 min read · LW link
(thezvi.wordpress.com)

Introducing Alignment Stress-Testing at Anthropic

evhub · 12 Jan 2024 23:51 UTC
179 points
23 comments · 2 min read · LW link

Vaniver’s thoughts on Anthropic’s RSP

Vaniver · 28 Oct 2023 21:06 UTC
46 points
4 comments · 3 min read · LW link

Anthropic AI made the right call

bhauth · 15 Apr 2024 0:39 UTC
22 points
19 comments · 1 min read · LW link

Request to AGI organizations: Share your views on pausing AI progress

11 Apr 2023 17:30 UTC
141 points
11 comments · 1 min read · LW link

Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy

Zac Hatfield-Dodds · 1 Nov 2023 18:10 UTC
85 points
1 comment · 4 min read · LW link
(www.anthropic.com)

Anthropic Observations

Zvi · 25 Jul 2023 12:50 UTC
104 points
1 comment · 10 min read · LW link
(thezvi.wordpress.com)

Frontier Model Security

Vaniver · 26 Jul 2023 4:48 UTC
31 points
1 comment · 3 min read · LW link
(www.anthropic.com)

Frontier Model Forum

Zach Stein-Perlman · 26 Jul 2023 14:30 UTC
27 points
0 comments · 4 min read · LW link
(blog.google)

Amazon to invest up to $4 billion in Anthropic

Davis_Kingsley · 25 Sep 2023 14:55 UTC
44 points
8 comments · 1 min read · LW link
(twitter.com)

A Summary Of Anthropic’s First Paper

Sam Ringer · 30 Dec 2021 0:48 UTC
82 points
1 comment · 8 min read · LW link

How do new models from OpenAI, DeepMind and Anthropic perform on TruthfulQA?

Owain_Evans · 26 Feb 2022 12:46 UTC
44 points
3 comments · 11 min read · LW link

Anthropic: Reflections on our Responsible Scaling Policy

Zac Hatfield-Dodds · 20 May 2024 4:14 UTC
40 points
21 comments · 10 min read · LW link
(www.anthropic.com)

Podcast Transcript: Daniela and Dario Amodei on Anthropic

remember · 7 Mar 2023 16:47 UTC
46 points
2 comments · 79 min read · LW link
(futureoflife.org)

Anthropic: Core Views on AI Safety: When, Why, What, and How

jonmenaster · 9 Mar 2023 17:34 UTC
17 points
1 comment · 22 min read · LW link
(www.anthropic.com)

Anthropic | Charting a Path to AI Accountability

Gabe M · 14 Jun 2023 4:43 UTC
34 points
2 comments · 3 min read · LW link
(www.anthropic.com)

Rishi Sunak mentions “existential threats” in talk with OpenAI, DeepMind, Anthropic CEOs

24 May 2023 21:06 UTC
34 points
1 comment · 1 min read · LW link
(www.gov.uk)

The limited upside of interpretability

Peter S. Park · 15 Nov 2022 18:46 UTC
13 points
11 comments · 1 min read · LW link

A challenge for AGI organizations, and a challenge for readers

1 Dec 2022 23:11 UTC
301 points
33 comments · 2 min read · LW link

[Question] Will research in AI risk jinx it? Consequences of training AI on AI risk arguments

Yann Dubois · 19 Dec 2022 22:42 UTC
5 points
6 comments · 1 min read · LW link

Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation

7 Nov 2023 17:59 UTC
36 points
2 comments · 2 min read · LW link
(arxiv.org)

Quick Thoughts on Scaling Monosemanticity

Joel Burget · 23 May 2024 16:22 UTC
22 points
1 comment · 4 min read · LW link
(transformer-circuits.pub)

Comparing Anthropic’s Dictionary Learning to Ours

Robert_AIZI · 7 Oct 2023 23:30 UTC
136 points
8 comments · 4 min read · LW link

Measuring and Improving the Faithfulness of Model-Generated Reasoning

18 Jul 2023 16:36 UTC
109 points
13 comments · 6 min read · LW link

[Preprint] Pretraining Language Models with Human Preferences

Giulio · 21 Feb 2023 11:44 UTC
12 points
0 comments · 1 min read · LW link
(arxiv.org)

Cicadas, Anthropic, and the bilateral alignment problem

kromem · 22 May 2024 11:09 UTC
24 points
4 comments · 5 min read · LW link

AI Awareness through Interaction with Blatantly Alien Models

VojtaKovarik · 28 Jul 2023 8:41 UTC
7 points
5 comments · 3 min read · LW link