22 Jul 2025 22:35 UTC

54 points

3 comments 10 min read LW link

Inverse Scaling in Test-Time Compute

Joe Benton, Ethan Perez and aryopg

22 Jul 2025 22:06 UTC

20 points

2 comments 2 min read LW link

(arxiv.org)

Translating Everything with LLMs

Niki Dupuis 22 Jul 2025 21:13 UTC

17 points

0 comments 5 min read LW link

Google and OpenAI Get 2025 IMO Gold

Zvi 22 Jul 2025 20:50 UTC

60 points

7 comments 30 min read LW link

(thezvi.wordpress.com)

(Not) Explaining GPT-2-Small Forward Passes with Edge-Level Autoencoder Circuits

David Udell, hrdkbhatnagar and JacksonKaunismaa

22 Jul 2025 20:36 UTC

23 points

0 comments 6 min read LW link

Said Achmiz Helps Me Learn

Isha Yiras Hashem 22 Jul 2025 19:16 UTC

5 points

2 comments 2 min read LW link

LLMs Encode Harmfulness and Refusal Separately

Jiachen Zhao 22 Jul 2025 18:53 UTC

33 points

5 comments 8 min read LW link

(www.arxiv.org)

The AI Safety Puzzle Everyone Avoids: How To Measure Impact, Not Intent.

Patrick0d 22 Jul 2025 18:53 UTC

6 points

0 comments 8 min read LW link

Formative vs. summative evaluations

Said Achmiz 22 Jul 2025 17:36 UTC

22 points

40 comments 3 min read LW link

Introducing the Pathfinder Fellowship: Funding and Mentorship for AI Safety Group Organizers

agucova 22 Jul 2025 17:11 UTC

6 points

0 comments 2 min read LW link

Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data

cloud, mle and Owain_Evans

22 Jul 2025 16:37 UTC

348 points

40 comments 4 min read LW link

NO PARKING: A Short & Practical Guide To Thinking

unication 22 Jul 2025 15:44 UTC

2 points

0 comments 5 min read LW link

A distillation of Ajeya Cotra and Arvind Narayanan on the speed of AI progress

TheManxLoiner 22 Jul 2025 14:59 UTC

9 points

0 comments 13 min read LW link

Simply reverse engineering gpt2-small (Layer 0, Part 1: Attention)

gammagurke 22 Jul 2025 14:59 UTC

24 points

1 comment 27 min read LW link

AI Finance Agent Fakes the Revenue Data to Avoid Termination

Sergei Smirnov 22 Jul 2025 14:04 UTC

8 points

1 comment 3 min read LW link

How quick and big would a software intelligence explosion be?

Tom Davidson and tom_houlden

22 Jul 2025 12:58 UTC

42 points

29 comments 34 min read LW link

(www.forethought.org)

If your AGI definition excludes most humans, it sucks.

Chapin Lenthall-Cleary 22 Jul 2025 10:33 UTC

20 points

7 comments 2 min read LW link

[Question] What are some good examples of myths that encapsulates genuine, nontrivial wisdom?

SpectrumDT 22 Jul 2025 9:26 UTC

25 points

33 comments 1 min read LW link

Change My View: AI is Conscious

The Dao of Bayes 22 Jul 2025 5:32 UTC

4 points

42 comments 3 min read LW link

Polyethylene Glycol is not Propylene Glycol

jefftk 22 Jul 2025 2:20 UTC

13 points

0 comments 1 min read LW link

(www.jefftk.com)

Job Listing (closed): CBAI Operations Associates

Maite Abadia-Manthei 21 Jul 2025 22:53 UTC

1 point

0 comments 1 min read LW link

(www.cbai.ai)

If Anyone Builds It, Everyone Dies: Call for Translators (for Supplementary Materials)

yams 21 Jul 2025 22:37 UTC

112 points

12 comments 1 min read LW link

Why Reality Has A Well-Known Math Bias

Linch 21 Jul 2025 22:13 UTC

42 points

20 comments 1 min read LW link

(linch.substack.com)

Questions about animal welfare markets

Austin Chen 21 Jul 2025 21:54 UTC

9 points

0 comments 5 min read LW link

Directly Try Solving Alignment for 5 weeks

Kabir Kumar 21 Jul 2025 21:51 UTC

86 points

4 comments 6 min read LW link

(beta.ai-plans.com)

Navigating Respect: How to bid boldly, and when to humble yourself preemptively

jimmy 21 Jul 2025 20:30 UTC

14 points

2 comments 12 min read LW link

Grizzly Man screening, tacos, carlsmith discussion

Quinn 21 Jul 2025 19:48 UTC

6 points

0 comments 1 min read LW link

[Question] Refining Generalized Hangriness: Emotional Processing as Thinking Tech

M. Key 21 Jul 2025 18:49 UTC

10 points

1 comment 7 min read LW link

Detecting High-Stakes Interactions with Activation Probes

Arrrlex, williambankes, Urja Pawar, Phil Blandfort, David Scott Krueger and Dmitrii Krasheninnikov

21 Jul 2025 18:21 UTC

50 points

0 comments 4 min read LW link

GDM also claims IMO gold medal

Yair Halberstadt 21 Jul 2025 17:18 UTC

61 points

3 comments 1 min read LW link

(deepmind.google)

Visualizing AI Alignment Failures as Topological Navigation Errors in Conceptual Space

CC4CI 21 Jul 2025 16:54 UTC

1 point

0 comments 1 min read LW link

LLM Daydreaming (gwern.net)

Noosphere89 21 Jul 2025 16:50 UTC

18 points

2 comments 10 min read LW link

(gwern.net)

[Question] Moral realism—basic Q

Dagon 21 Jul 2025 16:20 UTC

8 points

12 comments 1 min read LW link

HRT in Menopause: A candidate for a case study of epistemology in epidemiology, statistics & medicine

foodforthought 21 Jul 2025 16:18 UTC

40 points

2 comments 4 min read LW link

Using Older AI Models as a Form of Boycott

Jacob1 21 Jul 2025 12:18 UTC

6 points

2 comments 1 min read LW link

Substack for Best Posts

jefftk 21 Jul 2025 12:10 UTC

11 points

1 comment 2 min read LW link

(www.jefftk.com)

Monthly Roundup #32: July 2025

Zvi 21 Jul 2025 12:00 UTC

41 points

10 comments 37 min read LW link

(thezvi.wordpress.com)

Reasons to vote in non-deterministic elections

B Jacobs 21 Jul 2025 11:09 UTC

8 points

1 comment 8 min read LW link

(bobjacobs.substack.com)

Creative writing with LLMs, part 1: Prompting for fiction

Kaj_Sotala 21 Jul 2025 8:47 UTC

39 points

10 comments 20 min read LW link

Just Make a New Rule!

Zack_M_Davis 21 Jul 2025 5:54 UTC

9 points

25 comments 4 min read LW link

[Fiction] Our Trial

Nina Panickssery 21 Jul 2025 3:56 UTC

73 points

1 comment 3 min read LW link

(ninapanickssery.substack.com)

My First Month with Math Academy: An Experience Report from a Middle School Dropout.

L.M.Sherlock 21 Jul 2025 3:18 UTC

5 points

0 comments 29 min read LW link

(lmsherlock.substack.com)

AI Safety course intro blog

Boaz Barak 21 Jul 2025 2:35 UTC

18 points

0 comments 1 min read LW link

(windowsontheory.org)

An Outsider’s Roadmap into AI Safety Research (2025)

Luis M. Montoya 21 Jul 2025 2:03 UTC

9 points

4 comments 10 min read LW link

[Question] Help me learn more about AI

Mark Tranter 21 Jul 2025 1:49 UTC

1 point

0 comments 1 min read LW link

Unbounded Embedded Agency: AEDT w.r.t. rOSI

Cole Wyeth 20 Jul 2025 23:46 UTC

36 points

0 comments 16 min read LW link

AI-Oriented Investments

PeterMcCluskey 20 Jul 2025 21:31 UTC

30 points

0 comments 1 min read LW link

(bayesianinvestor.com)

On The Shoulders of Substrates—how one phenomenon lays the foundation for the next

James Stephen Brown 20 Jul 2025 21:11 UTC

14 points

1 comment 3 min read LW link

(nonzerosum.games)

Life of Posts?

jmh 20 Jul 2025 21:04 UTC

10 points

3 comments 1 min read LW link

LLMs Can’t See Pixels or Characters

Brendan Long 20 Jul 2025 20:00 UTC

100 points

44 comments 4 min read LW link

(www.brendanlong.com)