16 Jul 2025 22:24 UTC

36 points

2 comments, 3 min read

(cognition.cafe)

Emergence vs Entropy—a universal paradox

James Stephen Brown, 16 Jul 2025 21:31 UTC

4 points

0 comments, 4 min read

Selective Generalization: Improving Capabilities While Maintaining Alignment

ariana_azarbal, Matthew A. Clarke, Jorio Cocola, Cailley Factor and cloud

16 Jul 2025 21:25 UTC

71 points

6 comments, 7 min read

Bodydouble / Thinking Assistant matchmaking

Raemon, 16 Jul 2025 19:54 UTC

51 points

10 comments, 2 min read

Zero sum expectations as an explanation of omnicide-indifference

asasz, 16 Jul 2025 19:25 UTC

2 points

6 comments, 2 min read

On the geometrical Nature of Insight

Giuseppe Birardi, 16 Jul 2025 19:12 UTC

6 points

0 comments, 41 min read

Vancouver Rationalists/Transhumanists/Futurists Beach Meetup

apocalypticc, 16 Jul 2025 19:09 UTC

2 points

0 comments, 1 min read

What is the probability that future AI development will be seriously delayed or ended due to energy decline ?

AdamLacerdo, 16 Jul 2025 19:08 UTC

−1 points

12 comments, 1 min read

Rebooting the Singularity

cdkg, 16 Jul 2025 18:26 UTC

8 points

0 comments, 1 min read

(philpapers.org)

Being and Existence

Gordon Seidoh Worley, 16 Jul 2025 18:10 UTC

7 points

0 comments, 3 min read

(uncertainupdates.substack.com)

Kimi K2

Zvi, 16 Jul 2025 16:20 UTC

54 points

5 comments, 12 min read

(thezvi.wordpress.com)

[Question] How should Canada Negotiate with Trump on Tariffs?

Davey, 16 Jul 2025 15:56 UTC

1 point

2 comments, 1 min read

[Question] Why haven’t we auto-translated all AI alignment content?

Algon, 16 Jul 2025 15:33 UTC

22 points

10 comments, 1 min read

A Hallucination Filter Idea That Might Not Scale—Yet

8harath, 16 Jul 2025 14:40 UTC

−5 points

0 comments, 2 min read

On being sort of back and sort of new here

Loki zen, 16 Jul 2025 12:55 UTC

32 points

13 comments, 3 min read

Conway’s Game of Life—complexity emerges from simplicity

James Stephen Brown, 16 Jul 2025 4:42 UTC

3 points

0 comments, 2 min read

(nonzerosum.games)

Emergent Price-Fixing by LLM Auction Agents

Lech Mazur, 16 Jul 2025 2:45 UTC

13 points

0 comments, 9 min read

Mapping Mental Moves

Jordan Rubin, 16 Jul 2025 2:28 UTC

3 points

0 comments, 2 min read

(jordanmrubin.substack.com)

Defining Monitorable and Useful Goals

Rubi J. Hudson, 15 Jul 2025 23:06 UTC

15 points

0 comments, 16 min read

[Question] Do you have any recommendations for readings on global risk forecasting and analysis applied to public policy design on a slightly smaller scale, or for more specific objectives?

Ana Lopez, 15 Jul 2025 22:00 UTC

1 point

0 comments, 1 min read

1 week fast on livestream for AI xrisk

samuelshadrach, 15 Jul 2025 21:36 UTC

1 point

2 comments, 1 min read

AISN #59: EU Publishes General-Purpose AI Code of Practice

Corin Katzke and Dan H

15 Jul 2025 18:59 UTC

10 points

0 comments, 4 min read

(aisafety.substack.com)

Principles for Picking Practical Interpretability Projects

Sam Marks, 15 Jul 2025 17:38 UTC

33 points

0 comments, 13 min read

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

Tomek Korbak, Mikita Balesni, Vlad Mikulik and Rohin Shah

15 Jul 2025 16:23 UTC

167 points

32 comments, 1 min read

(bit.ly)

The Virtue of Fear and the Myth of “Fearlessness”

David_Veksler, 15 Jul 2025 16:10 UTC

7 points

3 comments, 1 min read

Grok 4 Various Things

Zvi, 15 Jul 2025 15:50 UTC

52 points

4 comments, 32 min read

(thezvi.wordpress.com)

Value systems of the frontier AIs, reduced to slogans

Mitchell_Porter, 15 Jul 2025 15:10 UTC

4 points

0 comments, 1 min read

What is David Chapman talking about when he talks about “meaning” in his book “Meaningness”?

SpectrumDT, 15 Jul 2025 14:29 UTC

22 points

15 comments, 2 min read

Why Eliminating Deception Won’t Align AI

Priyanka Bharadwaj, 15 Jul 2025 9:21 UTC

19 points

6 comments, 4 min read

Generalizing zombie arguments

jessicata, 15 Jul 2025 5:09 UTC

25 points

9 comments, 7 min read

(unstableontology.com)

Do confident short timelines make sense?

TsviBT and abramdemski

15 Jul 2025 3:37 UTC

140 points

76 comments, 69 min read

Critic Contributions Are Logically Irrelevant

Zack_M_Davis, 15 Jul 2025 1:03 UTC

27 points

74 comments, 6 min read

AISafety.com Hackathon 2025

Bryce Robertson, 15 Jul 2025 0:04 UTC

12 points

0 comments, 1 min read

Don’t Say “I Want to Work In AI Policy”

henryj, 14 Jul 2025 23:19 UTC

7 points

0 comments, 2 min read

(www.henryjosephson.com)

Recent Redwood Research project proposals

ryan_greenblatt, Buck, Julian Stastny, joshc, Alex Mallen, Adam Kaufman, Tyler Tracy, Aryan Bhatt and Joey Yudelson

14 Jul 2025 22:27 UTC

91 points

0 comments, 3 min read

The Role of Respect: Why we inevitably appeal to authority

jimmy, 14 Jul 2025 21:28 UTC

18 points

2 comments, 12 min read

Making Sense of Consciousness Part 3: The Pulvinar Nucleus

sarahconstantin, 14 Jul 2025 21:20 UTC

14 points

0 comments, 10 min read

(sarahconstantin.substack.com)

LLM-induced craziness and base rates

Kaj_Sotala, 14 Jul 2025 21:16 UTC

70 points

2 comments, 2 min read

(andymasley.substack.com)

Narrow Misalignment is Hard, Emergent Misalignment is Easy

Edward Turner, Anna Soligo, Senthooran Rajamanoharan and Neel Nanda

14 Jul 2025 21:05 UTC

134 points

24 comments, 5 min read

What do you Want out of Literature Reviews?

Elizabeth, 14 Jul 2025 20:20 UTC

25 points

4 comments, 4 min read

(acesounderglass.com)

The Three Ideological Stances

PranavG and Gabriel Alfour

14 Jul 2025 20:14 UTC

8 points

0 comments, 3 min read

(cognition.cafe)

Visualizing AI Alignment – CFP for AGI-2025 Workshop (Aug 10, Live + Virtual)

CC4CI, 14 Jul 2025 20:12 UTC

9 points

0 comments, 4 min read

[Question] Is the political right becoming actively, explicitly antisemitic?

lc, 14 Jul 2025 18:57 UTC

28 points

16 comments, 1 min read

Weird Features in Protein LLMs: The Gram Lens

Jude Stiel, 14 Jul 2025 17:32 UTC

11 points

0 comments, 9 min read

METR: How Does Time Horizon Vary Across Domains?

Thomas Kwa and Vincent Cheng

14 Jul 2025 16:13 UTC

88 points

8 comments, 14 min read

(metr.org)

Worse Than MechaHitler

Zvi, 14 Jul 2025 16:00 UTC

56 points

1 comment, 22 min read

(thezvi.wordpress.com)

How To Cause Less Suffering While Eating Animals

Bentham's Bulldog, 14 Jul 2025 15:59 UTC

11 points

3 comments, 4 min read

Self-preservation or Instruction Ambiguity? Examining the Causes of Shutdown Resistance

Senthooran Rajamanoharan and Neel Nanda

14 Jul 2025 14:52 UTC

82 points

19 comments, 11 min read

Bernie Sanders (I-VT) mentions AI loss of control risk in Gizmodo interview

Matrice Jacobine, 14 Jul 2025 14:47 UTC

49 points

2 comments, 1 min read

(gizmodo.com)

Arrow theorem is an artifact of ordinal preferences

Arturo Macias, 14 Jul 2025 14:08 UTC

7 points

4 comments, 4 min read