6 Dec 2024 22:56 UTC

18 points

0 comments · 16 min read · LW link

Gradient Routing: Masking Gradients to Localize Computation in Neural Networks

cloud, Jacob G-W, Evzen, Joseph Miller and TurnTrout

6 Dec 2024 22:19 UTC

179 points

16 comments · 11 min read · LW link · 1 review

(arxiv.org)

Understanding Shapley Values with Venn Diagrams

Carson L

6 Dec 2024 21:56 UTC

221 points

40 comments · 4 min read · LW link · 1 review

(medium.com)

Model Integrity

ryan.lowe, Oliver Klingefjord and Joe Edelman

6 Dec 2024 21:28 UTC

4 points

1 comment · 18 min read · LW link

Can AI improve the current state of molecular simulation?

Abhishaike Mahajan

6 Dec 2024 20:22 UTC

5 points

0 comments · 1 min read · LW link

(www.owlposting.com)

Low Temperature Solomonoff Induction

dil-leik-og

6 Dec 2024 18:55 UTC

10 points

4 comments · 11 min read · LW link

Experiments are in the territory, results are in the map

Tahp

6 Dec 2024 15:44 UTC

5 points

1 comment · 6 min read · LW link

A car journey with conservative evangelicals—Understanding some British political-religious beliefs

Nathan Young

6 Dec 2024 11:22 UTC

41 points

8 comments · 6 min read · LW link

(nathanpmyoung.substack.com)

Frontier Models are Capable of In-context Scheming

Marius Hobbhahn, AlexMeinke, Bronson Schoen, rusheb, Jérémy Scheurer and Mikita Balesni

5 Dec 2024 22:11 UTC

211 points

24 comments · 7 min read · LW link

Should you be worried about H5N1?

gw

5 Dec 2024 21:11 UTC

89 points

2 comments · 5 min read · LW link

(www.georgeyw.com)

o1 tried to avoid being shut down

Raelifin

5 Dec 2024 19:52 UTC

10 points

5 comments · 1 min read · LW link

(www.transformernews.ai)

More Growth, Melancholy, and MindCraft @3QD [revised and updated]

Bill Benzon

5 Dec 2024 19:36 UTC

4 points

0 comments · 4 min read · LW link

Expevolu, a laissez-faire approach to country creation

Fernando

5 Dec 2024 19:29 UTC

5 points

4 comments · 44 min read · LW link

(expevolu.substack.com)

Are SAE features from the Base Model still meaningful to LLaVA?

Shan23Chen

5 Dec 2024 19:24 UTC

5 points

2 comments · 10 min read · LW link

OpenAI o1 + ChatGPT Pro release

anaguma

5 Dec 2024 19:13 UTC

5 points

0 comments · 1 min read · LW link

(openai.com)

Smart people should do biology

Haotian

5 Dec 2024 19:11 UTC

13 points

2 comments · 3 min read · LW link

Announcement: AI for Math Fund

sarahconstantin

5 Dec 2024 18:33 UTC

20 points

9 comments · 2 min read · LW link

(renaissancephilanthropy.org)

Detection of Asymptomatically Spreading Pathogens

jefftk

5 Dec 2024 18:20 UTC

46 points

10 comments · 7 min read · LW link

(www.jefftk.com)

Model Integrity: MAI on Value Alignment

Jonas Hallgren

5 Dec 2024 17:11 UTC

6 points

11 comments · 1 min read · LW link

(meaningalignment.substack.com)

Social Science in its epistemological context

Arturo Macias

5 Dec 2024 16:12 UTC

3 points

0 comments · 1 min read · LW link

(www.theseedsofscience.pub)

Higher and lower pleasures

Chris_Leong

5 Dec 2024 13:13 UTC

19 points

3 comments · 1 min read · LW link

Sam Harris’s Argument For Objective Morality

Zero Contradictions

5 Dec 2024 10:19 UTC

7 points

5 comments · 1 min read · LW link

(thewaywardaxolotl.blogspot.com)

Morality as Cooperation Part III: Failure Modes

DeLesley Hutchins

5 Dec 2024 9:39 UTC

4 points

0 comments · 20 min read · LW link

Morality as Cooperation Part II: Theory and Experiment

DeLesley Hutchins

5 Dec 2024 9:04 UTC

2 points

0 comments · 17 min read · LW link

Morality as Cooperation Part I: Humans

DeLesley Hutchins

5 Dec 2024 8:16 UTC

5 points

0 comments · 19 min read · LW link

I Finally Worked Through Bayes’ Theorem (Personal Achievement)

keltan

5 Dec 2024 2:04 UTC

53 points

7 comments · 9 min read · LW link

The Dream Machine

sarahconstantin

5 Dec 2024 0:00 UTC

117 points

6 comments · 12 min read · LW link

(sarahconstantin.substack.com)

Should you have children? A decision framework for a crucial life choice that affects yourself, your child and the world

Sherrinford

4 Dec 2024 23:14 UTC

0 points

1 comment · 20 min read · LW link

CCing Mailing Lists on External Communication

jefftk

4 Dec 2024 22:00 UTC

9 points

0 comments · 1 min read · LW link

(www.jefftk.com)

Picking favourites is hard

dkl9

4 Dec 2024 20:46 UTC

11 points

3 comments · 1 min read · LW link

(dkl9.net)

[Question] How can I convince my cryptobro friend that S&P500 is efficient?

AhmedNeedsATherapist

4 Dec 2024 20:04 UTC

−7 points

10 comments · 1 min read · LW link

The 2023 LessWrong Review: The Basic Ask

Raemon

4 Dec 2024 19:52 UTC

78 points

25 comments · 9 min read · LW link

Is the AI Doomsday Narrative the Product of a Big Tech Conspiracy?

garrison

4 Dec 2024 19:20 UTC

35 points

1 comment · 11 min read · LW link

(garrisonlovely.substack.com)

[Question] AI box question

KvmanThinking

4 Dec 2024 19:03 UTC

2 points

2 comments · 1 min read · LW link

The Polite Coup

Charlie Sanders

4 Dec 2024 14:03 UTC

3 points

0 comments · 3 min read · LW link

(www.dailymicrofiction.com)

Analysis of Global AI Governance Strategies

Sammy Martin, Justin Bullock and Corin Katzke

4 Dec 2024 10:45 UTC

53 points

10 comments · 36 min read · LW link

[Question] Cryonics considerations: how big of a problem is ischemia?

kman

4 Dec 2024 4:45 UTC

8 points

1 comment · 1 min read · LW link

AI #93: Happy Tuesday

Zvi

4 Dec 2024 0:30 UTC

26 points

2 comments · 23 min read · LW link

(thezvi.wordpress.com)

A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps

Linch

3 Dec 2024 21:57 UTC

64 points

2 comments · 9 min read · LW link

Deep Causal Transcoding: A Framework for Mechanistically Eliciting Latent Behaviors in Language Models

Andrew Mack and TurnTrout

3 Dec 2024 21:19 UTC

109 points

8 comments · 41 min read · LW link

“Alignment at Large”: Bending the Arc of History Towards Life-Affirming Futures

welfvh

3 Dec 2024 21:17 UTC

6 points

0 comments · 4 min read · LW link

Roots of Progress is hiring an event manager

jasoncrawford

3 Dec 2024 20:46 UTC

10 points

0 comments · 7 min read · LW link

(rootsofprogress.notion.site)

Do simulacra dream of digital sheep?

EuanMcLean

3 Dec 2024 20:25 UTC

16 points

36 comments · 10 min read · LW link

Orca communication project—seeking feedback (and collaborators)

Towards_Keeperhood

3 Dec 2024 17:29 UTC

38 points

16 comments · 2 min read · LW link

Book a Time to Chat about Interp Research

Logan Riggs

3 Dec 2024 17:27 UTC

47 points

3 comments · 1 min read · LW link

Balsa Research 2024 Update

Zvi

3 Dec 2024 12:30 UTC

21 points

0 comments · 5 min read · LW link

(thezvi.wordpress.com)

First Solo Bus Ride

jefftk

3 Dec 2024 12:20 UTC

28 points

1 comment · 1 min read · LW link

(www.jefftk.com)

How to make evals for the AISI evals bounty

TheManxLoiner

3 Dec 2024 10:44 UTC

9 points

0 comments · 5 min read · LW link

Should there be just one western AGI project?

rosehadshar and Tom Davidson

3 Dec 2024 10:11 UTC

78 points

75 comments · 15 min read · LW link

(www.forethought.org)

Cognitive Biases Contributing to AI X-risk — a deleted excerpt from my 2018 ARCHES draft

Andrew_Critch

3 Dec 2024 9:29 UTC

48 points

2 comments · 5 min read · LW link