Some perspectives on the discipline of Physics

Tahp

20 May 2024 18:19 UTC

12 points

2 comments · 13 min read · LW link

(quark.rodeo)

Interpretability: Integrated Gradients is a decent attribution method

Lucius Bushnaq, jake_mendel, StefanHex and Kaarel

20 May 2024 17:55 UTC

9 points

3 comments · 6 min read · LW link

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

Lucius Bushnaq, jake_mendel, Dan Braun, StefanHex, Nicholas Goldowsky-Dill, Kaarel, Avery, Joern Stoehler, debrevitatevitae, Magdalena Wache and Marius Hobbhahn

20 May 2024 17:53 UTC

60 points

1 comment · 3 min read · LW link

Infra-Bayesian haggling

hannagabor

20 May 2024 12:23 UTC

10 points

0 comments · 20 min read · LW link

Jaan Tallinn’s 2023 Philanthropy Overview

jaan

20 May 2024 12:11 UTC

102 points

2 comments · 1 min read · LW link

(jaan.info)

D&D.Sci (Easy Mode): On The Construction Of Impossible Structures [Evaluation and Ruleset]

abstractapplic

20 May 2024 9:38 UTC

22 points

1 comment · 1 min read · LW link

Why I find Davidad’s plan interesting

Paul W

20 May 2024 8:13 UTC

17 points

0 comments · 6 min read · LW link

Anthropic: Reflections on our Responsible Scaling Policy

Zac Hatfield-Dodds

20 May 2024 4:14 UTC

45 points

11 comments · 10 min read · LW link

(www.anthropic.com)

The consistent guessing problem is easier than the halting problem

jessicata

20 May 2024 4:02 UTC

28 points

5 comments · 4 min read · LW link

(unstableontology.com)

Against Computers (infinite play)

rogersbacon

20 May 2024 0:43 UTC

−12 points

0 comments · 14 min read · LW link

(www.secretorum.life)

[Question] Can environmental laws/NEPA be used for decelism?

Alex K. Chen (parrot)

19 May 2024 18:43 UTC

−4 points

0 comments · 1 min read · LW link

Testing for parallel reasoning in LLMs

meemi and Olli Järviniemi

19 May 2024 15:28 UTC

2 points

7 comments · 9 min read · LW link

Some “meta-cruxes” for AI x-risk debates

Aryeh Englander

19 May 2024 0:21 UTC

14 points

2 comments · 3 min read · LW link

On Privilege

shminux

18 May 2024 22:36 UTC

15 points

10 comments · 2 min read · LW link

To Limit Impact, Limit KL-Divergence

J Bostock

18 May 2024 18:52 UTC

7 points

1 comment · 5 min read · LW link

[Crosspost] Introducing the Save State Paradox

Suzie. EXE

18 May 2024 17:00 UTC

−1 points

0 comments · 7 min read · LW link

Scientific Notation Options

jefftk

18 May 2024 15:10 UTC

23 points

10 comments · 1 min read · LW link

(www.jefftk.com)

“If we go extinct due to misaligned AI, at least nature will continue, right? … right?”

plex

18 May 2024 14:09 UTC

46 points

23 comments · 2 min read · LW link

(aisafety.info)

What Are Non-Zero-Sum Games?—A Primer

James Stephen Brown

18 May 2024 9:19 UTC

4 points

1 comment · 3 min read · LW link

DeepMind’s “Frontier Safety Framework” is weak and unambitious

Zach Stein-Perlman

18 May 2024 3:00 UTC

141 points

13 comments · 4 min read · LW link