23 Jul 2024 20:15 UTC

27 points

4 comments 18 min read LW link

(www.florencehinder.com)

How reasonable is taking extinction risk?

FVelde 23 Jul 2024 18:05 UTC

2 points

4 comments 4 min read LW link

Unlearning via RMU is mostly shallow

Andy Arditi and bilalchughtai

23 Jul 2024 16:07 UTC

58 points

4 comments 6 min read LW link

Monthly Roundup #20: July 2024

Zvi 23 Jul 2024 12:50 UTC

33 points

9 comments 38 min read LW link

(thezvi.wordpress.com)

Confusing the metric for the meaning: Perhaps correlated attributes are “natural”

NickyP 23 Jul 2024 12:43 UTC

33 points

3 comments 4 min read LW link

My covid-related beliefs and questions

Severin T. Seehrich 23 Jul 2024 3:27 UTC

10 points

3 comments 1 min read LW link

[Question] Is there a Schelling point for group house room listings?

NoSignalNoNoise 23 Jul 2024 3:03 UTC

4 points

0 comments 1 min read LW link

Room Available in Boston Group House

NoSignalNoNoise 23 Jul 2024 2:55 UTC

15 points

1 comment 1 min read LW link

D&D.Sci Scenario Index

aphyer and abstractapplic

23 Jul 2024 2:00 UTC

78 points

1 comment 3 min read LW link 1 review

How to avoid death by AI.

Krantz 23 Jul 2024 1:59 UTC

−3 points

13 comments 2 min read LW link

Ransomware Payments Should Require a Sin Tax

Brian Bien 22 Jul 2024 21:16 UTC

20 points

10 comments 2 min read LW link

The Elusive Root Cause of Schizophrenia—Thesis Introduction Only

kareempforbes 22 Jul 2024 20:24 UTC

−8 points

0 comments 2 min read LW link

Is Chinese AGI a valid concern for the USA?

sammyboiz 22 Jul 2024 20:21 UTC

0 points

2 comments 9 min read LW link

Trying to understand Hanson’s Cultural Drift argument

Kemp 22 Jul 2024 20:20 UTC

21 points

6 comments 2 min read LW link

Efficient Dictionary Learning with Switch Sparse Autoencoders

Anish Mudide 22 Jul 2024 18:45 UTC

118 points

20 comments 12 min read LW link

Analyzing DeepMind’s Probabilistic Methods for Evaluating Agent Capabilities

Axel Højmark, fidgetsinner, Arjun Panickssery, Marius Hobbhahn and Jérémy Scheurer

22 Jul 2024 16:17 UTC

69 points

0 comments 16 min read LW link

The Garden of Eden

Alexander Turok 22 Jul 2024 16:07 UTC

23 points

2 comments 9 min read LW link

Caring about excellence

owencb 22 Jul 2024 14:24 UTC

47 points

5 comments 6 min read LW link 1 review

Tim Dillon’s fake business altered my perspective more significantly than any other video I have watched in the last 24 months

Stuart Johnson 22 Jul 2024 12:54 UTC

6 points

0 comments 1 min read LW link

(youtu.be)

On the CrowdStrike Incident

Zvi 22 Jul 2024 12:40 UTC

75 points

14 comments 17 min read LW link

(thezvi.wordpress.com)

Auto-Enhance: Developing a meta-benchmark to measure LLM agents’ ability to improve other agents

Sam F. Brown, BasilLabib, Codruta (Coco) Lugoj and Sai Sasank Y

22 Jul 2024 12:33 UTC

20 points

0 comments 14 min read LW link

What does “the universe is quantum” actually mean?

Tahp 22 Jul 2024 11:52 UTC

2 points

0 comments 14 min read LW link

Initial Experiments Using SAEs to Help Detect AI Generated Text

Aaron_Scher 22 Jul 2024 5:16 UTC

18 points

1 comment 14 min read LW link

Categories of leadership on technical teams

benkuhn 22 Jul 2024 4:50 UTC

43 points

0 comments 8 min read LW link

(www.benkuhn.net)

An experiment on hidden cognition

Olli Järviniemi 22 Jul 2024 3:26 UTC

25 points

2 comments 7 min read LW link

OpenAI Boycott Revisit

Jake Dennie-Lu 22 Jul 2024 1:44 UTC

17 points

2 comments 2 min read LW link

Coalitional agency

Richard_Ngo 22 Jul 2024 0:09 UTC

61 points

6 comments 6 min read LW link

The AI Driver’s Licence—A Policy Proposal

Joshua W and Tessa Malan

21 Jul 2024 20:38 UTC

0 points

1 comment 19 min read LW link

Demography and Destiny

Zero Contradictions 21 Jul 2024 20:34 UTC

6 points

11 comments 1 min read LW link

(thewaywardaxolotl.blogspot.com)

The $100B plan with “70% risk of killing us all” w Stephen Fry [video]

Oleg Trott 21 Jul 2024 20:06 UTC

35 points

8 comments 1 min read LW link

(www.youtube.com)

Raising Welfare for Lab Rodents

xanderbalwit 21 Jul 2024 19:18 UTC

−2 points

0 comments 1 min read LW link

(press.asimov.com)

A simple model of math skill

Alex_Altair 21 Jul 2024 18:57 UTC

107 points

17 comments 8 min read LW link

Using an LLM perplexity filter to detect weight exfiltration

Adam Karvonen 21 Jul 2024 18:18 UTC

25 points

11 comments 2 min read LW link

[Question] Would a scope-insensitive AGI be less likely to incapacitate humanity?

Jim Buhler 21 Jul 2024 14:15 UTC

2 points

3 comments 1 min read LW link

Holomorphic surjection theorem (Picard’s little theorem)

dkl9 21 Jul 2024 13:24 UTC

17 points

0 comments 2 min read LW link

(dkl9.net)

aimless ace analyzes active amateur: a micro-aaaaalignment proposal

lemonhope 21 Jul 2024 12:37 UTC

13 points

0 comments 1 min read LW link

Pivotal Acts are easier than Alignment?

Michael Soareverix 21 Jul 2024 12:15 UTC

2 points

4 comments 1 min read LW link

Ball Sq Pathways

jefftk 21 Jul 2024 2:20 UTC

13 points

1 comment 1 min read LW link

(www.jefftk.com)

Freedom and Privacy of Thought Architectures

SebastianG 20 Jul 2024 21:43 UTC

5 points

2 comments 1 min read LW link

Why Georgism Lost Its Popularity

Zero Contradictions 20 Jul 2024 15:08 UTC

49 points

55 comments 1 min read LW link

(zerocontradictions.net)

Only Fools Avoid Hindsight Bias

Kevin Dorst 20 Jul 2024 13:42 UTC

−11 points

5 comments 6 min read LW link

(kevindorst.substack.com)

A more systematic case for inner misalignment

Richard_Ngo 20 Jul 2024 5:03 UTC

31 points

4 comments 5 min read LW link

BatchTopK: A Simple Improvement for TopK-SAEs

Bart Bussmann, Patrick Leask and Neel Nanda

20 Jul 2024 2:20 UTC

62 points

0 comments 4 min read LW link

Krona Compare

jefftk 20 Jul 2024 1:10 UTC

10 points

0 comments 2 min read LW link

(www.jefftk.com)

(Approximately) Deterministic Natural Latents

johnswentworth and David Lorell

19 Jul 2024 23:02 UTC

45 points

1 comment 4 min read LW link

Feature Targeted LLC Estimation Distinguishes SAE Features from Random Directions

Lidor Banuel Dabbah and Aviel Boag

19 Jul 2024 20:32 UTC

59 points

6 comments 16 min read LW link

JumpReLU SAEs + Early Access to Gemma 2 SAEs

Senthooran Rajamanoharan, Tom Lieberum, nps29, Arthur Conmy, Vikrant Varma, János Kramár and Neel Nanda

19 Jul 2024 16:10 UTC

55 points

10 comments 1 min read LW link

(storage.googleapis.com)

Truth is Universal: Robust Detection of Lies in LLMs

Lennart Buerger 19 Jul 2024 14:07 UTC

24 points

4 comments 2 min read LW link

(arxiv.org)

Sustainability of Digital Life Form Societies

Hiroshi Yamakawa 19 Jul 2024 13:59 UTC

19 points

1 comment 20 min read LW link

Romae Industriae

Maxwell Tabarrok 19 Jul 2024 13:03 UTC

36 points

2 comments 7 min read LW link

(www.maximum-progress.com)