Ransomware Payments Should Require a Sin Tax

Brian Bien 22 Jul 2024 21:16 UTC

20 points

10 comments 2 min read LW link

The Elusive Root Cause of Schizophrenia—Thesis Introduction Only

kareempforbes 22 Jul 2024 20:24 UTC

−8 points

0 comments 2 min read LW link

Is Chinese AGI a valid concern for the USA?

sammyboiz 22 Jul 2024 20:21 UTC

0 points

2 comments 9 min read LW link

Trying to understand Hanson’s Cultural Drift argument

Kemp 22 Jul 2024 20:20 UTC

21 points

6 comments 2 min read LW link

Efficient Dictionary Learning with Switch Sparse Autoencoders

Anish Mudide 22 Jul 2024 18:45 UTC

118 points

20 comments 12 min read LW link

Analyzing DeepMind’s Probabilistic Methods for Evaluating Agent Capabilities

Axel Højmark, fidgetsinner, Arjun Panickssery, Marius Hobbhahn and Jérémy Scheurer

22 Jul 2024 16:17 UTC

69 points

0 comments 16 min read LW link

The Garden of Eden

Alexander Turok 22 Jul 2024 16:07 UTC

23 points

2 comments 9 min read LW link

Caring about excellence

owencb 22 Jul 2024 14:24 UTC

47 points

5 comments 6 min read LW link 1 review

Tim Dillon’s fake business altered my perspective more significantly than any other video I have watched in the last 24 months

Stuart Johnson 22 Jul 2024 12:54 UTC

6 points

0 comments 1 min read LW link

(youtu.be)

On the CrowdStrike Incident

Zvi 22 Jul 2024 12:40 UTC

75 points

14 comments 17 min read LW link

(thezvi.wordpress.com)

Auto-Enhance: Developing a meta-benchmark to measure LLM agents’ ability to improve other agents

Sam F. Brown, BasilLabib, Codruta (Coco) Lugoj and Sai Sasank Y

22 Jul 2024 12:33 UTC

20 points

0 comments 14 min read LW link

What does “the universe is quantum” actually mean?

Tahp 22 Jul 2024 11:52 UTC

2 points

0 comments 14 min read LW link

Initial Experiments Using SAEs to Help Detect AI Generated Text

Aaron_Scher 22 Jul 2024 5:16 UTC

18 points

1 comment 14 min read LW link

Categories of leadership on technical teams

benkuhn 22 Jul 2024 4:50 UTC

43 points

0 comments 8 min read LW link

(www.benkuhn.net)

An experiment on hidden cognition

Olli Järviniemi 22 Jul 2024 3:26 UTC

25 points

2 comments 7 min read LW link

OpenAI Boycott Revisit

Jake Dennie-Lu 22 Jul 2024 1:44 UTC

17 points

2 comments 2 min read LW link

Coalitional agency

Richard_Ngo 22 Jul 2024 0:09 UTC

61 points

6 comments 6 min read LW link

The AI Driver’s Licence—A Policy Proposal

Joshua W and Tessa Malan

21 Jul 2024 20:38 UTC

0 points

1 comment 19 min read LW link

Demography and Destiny

Zero Contradictions 21 Jul 2024 20:34 UTC

6 points

11 comments 1 min read LW link

(thewaywardaxolotl.blogspot.com)

The $100B plan with “70% risk of killing us all” w Stephen Fry [video]

Oleg Trott 21 Jul 2024 20:06 UTC

35 points

8 comments 1 min read LW link

(www.youtube.com)

Raising Welfare for Lab Rodents

xanderbalwit 21 Jul 2024 19:18 UTC

−2 points

0 comments 1 min read LW link

(press.asimov.com)

A simple model of math skill

Alex_Altair 21 Jul 2024 18:57 UTC

107 points

17 comments 8 min read LW link

Using an LLM perplexity filter to detect weight exfiltration

Adam Karvonen 21 Jul 2024 18:18 UTC

25 points

11 comments 2 min read LW link

[Question] Would a scope-insensitive AGI be less likely to incapacitate humanity?

Jim Buhler 21 Jul 2024 14:15 UTC

2 points

3 comments 1 min read LW link

Holomorphic surjection theorem (Picard’s little theorem)

dkl9 21 Jul 2024 13:24 UTC

17 points

0 comments 2 min read LW link

(dkl9.net)

aimless ace analyzes active amateur: a micro-aaaaalignment proposal

lemonhope 21 Jul 2024 12:37 UTC

13 points

0 comments 1 min read LW link

Pivotal Acts are easier than Alignment?

Michael Soareverix 21 Jul 2024 12:15 UTC

2 points

4 comments 1 min read LW link

Ball Sq Pathways

jefftk 21 Jul 2024 2:20 UTC

13 points

1 comment 1 min read LW link

(www.jefftk.com)

Freedom and Privacy of Thought Architectures

SebastianG 20 Jul 2024 21:43 UTC

5 points

2 comments 1 min read LW link

Why Georgism Lost Its Popularity

Zero Contradictions 20 Jul 2024 15:08 UTC

49 points

55 comments 1 min read LW link

(zerocontradictions.net)

Only Fools Avoid Hindsight Bias

Kevin Dorst 20 Jul 2024 13:42 UTC

−11 points

5 comments 6 min read LW link

(kevindorst.substack.com)

A more systematic case for inner misalignment

Richard_Ngo 20 Jul 2024 5:03 UTC

31 points

4 comments 5 min read LW link

BatchTopK: A Simple Improvement for TopK-SAEs

Bart Bussmann, Patrick Leask and Neel Nanda

20 Jul 2024 2:20 UTC

62 points

0 comments 4 min read LW link

Krona Compare

jefftk 20 Jul 2024 1:10 UTC

10 points

0 comments 2 min read LW link

(www.jefftk.com)

(Approximately) Deterministic Natural Latents

johnswentworth and David Lorell

19 Jul 2024 23:02 UTC

45 points

1 comment 4 min read LW link

Feature Targeted LLC Estimation Distinguishes SAE Features from Random Directions

Lidor Banuel Dabbah and Aviel Boag

19 Jul 2024 20:32 UTC

59 points

6 comments 16 min read LW link

JumpReLU SAEs + Early Access to Gemma 2 SAEs

Senthooran Rajamanoharan, Tom Lieberum, nps29, Arthur Conmy, Vikrant Varma, János Kramár and Neel Nanda

19 Jul 2024 16:10 UTC

55 points

10 comments 1 min read LW link

(storage.googleapis.com)

Truth is Universal: Robust Detection of Lies in LLMs

Lennart Buerger 19 Jul 2024 14:07 UTC

24 points

4 comments 2 min read LW link

(arxiv.org)

Sustainability of Digital Life Form Societies

Hiroshi Yamakawa 19 Jul 2024 13:59 UTC

19 points

1 comment 20 min read LW link

Romae Industriae

Maxwell Tabarrok 19 Jul 2024 13:03 UTC

36 points

2 comments 7 min read LW link

(www.maximum-progress.com)

[Question] Have people given up on iterated distillation and amplification?

Chris_Leong 19 Jul 2024 12:23 UTC

20 points

1 comment 1 min read LW link

How do we know that “good research” is good? (aka “direct evaluation” vs “eigen-evaluation”)

Ruby 19 Jul 2024 0:31 UTC

49 points

21 comments 6 min read LW link

Linkpost: Surely you can be serious

kave 18 Jul 2024 22:18 UTC

64 points

8 comments 1 min read LW link

(www.experimental-history.com)

My experience applying to MATS 6.0

mic 18 Jul 2024 19:02 UTC

19 points

3 comments 5 min read LW link

[Question] What are the actual arguments in favor of computationalism as a theory of identity?

sunwillrise 18 Jul 2024 18:44 UTC

16 points

27 comments 5 min read LW link

Yet Another Critique of “Luxury Beliefs”

ymeskhout 18 Jul 2024 18:37 UTC

6 points

9 comments 9 min read LW link

(www.ymeskhout.com)

[Interim research report] Evaluating the Goal-Directedness of Language Models

Rauno Arike, Elizabeth Donoway and Marius Hobbhahn

18 Jul 2024 18:19 UTC

40 points

4 comments 11 min read LW link

Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent

Karolis Jucys, george_adams and Sonia Joseph

18 Jul 2024 17:02 UTC

9 points

0 comments 1 min read LW link

(arxiv.org)

Activation Engineering Theories of Impact

Jakub K. Nowak🔸 18 Jul 2024 16:44 UTC

6 points

1 comment 2 min read LW link

[Question] Me & My Clone

SimonBaars 18 Jul 2024 16:25 UTC

27 points

22 comments 1 min read LW link