AXRP Episode 35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization · DanielFilan · Aug 24, 2024, 10:30 PM · 21 points · 0 comments · 74 min read · LW link
The top 30 books to expand the capabilities of AI: a biased reading list · Jonathan Mugan · Aug 24, 2024, 9:48 PM · −6 points · 0 comments · 16 min read · LW link
The Ap Distribution · criticalpoints · Aug 24, 2024, 9:45 PM · 22 points · 8 comments · 3 min read · LW link (eregis.github.io)
What is it to solve the alignment problem? (Notes) · Joe Carlsmith · Aug 24, 2024, 9:19 PM · 69 points · 18 comments · 53 min read · LW link
Examine self modification as an intuition provider for the concept of consciousness · Canaletto · Aug 24, 2024, 8:48 PM · −4 points · 2 comments · 10 min read · LW link
[Question] Looking to interview AI Safety researchers for a book · jeffreycaruso · Aug 24, 2024, 7:57 PM · 14 points · 0 comments · 1 min read · LW link
Perplexity wins my AI race · Elizabeth · Aug 24, 2024, 7:20 PM · 107 points · 12 comments · 10 min read · LW link (acesounderglass.com)
Why should anyone boot *you* up? · onur · Aug 24, 2024, 5:51 PM · −1 points · 5 comments · 3 min read · LW link (solmaz.io)
Understanding Hidden Computations in Chain-of-Thought Reasoning · rokosbasilisk · Aug 24, 2024, 4:35 PM · 6 points · 1 comment · 1 min read · LW link
August 2024 Time Tracking · jefftk · Aug 24, 2024, 1:50 PM · 22 points · 0 comments · 3 min read · LW link (www.jefftk.com)
Training a Sparse Autoencoder in < 30 minutes on 16GB of VRAM using an S3 cache · Louka Ewington-Pitsos · Aug 24, 2024, 7:39 AM · 17 points · 0 comments · 5 min read · LW link
[Question] Looking for intuitions to extend bargaining notions · ProgramCrafter · Aug 24, 2024, 5:00 AM · 13 points · 0 comments · 1 min read · LW link
Owain Evans on Situational Awareness and Out-of-Context Reasoning in LLMs · Michaël Trazzi · Aug 24, 2024, 4:30 AM · 55 points · 0 comments · 5 min read · LW link
[Question] Developing Positive Habits through Video Games · pzas · Aug 24, 2024, 3:47 AM · 1 point · 5 comments · 1 min read · LW link
“Can AI Scaling Continue Through 2030?”, Epoch AI (yes) · gwern · Aug 24, 2024, 1:40 AM · 135 points · 4 comments · 3 min read · LW link (epochai.org)
What’s important in “AI for epistemics”? · Lukas Finnveden · Aug 24, 2024, 1:27 AM · 48 points · 0 comments · 28 min read · LW link (www.forethought.org)
Showing SAE Latents Are Not Atomic Using Meta-SAEs · Bart Bussmann, Michael Pearce, Patrick Leask, Joseph Bloom, Lee Sharkey and Neel Nanda · Aug 24, 2024, 12:56 AM · 68 points · 10 comments · 20 min read · LW link
Using ideologically-charged language to get gpt-3.5-turbo to disobey its system prompt: a demo · Milan W · Aug 24, 2024, 12:13 AM · 3 points · 0 comments · 6 min read · LW link
Crafting Polysemantic Transformer Benchmarks with Known Circuits · Evan Anders and Adrià Garriga-alonso · Aug 23, 2024, 10:03 PM · 17 points · 0 comments · 25 min read · LW link
[Question] What is an appropriate sample size when surveying billions of data points? · Blake · Aug 23, 2024, 9:54 PM · 1 point · 2 comments · 1 min read · LW link
Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs · Kola Ayonrinde, Michael Pearce and Lee Sharkey · Aug 23, 2024, 6:52 PM · 42 points · 8 comments · 16 min read · LW link
How I started believing religion might actually matter for rationality and moral philosophy · zhukeepa · Aug 23, 2024, 5:40 PM · 129 points · 41 comments · 7 min read · LW link
[Question] What do you expect AI capabilities may look like in 2028? · nonzerosum · Aug 23, 2024, 4:59 PM · 9 points · 5 comments · 1 min read · LW link
Invitation to lead a project at AI Safety Camp (Virtual Edition, 2025) · Linda Linsefors, Remmelt Ellen and Robert Kralisch · Aug 23, 2024, 2:18 PM · 17 points · 2 comments · 4 min read · LW link
If we solve alignment, do we die anyway? · Seth Herd · Aug 23, 2024, 1:13 PM · 84 points · 130 comments · 4 min read · LW link
What’s going on with Per-Component Weight Updates? · 4gate · Aug 22, 2024, 9:22 PM · 1 point · 0 comments · 6 min read · LW link
Interoperable High Level Structures: Early Thoughts on Adjectives · johnswentworth and David Lorell · Aug 22, 2024, 9:12 PM · 49 points · 1 comment · 7 min read · LW link
Interest poll: A time-waster blocker for desktop Linux programs · nahoj · Aug 22, 2024, 8:44 PM · 4 points · 5 comments · 1 min read · LW link
Turning 22 in the Pre-Apocalypse · testingthewaters · Aug 22, 2024, 8:28 PM · 38 points · 14 comments · 24 min read · LW link (utilityhotbar.github.io)
A Robust Natural Latent Over A Mixed Distribution Is Natural Over The Distributions Which Were Mixed · johnswentworth and David Lorell · Aug 22, 2024, 7:19 PM · 42 points · 4 comments · 4 min read · LW link
what becoming more secure did for me · Chris Lakin · Aug 22, 2024, 5:44 PM · 26 points · 5 comments · 2 min read · LW link (chrislakin.blog)
A primer on the current state of longevity research · Abhishaike Mahajan · Aug 22, 2024, 5:14 PM · 109 points · 6 comments · 14 min read · LW link (www.owlposting.com)
Some reasons to start a project to stop harmful AI · Remmelt · Aug 22, 2024, 4:23 PM · 5 points · 0 comments · 2 min read · LW link
The economics of space tethers · harsimony · Aug 22, 2024, 4:15 PM · 67 points · 22 comments · 7 min read · LW link (splittinginfinity.substack.com)
Dima’s Shortform · Dmitrii Krasheninnikov · Aug 22, 2024, 2:49 PM · 3 points · 0 comments · 1 min read · LW link
AI #78: Some Welcome Calm · Zvi · Aug 22, 2024, 2:20 PM · 61 points · 15 comments · 33 min read · LW link (thezvi.wordpress.com)
[Question] How do we know dreams aren’t real? · Logan Zoellner · Aug 22, 2024, 12:41 PM · 5 points · 31 comments · 1 min read · LW link
Measuring Structure Development in Algorithmic Transformers · Micurie and Einar Urdshals · Aug 22, 2024, 8:38 AM · 56 points · 4 comments · 11 min read · LW link
Deception and Jailbreak Sequence: 1. Iterative Refinement Stages of Deception in LLMs · Winnie Yang and Jojo Yang · Aug 22, 2024, 7:32 AM · 23 points · 1 comment · 21 min read · LW link
Just because an LLM said it doesn’t mean it’s true: an illustrative example · dirk · Aug 21, 2024, 9:05 PM · 26 points · 12 comments · 3 min read · LW link
[Question] How do you finish your tasks faster? · Cipolla · Aug 21, 2024, 8:01 PM · 4 points · 2 comments · 1 min read · LW link
AI Safety Newsletter #40: California AI Legislation Plus, NVIDIA Delays Chip Production, and Do AI Safety Benchmarks Actually Measure Safety? · Corin Katzke, Julius, Alexa Pan and Dan H · Aug 21, 2024, 6:09 PM · 11 points · 0 comments · 6 min read · LW link (newsletter.safe.ai)
[Question] Should LW suggest standard metaprompts? · Dagon · Aug 21, 2024, 4:41 PM · 3 points · 6 comments · 1 min read · LW link
Eternal Existence and Eternal Boredom: The Case for AI and Immortal Humans · Tuan Tu Nguyen · Aug 21, 2024, 9:58 AM · −12 points · 2 comments · 5 min read · LW link
Please do not use AI to write for you · Richard_Kennaway · Aug 21, 2024, 9:53 AM · 69 points · 34 comments · 4 min read · LW link
Apply to Aether—Independent LLM Agent Safety Research Group · RohanS · Aug 21, 2024, 9:47 AM · 10 points · 0 comments · 7 min read · LW link (forum.effectivealtruism.org)
the Giga Press was a mistake · bhauth · Aug 21, 2024, 4:51 AM · 99 points · 26 comments · 5 min read · LW link (bhauth.com)
Exploring the Boundaries of Cognitohazards and the Nature of Reality · Victor Novikov · Aug 21, 2024, 3:42 AM UTC · −2 points · 2 comments · 1 min read · LW link
[Question] What is the point of 2v2 debates? · Axel Ahlqvist · Aug 20, 2024, 9:59 PM UTC · 2 points · 1 comment · 1 min read · LW link
[Question] Where should I look for information on gut health? · FinalFormal2 · Aug 20, 2024, 7:44 PM UTC · 10 points · 10 comments · 1 min read · LW link