Concrete Steps to Get Started in Transformer Mechanistic Interpretability

Neel Nanda 25 Dec 2022 22:21 UTC

58 points

7 comments 12 min read LW link

(www.neelnanda.io)

It’s time to worry about online privacy again

Malmesbury 25 Dec 2022 21:05 UTC

71 points

23 comments 6 min read LW link

[Hebbian Natural Abstractions] Mathematical Foundations

Samuel Nellessen and Jan

25 Dec 2022 20:58 UTC

15 points

2 comments 6 min read LW link

(www.snellessen.com)

[Question] Oracle AGI—How can it escape, other than security issues? (Steganography?)

RationalSieve 25 Dec 2022 20:14 UTC

3 points

6 comments 1 min read LW link

YCombinator fraud rates

Xodarap 25 Dec 2022 19:21 UTC

56 points

3 comments 4 min read LW link

How evolutionary lineages of LLMs can plan their own future and act on these plans

Roman Leventov 25 Dec 2022 18:11 UTC

40 points

16 comments 8 min read LW link

Accurate Models of AI Risk Are Hyperexistential Exfohazards

Thane Ruthenis 25 Dec 2022 16:50 UTC

33 points

38 comments 9 min read LW link

ChatGPT is our Wright Brothers moment

Ron J 25 Dec 2022 16:26 UTC

10 points

9 comments 1 min read LW link

The Meditation on Winter

Raemon 25 Dec 2022 16:12 UTC

61 points

3 comments 3 min read LW link

I’ve updated towards AI boxing being surprisingly easy

Noosphere89 25 Dec 2022 15:40 UTC

8 points

20 comments 2 min read LW link

Take 14: Corrigibility isn’t that great.

Charlie Steiner 25 Dec 2022 13:04 UTC

15 points

3 comments 3 min read LW link

Simplified Level Up

jefftk 25 Dec 2022 13:00 UTC

12 points

16 comments 2 min read LW link

(www.jefftk.com)

Hyperfinite graphs ~ manifolds

Alok Singh 25 Dec 2022 12:24 UTC

11 points

5 comments 2 min read LW link

Inconsistent math is great

Alok Singh 25 Dec 2022 3:20 UTC

1 point

2 comments 1 min read LW link

A hundredth of a bit of extra entropy

Adam Scherlis 24 Dec 2022 21:12 UTC

84 points

4 comments 3 min read LW link

Shared reality: a key driver of human behavior

kdbscott 24 Dec 2022 19:35 UTC

136 points

25 comments 4 min read LW link

Contra Steiner on Too Many Natural Abstractions

DragonGod 24 Dec 2022 17:42 UTC

10 points

6 comments 1 min read LW link

Three reasons to cooperate

paulfchristiano 24 Dec 2022 17:40 UTC

86 points

14 comments 10 min read LW link

(sideways-view.com)

Practical AI risk I: Watching large compute

Gustavo Ramires 24 Dec 2022 13:25 UTC

3 points

0 comments 1 min read LW link

Non-Elevated Air Purifiers

jefftk 24 Dec 2022 12:40 UTC

10 points

2 comments 1 min read LW link

(www.jefftk.com)

The Case for Chip-Backed Dollars

AnthonyRepetto 24 Dec 2022 10:28 UTC

0 points

1 comment 4 min read LW link

List #3: Why not to assume on prior that AGI-alignment workarounds are available

Remmelt 24 Dec 2022 9:54 UTC

4 points

1 comment 3 min read LW link

List #2: Why coordinating to align as humans to not develop AGI is a lot easier than, well… coordinating as humans with AGI coordinating to be aligned with humans

Remmelt 24 Dec 2022 9:53 UTC

1 point

0 comments 3 min read LW link

List #1: Why stopping the development of AGI is hard but doable

Remmelt 24 Dec 2022 9:52 UTC

6 points

11 comments 5 min read LW link

The case against AI alignment

andrew sauer 24 Dec 2022 6:57 UTC

127 points

110 comments 5 min read LW link

Content and Takeaways from SERI MATS Training Program with John Wentworth

RohanS 24 Dec 2022 4:17 UTC

28 points

3 comments 12 min read LW link

Löb’s Lemma: an easier approach to Löb’s Theorem

Andrew_Critch 24 Dec 2022 2:02 UTC

38 points

17 comments 3 min read LW link

Durkon, an open-source tool for Inherently Interpretable Modelling

abstractapplic 24 Dec 2022 1:49 UTC

47 points

0 comments 4 min read LW link

Issues with uneven AI resource distribution

User_Luke 24 Dec 2022 1:18 UTC

3 points

9 comments 5 min read LW link

(temporal.substack.com)

Loose Threads on Intelligence

Shoshannah Tekofsky 24 Dec 2022 0:38 UTC

11 points

3 comments 8 min read LW link

[Question] If you factor out next token prediction, what are the remaining salient features of human cognition?

Shmi 24 Dec 2022 0:38 UTC

9 points

7 comments 1 min read LW link

[Question] Why is “Argument Mapping” Not More Common in EA/Rationality (And What Objections Should I Address in a Post on the Topic?)

HarrisonDurland 23 Dec 2022 21:58 UTC

11 points

5 comments 1 min read LW link

The Fear [Fiction]

Yitz 23 Dec 2022 21:21 UTC

7 points

0 comments 1 min read LW link

To err is neural: select logs with ChatGPT

VipulNaik 23 Dec 2022 20:26 UTC

22 points

2 comments 38 min read LW link

AISER—AIS Europe Retreat

Carolin 23 Dec 2022 19:03 UTC

5 points

0 comments 1 min read LW link

Two Truths and a Prediction Market

Screwtape 23 Dec 2022 18:52 UTC

24 points

2 comments 6 min read LW link

ChatGPT understands, but largely does not generate Spanglish (and other code-mixed) text

Milan W 23 Dec 2022 17:40 UTC

15 points

5 comments 4 min read LW link

On sincerity

Joe Carlsmith 23 Dec 2022 17:13 UTC

79 points

6 comments 42 min read LW link

Epigenetics of the mammalian germline

Metacelsus 23 Dec 2022 15:21 UTC

37 points

0 comments 7 min read LW link

(denovo.substack.com)

Boston Solstice Songs

jefftk 23 Dec 2022 13:00 UTC

9 points

0 comments 1 min read LW link

(www.jefftk.com)

Are there any reliable CAPTCHAs? Competition for CAPTCHA ideas that AIs can’t solve.

MrThink 23 Dec 2022 12:52 UTC

7 points

37 comments 1 min read LW link

“Search” is dead. What is the new paradigm?

Shmi 23 Dec 2022 10:33 UTC

15 points

9 comments 1 min read LW link

Article Review: Discovering Latent Knowledge (Burns, Ye, et al)

Robert_AIZI 22 Dec 2022 18:16 UTC

13 points

4 comments 6 min read LW link

(aizi.substack.com)

Let’s think about slowing down AI

KatjaGrace 22 Dec 2022 17:40 UTC

562 points

182 comments 38 min read LW link 3 reviews

(aiimpacts.org)

Some Notes on the mathematics of Toy Autoencoding Problems

carboniferous_umbraculum 22 Dec 2022 17:21 UTC

18 points

1 comment 12 min read LW link

December 2022 updates and fundraising

AI Impacts 22 Dec 2022 17:20 UTC

39 points

1 comment 3 min read LW link

(aiimpacts.org)

Covid 12/22/22: Reevaluating Past Options

Zvi 22 Dec 2022 16:50 UTC

30 points

2 comments 9 min read LW link

(thezvi.wordpress.com)

China Covid #4

Zvi 22 Dec 2022 16:30 UTC

50 points

2 comments 11 min read LW link

(thezvi.wordpress.com)

Racing through a minefield: the AI deployment problem

HoldenKarnofsky 22 Dec 2022 16:10 UTC

38 points

2 comments 13 min read LW link

(www.cold-takes.com)

Lead in Chocolate?

jefftk 22 Dec 2022 16:10 UTC

42 points

6 comments 2 min read LW link

(www.jefftk.com)