A few thoughts on my self-study for alignment research

Thomas Kehrenberg · 30 Dec 2022 22:05 UTC
6 points
0 comments · 2 min read · LW link

Christmas Microscopy

jefftk · 30 Dec 2022 21:10 UTC
26 points
0 comments · 1 min read · LW link
(www.jefftk.com)

What “upside” of AI?

False Name · 30 Dec 2022 20:58 UTC
0 points
5 comments · 4 min read · LW link

Evidence on recursive self-improvement from current ML

beren · 30 Dec 2022 20:53 UTC
31 points
12 comments · 6 min read · LW link

[Question] Is ChatGPT TAI?

Amal · 30 Dec 2022 19:44 UTC
14 points
5 comments · 1 min read · LW link

My thoughts on OpenAI’s alignment plan

Akash · 30 Dec 2022 19:33 UTC
55 points
3 comments · 20 min read · LW link

Beyond Rewards and Values: A Non-dualistic Approach to Universal Intelligence

Akira Pyinya · 30 Dec 2022 19:05 UTC
10 points
4 comments · 14 min read · LW link

10 Years of LessWrong

JohnBuridan · 30 Dec 2022 17:15 UTC
73 points
2 comments · 4 min read · LW link

Chatbots as a Publication Format

derek shiller · 30 Dec 2022 14:11 UTC
6 points
6 comments · 4 min read · LW link

Human sexuality as an interesting case study of alignment

beren · 30 Dec 2022 13:37 UTC
39 points
26 comments · 3 min read · LW link

The Twitter Files: Covid Edition

Zvi · 30 Dec 2022 13:30 UTC
32 points
2 comments · 10 min read · LW link
(thezvi.wordpress.com)

Worldly Positions archive, briefly with private drafts

KatjaGrace · 30 Dec 2022 12:20 UTC
11 points
0 comments · 1 min read · LW link
(worldspiritsockpuppet.com)

Models Don’t “Get Reward”

Sam Ringer · 30 Dec 2022 10:37 UTC
307 points
61 comments · 5 min read · LW link · 1 review

The hyperfinite timeline

Alok Singh · 30 Dec 2022 9:30 UTC
3 points
6 comments · 1 min read · LW link
(alok.github.io)

Reactive devaluation: Bias in Evaluating AGI X-Risks

30 Dec 2022 9:02 UTC
−15 points
9 comments · 1 min read · LW link

Things I carry almost every day, as of late December 2022

DanielFilan · 30 Dec 2022 7:40 UTC
35 points
9 comments · 5 min read · LW link
(danielfilan.com)

More ways to spot abysses

KatjaGrace · 30 Dec 2022 6:30 UTC
21 points
1 comment · 1 min read · LW link
(worldspiritsockpuppet.com)

Language models are nearly AGIs but we don’t notice it because we keep shifting the bar

philosophybear · 30 Dec 2022 5:15 UTC
105 points
13 comments · 7 min read · LW link

Progress links and tweets, 2022-12-29

jasoncrawford · 30 Dec 2022 4:54 UTC
12 points
0 comments · 1 min read · LW link
(rootsofprogress.org)

Announcing The Filan Cabinet

DanielFilan · 30 Dec 2022 3:10 UTC
21 points
2 comments · 1 min read · LW link
(danielfilan.com)

[Question] Effective Evil Causes?

Ulisse Mini · 30 Dec 2022 2:56 UTC
−12 points
2 comments · 1 min read · LW link

But is it really in Rome? An investigation of the ROME model editing technique

jacquesthibs · 30 Dec 2022 2:40 UTC
102 points
1 comment · 18 min read · LW link

A Year of AI Increasing AI Progress

ThomasW · 30 Dec 2022 2:09 UTC
148 points
3 comments · 2 min read · LW link

Why not spend more time looking at human alignment?

ajc586 · 30 Dec 2022 0:22 UTC
11 points
3 comments · 1 min read · LW link

Why and how to write things on the Internet

benkuhn · 29 Dec 2022 22:40 UTC
20 points
2 comments · 15 min read · LW link
(www.benkuhn.net)

Friendly and Unfriendly AGI are Indistinguishable

ErgoEcho · 29 Dec 2022 22:13 UTC
−4 points
4 comments · 4 min read · LW link
(neologos.co)

200 COP in MI: Looking for Circuits in the Wild

Neel Nanda · 29 Dec 2022 20:59 UTC
16 points
5 comments · 13 min read · LW link

Thoughts on the implications of GPT-3, two years ago and NOW [here be dragons, we’re swimming, flying and talking with them]

Bill Benzon · 29 Dec 2022 20:05 UTC
0 points
0 comments · 5 min read · LW link

Covid 12/29/22: Next Up is XBB.1.5

Zvi · 29 Dec 2022 18:20 UTC
33 points
4 comments · 10 min read · LW link
(thezvi.wordpress.com)

Entrepreneurship ETG Might Be Better Than 80k Thought

Xodarap · 29 Dec 2022 17:51 UTC
33 points
0 comments · 1 min read · LW link

Internal Interfaces Are a High-Priority Interpretability Target

Thane Ruthenis · 29 Dec 2022 17:49 UTC
26 points
6 comments · 7 min read · LW link

CFP for Rebellion and Disobedience in AI workshop

Ram Rachum · 29 Dec 2022 16:08 UTC
15 points
0 comments · 1 min read · LW link

My scorched-earth policy on New Year’s resolutions

PatrickDFarley · 29 Dec 2022 14:45 UTC
29 points
2 comments · 4 min read · LW link

Don’t feed the void. She is fat enough!

Johannes C. Mayer · 29 Dec 2022 14:18 UTC
11 points
0 comments · 1 min read · LW link

[Question] Is there any unified resource on Eliezer’s fatigue?

Johannes C. Mayer · 29 Dec 2022 14:04 UTC
8 points
2 comments · 1 min read · LW link

Logical Probability of Goldbach’s Conjecture: Provable Rule or Coincidence?

avturchin · 29 Dec 2022 13:37 UTC
5 points
15 comments · 8 min read · LW link

Where do you get your capabilities from?

tailcalled · 29 Dec 2022 11:39 UTC
24 points
27 comments · 6 min read · LW link

The commercial incentive to intentionally train AI to deceive us

Derek M. Jones · 29 Dec 2022 11:30 UTC
5 points
1 comment · 4 min read · LW link
(shape-of-code.com)

Infinite necklace: the line as a circle

Alok Singh · 29 Dec 2022 10:41 UTC
5 points
2 comments · 1 min read · LW link

Privacy Tradeoffs

jefftk · 29 Dec 2022 3:40 UTC
13 points
1 comment · 2 min read · LW link
(www.jefftk.com)

Against John Searle, Gary Marcus, the Chinese Room thought experiment and its world

philosophybear · 29 Dec 2022 3:26 UTC
21 points
43 comments · 8 min read · LW link

Large Language Models Suggest a Path to Ems

anithite · 29 Dec 2022 2:20 UTC
17 points
2 comments · 5 min read · LW link

[Question] Book recommendations for the history of ML?

Eleni Angelou · 28 Dec 2022 23:50 UTC
2 points
2 comments · 1 min read · LW link

Rock-Paper-Scissors Can Be Weird

winwonce · 28 Dec 2022 23:12 UTC
14 points
3 comments · 1 min read · LW link

200 COP in MI: The Case for Analysing Toy Language Models

Neel Nanda · 28 Dec 2022 21:07 UTC
39 points
3 comments · 7 min read · LW link

200 Concrete Open Problems in Mechanistic Interpretability: Introduction

Neel Nanda · 28 Dec 2022 21:06 UTC
103 points
0 comments · 10 min read · LW link

Effective ways to find love?

anonymoususer · 28 Dec 2022 20:46 UTC
8 points
6 comments · 1 min read · LW link

Classical logic based on propositions-as-subsingleton-types

Thomas Kehrenberg · 28 Dec 2022 20:16 UTC
3 points
0 comments · 16 min read · LW link

In Defense of Wrapper-Minds

Thane Ruthenis · 28 Dec 2022 18:28 UTC
23 points
38 comments · 3 min read · LW link

[Question] What is the best way to approach Expected Value calculations when payoffs are highly skewed?

jmh · 28 Dec 2022 14:42 UTC
8 points
16 comments · 1 min read · LW link