A few thoughts on my self-study for alignment research

Thomas Kehrenberg · 30 Dec 2022 22:05 UTC
6 points
0 comments · 2 min read · LW link

Christmas Microscopy

jefftk · 30 Dec 2022 21:10 UTC
26 points
0 comments · 1 min read · LW link
(www.jefftk.com)

What “upside” of AI?

False Name · 30 Dec 2022 20:58 UTC
0 points
5 comments · 4 min read · LW link

Evidence on recursive self-improvement from current ML

beren · 30 Dec 2022 20:53 UTC
31 points
12 comments · 6 min read · LW link

[Question] Is ChatGPT TAI?

Amal · 30 Dec 2022 19:44 UTC
14 points
5 comments · 1 min read · LW link

My thoughts on OpenAI’s alignment plan

Akash · 30 Dec 2022 19:33 UTC
55 points
3 comments · 20 min read · LW link

Beyond Rewards and Values: A Non-dualistic Approach to Universal Intelligence

Akira Pyinya · 30 Dec 2022 19:05 UTC
10 points
4 comments · 14 min read · LW link

10 Years of LessWrong

JohnBuridan · 30 Dec 2022 17:15 UTC
73 points
2 comments · 4 min read · LW link

Chatbots as a Publication Format

derek shiller · 30 Dec 2022 14:11 UTC
6 points
6 comments · 4 min read · LW link

Human sexuality as an interesting case study of alignment

beren · 30 Dec 2022 13:37 UTC
39 points
26 comments · 3 min read · LW link

The Twitter Files: Covid Edition

Zvi · 30 Dec 2022 13:30 UTC
32 points
2 comments · 10 min read · LW link
(thezvi.wordpress.com)

Worldly Positions archive, briefly with private drafts

KatjaGrace · 30 Dec 2022 12:20 UTC
11 points
0 comments · 1 min read · LW link
(worldspiritsockpuppet.com)

Models Don’t “Get Reward”

Sam Ringer · 30 Dec 2022 10:37 UTC
307 points
61 comments · 5 min read · LW link · 1 review

The hyperfinite timeline

Alok Singh · 30 Dec 2022 9:30 UTC
3 points
6 comments · 1 min read · LW link
(alok.github.io)

Reactive devaluation: Bias in Evaluating AGI X-Risks

30 Dec 2022 9:02 UTC
−15 points
9 comments · 1 min read · LW link

Things I carry almost every day, as of late December 2022

DanielFilan · 30 Dec 2022 7:40 UTC
35 points
9 comments · 5 min read · LW link
(danielfilan.com)

More ways to spot abysses

KatjaGrace · 30 Dec 2022 6:30 UTC
21 points
1 comment · 1 min read · LW link
(worldspiritsockpuppet.com)

Language models are nearly AGIs but we don’t notice it because we keep shifting the bar

philosophybear · 30 Dec 2022 5:15 UTC
105 points
13 comments · 7 min read · LW link

Progress links and tweets, 2022-12-29

jasoncrawford · 30 Dec 2022 4:54 UTC
12 points
0 comments · 1 min read · LW link
(rootsofprogress.org)

Announcing The Filan Cabinet

DanielFilan · 30 Dec 2022 3:10 UTC
21 points
2 comments · 1 min read · LW link
(danielfilan.com)

[Question] Effective Evil Causes?

Ulisse Mini · 30 Dec 2022 2:56 UTC
−12 points
2 comments · 1 min read · LW link

But is it really in Rome? An investigation of the ROME model editing technique

jacquesthibs · 30 Dec 2022 2:40 UTC
102 points
1 comment · 18 min read · LW link

A Year of AI Increasing AI Progress

ThomasW · 30 Dec 2022 2:09 UTC
148 points
3 comments · 2 min read · LW link

Why not spend more time looking at human alignment?

ajc586 · 30 Dec 2022 0:22 UTC
11 points
3 comments · 1 min read · LW link

Why and how to write things on the Internet

benkuhn · 29 Dec 2022 22:40 UTC
20 points
2 comments · 15 min read · LW link
(www.benkuhn.net)

Friendly and Unfriendly AGI are Indistinguishable

ErgoEcho · 29 Dec 2022 22:13 UTC
−4 points
4 comments · 4 min read · LW link
(neologos.co)

200 COP in MI: Looking for Circuits in the Wild

Neel Nanda · 29 Dec 2022 20:59 UTC
16 points
5 comments · 13 min read · LW link

Thoughts on the implications of GPT-3, two years ago and NOW [here be dragons, we’re swimming, flying and talking with them]

Bill Benzon · 29 Dec 2022 20:05 UTC
0 points
0 comments · 5 min read · LW link

Covid 12/29/22: Next Up is XBB.1.5

Zvi · 29 Dec 2022 18:20 UTC
33 points
4 comments · 10 min read · LW link
(thezvi.wordpress.com)

Entrepreneurship ETG Might Be Better Than 80k Thought

Xodarap · 29 Dec 2022 17:51 UTC
33 points
0 comments · 1 min read · LW link

Internal Interfaces Are a High-Priority Interpretability Target

Thane Ruthenis · 29 Dec 2022 17:49 UTC
26 points
6 comments · 7 min read · LW link

CFP for Rebellion and Disobedience in AI workshop

Ram Rachum · 29 Dec 2022 16:08 UTC
15 points
0 comments · 1 min read · LW link

My scorched-earth policy on New Year’s resolutions

PatrickDFarley · 29 Dec 2022 14:45 UTC
29 points
2 comments · 4 min read · LW link

Don’t feed the void. She is fat enough!

Johannes C. Mayer · 29 Dec 2022 14:18 UTC
11 points
0 comments · 1 min read · LW link

[Question] Is there any unified resource on Eliezer’s fatigue?

Johannes C. Mayer · 29 Dec 2022 14:04 UTC
8 points
2 comments · 1 min read · LW link

Logical Probability of Goldbach’s Conjecture: Provable Rule or Coincidence?

avturchin · 29 Dec 2022 13:37 UTC
5 points
15 comments · 8 min read · LW link

Where do you get your capabilities from?

tailcalled · 29 Dec 2022 11:39 UTC
24 points
27 comments · 6 min read · LW link

The commercial incentive to intentionally train AI to deceive us

Derek M. Jones · 29 Dec 2022 11:30 UTC
5 points
1 comment · 4 min read · LW link
(shape-of-code.com)

Infinite necklace: the line as a circle

Alok Singh · 29 Dec 2022 10:41 UTC
5 points
2 comments · 1 min read · LW link

Privacy Tradeoffs

jefftk · 29 Dec 2022 3:40 UTC
13 points
1 comment · 2 min read · LW link
(www.jefftk.com)

Against John Searle, Gary Marcus, the Chinese Room thought experiment and its world

philosophybear · 29 Dec 2022 3:26 UTC
21 points
43 comments · 8 min read · LW link

Large Language Models Suggest a Path to Ems

anithite · 29 Dec 2022 2:20 UTC
17 points
2 comments · 5 min read · LW link

[Question] Book recommendations for the history of ML?

Eleni Angelou · 28 Dec 2022 23:50 UTC
2 points
2 comments · 1 min read · LW link

Rock-Paper-Scissors Can Be Weird

winwonce · 28 Dec 2022 23:12 UTC
14 points
3 comments · 1 min read · LW link

200 COP in MI: The Case for Analysing Toy Language Models

Neel Nanda · 28 Dec 2022 21:07 UTC
39 points
3 comments · 7 min read · LW link

200 Concrete Open Problems in Mechanistic Interpretability: Introduction

Neel Nanda · 28 Dec 2022 21:06 UTC
103 points
0 comments · 10 min read · LW link

Effective ways to find love?

anonymoususer · 28 Dec 2022 20:46 UTC
8 points
6 comments · 1 min read · LW link

Classical logic based on propositions-as-subsingleton-types

Thomas Kehrenberg · 28 Dec 2022 20:16 UTC
3 points
0 comments · 16 min read · LW link

In Defense of Wrapper-Minds

Thane Ruthenis · 28 Dec 2022 18:28 UTC
23 points
38 comments · 3 min read · LW link

[Question] What is the best way to approach Expected Value calculations when payoffs are highly skewed?

jmh · 28 Dec 2022 14:42 UTC
8 points
16 comments · 1 min read · LW link