
Human Mimicry Mainly Works When We’re Already Close

johnswentworth · 17 Aug 2022 18:41 UTC
41 points
5 comments · 5 min read · LW link

The longest training run

17 Aug 2022 17:18 UTC
41 points
5 comments · 9 min read · LW link
(epochai.org)

A Mechanistic Interpretability Analysis of Grokking

15 Aug 2022 2:41 UTC
239 points
15 comments · 41 min read · LW link
(colab.research.google.com)

Matt Yglesias on AI Policy

Grant Demaree · 17 Aug 2022 23:57 UTC
19 points
0 comments · 1 min read · LW link
(www.slowboring.com)

The Core of the Alignment Problem is...

17 Aug 2022 20:07 UTC
25 points
3 comments · 9 min read · LW link

Autonomy as taking responsibility for reference maintenance

Ramana Kumar · 17 Aug 2022 12:50 UTC
38 points
0 comments · 5 min read · LW link

Conditioning, Prompts, and Fine-Tuning

Adam Jermyn · 17 Aug 2022 20:52 UTC
22 points
1 comment · 4 min read · LW link

DeepMind alignment team opinions on AGI ruin arguments

Vika · 12 Aug 2022 21:06 UTC
284 points
27 comments · 14 min read · LW link

Interpretability Tools Are an Attack Channel

Thane Ruthenis · 17 Aug 2022 18:47 UTC
24 points
6 comments · 1 min read · LW link

The Parable of the Boy Who Cried 5% Chance of Wolf

KatWoods · 15 Aug 2022 14:33 UTC
125 points
19 comments · 2 min read · LW link

My thoughts on direct work (and joining LessWrong)

RobertM · 16 Aug 2022 18:53 UTC
51 points
4 comments · 6 min read · LW link

What’s General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems?

johnswentworth · 15 Aug 2022 22:48 UTC
80 points
8 comments · 10 min read · LW link

Concrete Advice for Forming Inside Views on AI Safety

Neel Nanda · 17 Aug 2022 22:02 UTC
12 points
0 comments · 10 min read · LW link

I’m mildly skeptical that blindness prevents schizophrenia

Steven Byrnes · 15 Aug 2022 23:36 UTC
67 points
7 comments · 4 min read · LW link

Thoughts on ‘List of Lethalities’

alexrjl · 17 Aug 2022 18:33 UTC
14 points
0 comments · 10 min read · LW link

Progress links and tweets, 2022-08-17

jasoncrawford · 17 Aug 2022 21:27 UTC
11 points
0 comments · 2 min read · LW link
(rootsofprogress.org)

Language models seem to be much better than humans at next-token prediction

11 Aug 2022 17:45 UTC
128 points
52 comments · 13 min read · LW link

Extreme Security

lc · 15 Aug 2022 12:11 UTC
42 points
2 comments · 5 min read · LW link

Against population ethics

jasoncrawford · 16 Aug 2022 5:19 UTC
29 points
30 comments · 3 min read · LW link

chinchilla’s wild implications

nostalgebraist · 31 Jul 2022 1:18 UTC
328 points
108 comments · 12 min read · LW link