Not Getting Hacked

jefftk, 21 Dec 2022 21:40 UTC
40 points
14 comments, 7 min read, LW link
(www.jefftk.com)

Metaphor.systems

the gears to ascension, 21 Dec 2022 21:31 UTC
25 points
9 comments, 1 min read, LW link
(metaphor.systems)

[Question] How much is DQC (Dynamic Quantum Clustering) currently looked into in AI Capabilities Research?

macmillan, 21 Dec 2022 20:46 UTC
1 point
0 comments, 1 min read, LW link

Think wider about the root causes of progress

jasoncrawford, 21 Dec 2022 20:05 UTC
49 points
11 comments, 4 min read, LW link
(rootsofprogress.org)

[Question] What readings did you consider best for the happy parts of the secular solstice?

ChristianKl, 21 Dec 2022 15:45 UTC
17 points
0 comments, 1 min read, LW link

Recreating logic in type theory

Thomas Kehrenberg, 21 Dec 2022 15:19 UTC
12 points
0 comments, 13 min read, LW link

You become the UI you use

Viliam, 21 Dec 2022 15:04 UTC
21 points
7 comments, 2 min read, LW link

Price’s equation for neural networks

tailcalled, 21 Dec 2022 13:09 UTC
29 points
4 comments, 2 min read, LW link

Decisions: Ontologically Shifting to Determinism

Chris_Leong, 21 Dec 2022 12:41 UTC
8 points
11 comments, 6 min read, LW link

A Comprehensive Mechanistic Interpretability Explainer & Glossary

Neel Nanda, 21 Dec 2022 12:35 UTC
82 points
6 comments, 2 min read, LW link
(neelnanda.io)

Google Search loses to ChatGPT fair and square

shminux, 21 Dec 2022 8:11 UTC
14 points
17 comments, 1 min read, LW link
(www.surgehq.ai)

Sazen

[DEACTIVATED] Duncan Sabien, 21 Dec 2022 7:54 UTC
275 points
83 comments, 12 min read, LW link, 2 reviews

Podcast: What’s Wrong With LessWrong

Alfred, 21 Dec 2022 7:06 UTC
−32 points
11 comments, 1 min read, LW link
(youtu.be)

New AI risk intro from Vox [link post]

JakubK, 21 Dec 2022 6:00 UTC
5 points
1 comment, 2 min read, LW link
(www.vox.com)

Local Memes Against Geometric Rationality

Scott Garrabrant, 21 Dec 2022 3:53 UTC
85 points
3 comments, 6 min read, LW link

Logging Shell History in Zsh

jefftk, 21 Dec 2022 3:30 UTC
19 points
2 comments, 1 min read, LW link
(www.jefftk.com)

CIRL Corrigibility is Fragile

21 Dec 2022 1:40 UTC
58 points
9 comments, 12 min read, LW link

[Question] [DISC] Are Values Robust?

DragonGod, 21 Dec 2022 1:00 UTC
12 points
9 comments, 2 min read, LW link

Performing an SVD on a time-series matrix of gradient updates on an MNIST network produces 92.5 singular values

Garrett Baker, 21 Dec 2022 0:44 UTC
9 points
10 comments, 5 min read, LW link

Progress links and tweets, 2022-12-20

jasoncrawford, 21 Dec 2022 0:35 UTC
12 points
0 comments, 2 min read, LW link
(rootsofprogress.org)

K-complexity is silly; use cross-entropy instead

So8res, 20 Dec 2022 23:06 UTC
137 points
53 comments, 4 min read, LW link, 2 reviews

Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic

Akash, 20 Dec 2022 21:39 UTC
18 points
2 comments, 11 min read, LW link

Discovering Language Model Behaviors with Model-Written Evaluations

20 Dec 2022 20:08 UTC
100 points
34 comments, 1 min read, LW link
(www.anthropic.com)

Reflections: Bureaucratic Hell

Haris Rashid, 20 Dec 2022 19:22 UTC
−5 points
1 comment, 1 min read, LW link
(www.harisrab.com)

Proliferating Education

Haris Rashid, 20 Dec 2022 19:22 UTC
−1 points
2 comments, 5 min read, LW link
(www.harisrab.com)

AGI is here, but nobody wants it. Why should we even care?

MGow, 20 Dec 2022 19:14 UTC
−22 points
0 comments, 17 min read, LW link

Properties of current AIs and some predictions of the evolution of AI from the perspective of scale-free theories of agency and regulative development

Roman Leventov, 20 Dec 2022 17:13 UTC
33 points
3 comments, 36 min read, LW link

I believe some AI doomers are overconfident

FTPickle, 20 Dec 2022 17:09 UTC
8 points
15 comments, 2 min read, LW link

Note on algorithms with multiple trained components

Steven Byrnes, 20 Dec 2022 17:08 UTC
23 points
4 comments, 2 min read, LW link

Marvel Snap: Phase 2

Zvi, 20 Dec 2022 14:50 UTC
11 points
1 comment, 13 min read, LW link
(thezvi.wordpress.com)

(Extremely) Naive Gradient Hacking Doesn’t Work

ojorgensen, 20 Dec 2022 14:35 UTC
14 points
0 comments, 6 min read, LW link

An Open Agency Architecture for Safe Transformative AI

davidad, 20 Dec 2022 13:04 UTC
79 points
22 comments, 4 min read, LW link

Under-Appreciated Ways to Use Flashcards—Part I

Florence Hinder, 20 Dec 2022 12:43 UTC
22 points
5 comments, 5 min read, LW link
(thoughtsaver.ghost.io)

EA & LW Forums Weekly Summary (12th Dec − 18th Dec 22′)

Zoe Williams, 20 Dec 2022 9:49 UTC
10 points
0 comments, 1 min read, LW link

[link, 2019] AI paradigm: interactive learning from unlabeled instructions

the gears to ascension, 20 Dec 2022 6:45 UTC
2 points
0 comments, 2 min read, LW link
(jgrizou.github.io)

[Fiction] Unspoken Stone

Gordon Seidoh Worley, 20 Dec 2022 5:11 UTC
19 points
0 comments, 5 min read, LW link

Notice when you stop reading right before you understand

just_browsing, 20 Dec 2022 5:09 UTC
59 points
6 comments, 1 min read, LW link

Take 12: RLHF’s use is evidence that orgs will jam RL at real-world problems.

Charlie Steiner, 20 Dec 2022 5:01 UTC
25 points
1 comment, 3 min read, LW link

More notes from raising a late-talking kid

Steven Byrnes, 20 Dec 2022 2:13 UTC
40 points
2 comments, 6 min read, LW link

The “Minimal Latents” Approach to Natural Abstractions

johnswentworth, 20 Dec 2022 1:22 UTC
53 points
24 comments, 12 min read, LW link

our deepest wishes

Tamsin Leake, 20 Dec 2022 0:23 UTC
29 points
0 comments, 1 min read, LW link
(carado.moe)

Shard Theory in Nine Theses: a Distillation and Critical Appraisal

LawrenceC, 19 Dec 2022 22:52 UTC
138 points
30 comments, 18 min read, LW link

[Question] Will research in AI risk jinx it? Consequences of training AI on AI risk arguments

Yann Dubois, 19 Dec 2022 22:42 UTC
5 points
6 comments, 1 min read, LW link

AGI Timelines in Governance: Different Strategies for Different Timeframes

19 Dec 2022 21:31 UTC
63 points
28 comments, 10 min read, LW link

Towards Hodge-podge Alignment

Cleo Nardo, 19 Dec 2022 20:12 UTC
91 points
30 comments, 9 min read, LW link

Computational signatures of psychopathy

Cameron Berg, 19 Dec 2022 17:01 UTC
28 points
3 comments, 20 min read, LW link

Results from a survey on tool use and workflows in alignment research

19 Dec 2022 15:19 UTC
79 points
2 comments, 19 min read, LW link

Does ChatGPT’s performance warrant working on a tutor for children? [It’s time to take it to the lab.]

Bill Benzon, 19 Dec 2022 15:12 UTC
13 points
5 comments, 4 min read, LW link
(new-savanna.blogspot.com)

Conditions for Superrationality-motivated Cooperation in a one-shot Prisoner’s Dilemma

Jim Buhler, 19 Dec 2022 15:00 UTC
24 points
4 comments, 5 min read, LW link

Next Level Seinfeld

Zvi, 19 Dec 2022 13:30 UTC
50 points
8 comments, 1 min read, LW link
(thezvi.wordpress.com)