What sorts of systems can be deceptive?

Andrei Alexandru · 31 Oct 2022 22:00 UTC
16 points
0 comments · 7 min read · LW link

“Cars and Elephants”: a handwavy argument/analogy against mechanistic interpretability

David Scott Krueger (formerly: capybaralet) · 31 Oct 2022 21:26 UTC
48 points
25 comments · 2 min read · LW link

Superintelligent AI is necessary for an amazing future, but far from sufficient

So8res · 31 Oct 2022 21:16 UTC
132 points
48 comments · 34 min read · LW link

Sanity-checking in an age of hyperbole

Ciprian Elliu Ivanof · 31 Oct 2022 20:04 UTC
2 points
4 comments · 2 min read · LW link

Why Aren’t There More Schelling Holidays?

johnswentworth · 31 Oct 2022 19:31 UTC
63 points
21 comments · 1 min read · LW link

publishing alignment research and exfohazards

Tamsin Leake · 31 Oct 2022 18:02 UTC
80 points
12 comments · 1 min read · LW link · 1 review
(carado.moe)

The circular problem of epistemic irresponsibility

Roman Leventov · 31 Oct 2022 17:23 UTC
5 points
2 comments · 8 min read · LW link

AI as a Civilizational Risk Part 3/6: Anti-economy and Signal Pollution

PashaKamyshev · 31 Oct 2022 17:03 UTC
7 points
4 comments · 14 min read · LW link

Average utilitarianism is non-local

Yair Halberstadt · 31 Oct 2022 16:36 UTC
29 points
13 comments · 1 min read · LW link

Marvel Snap: Phase 1

Zvi · 31 Oct 2022 15:20 UTC
23 points
1 comment · 14 min read · LW link
(thezvi.wordpress.com)

Boundaries vs Frames

Scott Garrabrant · 31 Oct 2022 15:14 UTC
58 points
10 comments · 7 min read · LW link

Embedding safety in ML development

zeshen · 31 Oct 2022 12:27 UTC
24 points
1 comment · 18 min read · LW link

[Book] Interpretable Machine Learning: A Guide for Making Black Box Models Explainable

Esben Kran · 31 Oct 2022 11:38 UTC
20 points
1 comment · 1 min read · LW link
(christophm.github.io)

My (naive) take on Risks from Learned Optimization

Artyom Karpov · 31 Oct 2022 10:59 UTC
7 points
0 comments · 5 min read · LW link

Tactical Nuclear Weapons Aren’t Cost-Effective Compared to Precision Artillery

Lao Mein · 31 Oct 2022 4:33 UTC
28 points
7 comments · 3 min read · LW link

Gandalf or Saruman? A Soldier in Scout’s Clothing

DirectedEvolution · 31 Oct 2022 2:40 UTC
41 points
1 comment · 4 min read · LW link

love, not competition

Tamsin Leake · 30 Oct 2022 19:44 UTC
30 points
20 comments · 1 min read · LW link
(carado.moe)

Me (Steve Byrnes) on the “Brain Inspired” podcast

Steven Byrnes · 30 Oct 2022 19:15 UTC
26 points
1 comment · 1 min read · LW link
(braininspired.co)

“Normal” is the equilibrium state of past optimization processes

Alex_Altair · 30 Oct 2022 19:03 UTC
81 points
5 comments · 5 min read · LW link

AI as a Civilizational Risk Part 2/6: Behavioral Modification

PashaKamyshev · 30 Oct 2022 16:57 UTC
9 points
0 comments · 10 min read · LW link

Instrumental ignoring AI, Dumb but not useless.

Donald Hobson · 30 Oct 2022 16:55 UTC
7 points
6 comments · 2 min read · LW link

Weekly Roundup #3

Zvi · 30 Oct 2022 12:20 UTC
23 points
5 comments · 15 min read · LW link
(thezvi.wordpress.com)

Quickly refactoring the U.S. Constitution

lc · 30 Oct 2022 7:17 UTC
7 points
25 comments · 4 min read · LW link

«Boundaries», Part 3a: Defining boundaries as directed Markov blankets

Andrew_Critch · 30 Oct 2022 6:31 UTC
86 points
20 comments · 15 min read · LW link

Am I secretly excited for AI getting weird?

porby · 29 Oct 2022 22:16 UTC
115 points
4 comments · 4 min read · LW link

AI as a Civilizational Risk Part 1/6: Historical Priors

PashaKamyshev · 29 Oct 2022 21:59 UTC
2 points
2 comments · 7 min read · LW link

Don’t expect your life partner to be better than your exes in more than one way: a mathematical model

mdd · 29 Oct 2022 18:47 UTC
7 points
1 comment · 9 min read · LW link

The Social Recession: By the Numbers

antonomon · 29 Oct 2022 18:45 UTC
165 points
29 comments · 8 min read · LW link
(novum.substack.com)

Electric Kettle vs Stove

jefftk · 29 Oct 2022 12:50 UTC
18 points
7 comments · 1 min read · LW link
(www.jefftk.com)

Quantum Immortality, foiled

Ben · 29 Oct 2022 11:00 UTC
27 points
4 comments · 2 min read · LW link

Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small

28 Oct 2022 23:55 UTC
99 points
9 comments · 9 min read · LW link · 2 reviews
(arxiv.org)

Resources that (I think) new alignment researchers should know about

Akash · 28 Oct 2022 22:13 UTC
77 points
9 comments · 4 min read · LW link

How often does One Person succeed?

Mayank Modi · 28 Oct 2022 19:32 UTC
1 point
3 comments · 1 min read · LW link

aisafety.community—A living document of AI safety communities

28 Oct 2022 17:50 UTC
57 points
23 comments · 1 min read · LW link

Rapid Test Throat Swabbing?

jefftk · 28 Oct 2022 16:30 UTC
18 points
2 comments · 1 min read · LW link
(www.jefftk.com)

Join the interpretability research hackathon

Esben Kran · 28 Oct 2022 16:26 UTC
15 points
0 comments · 1 min read · LW link

Syncretism

Annapurna · 28 Oct 2022 16:08 UTC
16 points
4 comments · 1 min read · LW link
(jorgevelez.substack.com)

Pondering computation in the real world

Adam Shai · 28 Oct 2022 15:57 UTC
24 points
13 comments · 5 min read · LW link

Ukraine and the Crimea Question

ChristianKl · 28 Oct 2022 12:26 UTC
−2 points
153 comments · 11 min read · LW link

New book on s-risks

Tobias_Baumann · 28 Oct 2022 9:36 UTC
68 points
1 comment · 1 min read · LW link

Cryptic symbols

Adam Scherlis · 28 Oct 2022 6:44 UTC
6 points
17 comments · 1 min read · LW link
(adam.scherlis.com)

All life’s helpers’ beliefs

Tehdastehdas · 28 Oct 2022 5:47 UTC
−12 points
1 comment · 5 min read · LW link

Prizes for ML Safety Benchmark Ideas

joshc · 28 Oct 2022 2:51 UTC
36 points
4 comments · 1 min read · LW link

Worldview iPeople—Future Fund’s AI Worldview Prize

Toni MUENDEL · 28 Oct 2022 1:53 UTC
−22 points
4 comments · 9 min read · LW link

Anatomy of change

Jose Miguel Cruz y Celis · 28 Oct 2022 1:21 UTC
1 point
0 comments · 1 min read · LW link

Nash equilibria of symmetric zero-sum games

Ege Erdil · 27 Oct 2022 23:50 UTC
14 points
0 comments · 14 min read · LW link

[Question] Good psychology books/books that contain good psychological models?

shuffled-cantaloupe · 27 Oct 2022 23:04 UTC
1 point
1 comment · 1 min read · LW link

Podcast: The Left and Effective Altruism with Habiba Islam

garrison · 27 Oct 2022 17:41 UTC
2 points
2 comments · 1 min read · LW link

Lessons from ‘Famine, Affluence, and Morality’ and its reflection on today.

Mayank Modi · 27 Oct 2022 17:20 UTC
4 points
0 comments · 1 min read · LW link

[Question] Is the Orthogonality Thesis true for humans?

Noosphere89 · 27 Oct 2022 14:41 UTC
12 points
20 comments · 1 min read · LW link