Reinforcement Learning Goal Misgeneralization: Can we guess what kind of goals are selected by default?

25 Oct 2022 20:48 UTC
14 points
2 comments · 4 min read · LW link

A Walkthrough of A Mathematical Framework for Transformer Circuits

Neel Nanda · 25 Oct 2022 20:24 UTC
51 points
7 comments · 1 min read · LW link
(www.youtube.com)

Nothing.

rogersbacon · 25 Oct 2022 16:33 UTC
−10 points
4 comments · 6 min read · LW link
(www.secretorum.life)

Maps and Blueprint; the Two Sides of the Alignment Equation

Nora_Ammann · 25 Oct 2022 16:29 UTC
21 points
1 comment · 5 min read · LW link

Consider Applying to the Future Fellowship at MIT

jefftk · 25 Oct 2022 15:40 UTC
29 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Beyond Kolmogorov and Shannon

25 Oct 2022 15:13 UTC
62 points
17 comments · 5 min read · LW link

What does it take to defend the world against out-of-control AGIs?

Steven Byrnes · 25 Oct 2022 14:47 UTC
194 points
47 comments · 30 min read · LW link · 1 review

Refine: what helped me write more?

Alexander Gietelink Oldenziel · 25 Oct 2022 14:44 UTC
12 points
0 comments · 2 min read · LW link

Logical Decision Theories: Our final failsafe?

Noosphere89 · 25 Oct 2022 12:51 UTC
−7 points
8 comments · 1 min read · LW link
(www.lesswrong.com)

What will the scaled up GATO look like? (Updated with questions)

Amal · 25 Oct 2022 12:44 UTC
34 points
22 comments · 1 min read · LW link

Mechanism Design for AI Safety—Reading Group Curriculum

Rubi J. Hudson · 25 Oct 2022 3:54 UTC
15 points
3 comments · 1 min read · LW link

Furry Rationalists & Effective Anthropomorphism both exist

agentydragon · 25 Oct 2022 3:37 UTC
42 points
3 comments · 1 min read · LW link

EA & LW Fo­rums Weekly Sum­mary (17 − 23 Oct 22′)

Zoe Williams25 Oct 2022 2:57 UTC
10 points
0 comments1 min readLW link

Dance Weekends: Tests not Masks

jefftk · 25 Oct 2022 2:10 UTC
12 points
0 comments · 2 min read · LW link
(www.jefftk.com)

[Question] What is good Cyber Security Advice?

Gunnar_Zarncke · 24 Oct 2022 23:27 UTC
30 points
12 comments · 2 min read · LW link

Connections between Mind-Body Problem & Civilizations

oblivion · 24 Oct 2022 21:55 UTC
−3 points
1 comment · 1 min read · LW link

[Question] Rationalism and money

David K · 24 Oct 2022 21:22 UTC
−5 points
2 comments · 1 min read · LW link

[Question] Game semantics

David K · 24 Oct 2022 21:22 UTC
2 points
2 comments · 1 min read · LW link

A Good Future (rough draft)

Michael Soareverix · 24 Oct 2022 20:45 UTC
10 points
5 comments · 3 min read · LW link

A Barebones Guide to Mechanistic Interpretability Prerequisites

Neel Nanda · 24 Oct 2022 20:45 UTC
63 points
12 comments · 3 min read · LW link
(neelnanda.io)

POWERplay: An open-source toolchain to study AI power-seeking

Edouard Harris · 24 Oct 2022 20:03 UTC
27 points
0 comments · 1 min read · LW link
(github.com)

Consider trying Vivek Hebbar’s alignment exercises

Akash · 24 Oct 2022 19:46 UTC
38 points
1 comment · 4 min read · LW link

[Question] Education not meant for mass-consumption

Tolo · 24 Oct 2022 19:45 UTC
7 points
5 comments · 2 min read · LW link

Realizations in Regards to Masculinity

nmc · 24 Oct 2022 19:42 UTC
−2 points
2 comments · 2 min read · LW link

The Futility of Religion

nmc · 24 Oct 2022 19:42 UTC
−1 points
5 comments · 3 min read · LW link

The optimal timing of spending on AGI safety work; why we should probably be spending more now

Tristan Cook · 24 Oct 2022 17:42 UTC
62 points
0 comments · 1 min read · LW link

QACI: question-answer counterfactual intervals

Tamsin Leake · 24 Oct 2022 13:08 UTC
22 points
0 comments · 4 min read · LW link
(carado.moe)

AGI in our lifetimes is wishful thinking

niknoble · 24 Oct 2022 11:53 UTC
0 points
25 comments · 8 min read · LW link

DeepMind on Stratego, an imperfect information game

sanxiyn · 24 Oct 2022 5:57 UTC
15 points
9 comments · 1 min read · LW link
(arxiv.org)

[Question] TOMT: Post from 1-2 years ago talking about a paper on social networks

Simon Berens · 24 Oct 2022 1:29 UTC
5 points
1 comment · 1 min read · LW link

AI researchers announce NeuroAI agenda

Cameron Berg · 24 Oct 2022 0:14 UTC
37 points
12 comments · 6 min read · LW link
(arxiv.org)

Empowerment is (almost) All We Need

jacob_cannell · 23 Oct 2022 21:48 UTC
64 points
44 comments · 17 min read · LW link

“Originality is nothing but judicious imitation”—Voltaire

Vestozia · 23 Oct 2022 19:00 UTC
0 points
0 comments · 13 min read · LW link

Mid-Peninsula ACX/LW Meetup [CANCELLED]

moshezadka · 23 Oct 2022 17:37 UTC
1 point
0 comments · 1 min read · LW link

I am a Memoryless System

NicholasKross · 23 Oct 2022 17:34 UTC
25 points
2 comments · 9 min read · LW link
(www.thinkingmuchbetter.com)

Accountability Buddies: Why you might want one.

Samuel Nellessen · 23 Oct 2022 16:25 UTC
10 points
3 comments · 1 min read · LW link

How to get past Haidt’s elephant and listen

Astynax · 23 Oct 2022 16:06 UTC
13 points
4 comments · 2 min read · LW link

Writing Russian and Ukrainian words in Latin script

Viliam · 23 Oct 2022 15:25 UTC
19 points
22 comments · 6 min read · LW link

[Question] Have you noticed any ways that rationalists differ? [Brainstorming session]

tailcalled · 23 Oct 2022 11:32 UTC
23 points
22 comments · 1 min read · LW link

Mnestics

Jarred Filmer · 23 Oct 2022 0:30 UTC
117 points
5 comments · 4 min read · LW link

Telic intuitions across the sciences

mrcbarbier · 22 Oct 2022 21:31 UTC
4 points
0 comments · 17 min read · LW link

A basic lexicon of telic concepts

mrcbarbier · 22 Oct 2022 21:28 UTC
2 points
0 comments · 3 min read · LW link

Do we have the right kind of math for roles, goals and meaning?

mrcbarbier · 22 Oct 2022 21:28 UTC
13 points
5 comments · 7 min read · LW link

[Question] The Last Year - is there an existing novel about the last year before AI doom?

Luca Petrolati · 22 Oct 2022 20:44 UTC
4 points
4 comments · 1 min read · LW link

The highest-probability outcome can be out of distribution

tailcalled · 22 Oct 2022 20:00 UTC
13 points
5 comments · 1 min read · LW link

Newsletter for Alignment Research: The ML Safety Updates

Esben Kran · 22 Oct 2022 16:17 UTC
25 points
0 comments · 1 min read · LW link

Crypto loves impact markets: Notes from Schelling Point Bogotá

Rachel Shu · 22 Oct 2022 15:58 UTC
17 points
2 comments · 1 min read · LW link

[Question] When trying to define general intelligence is ability to achieve goals the best metric?

jmh · 22 Oct 2022 3:09 UTC
5 points
0 comments · 1 min read · LW link

[Question] Simple question about corrigibility and values in AI.

jmh · 22 Oct 2022 2:59 UTC
6 points
1 comment · 1 min read · LW link

Moorean Statements

David Udell · 22 Oct 2022 0:50 UTC
11 points
11 comments · 1 min read · LW link