Safetywashing

Adam Scholl · Jul 1, 2022, 11:56 AM
261 points
20 comments · 1 min read · LW link · 2 reviews

[Question] AGI alignment with what?

AlignmentMirror · Jul 1, 2022, 10:22 AM
6 points
10 comments · 1 min read · LW link

Open & Welcome Thread—July 2022

Kaj_Sotala · Jul 1, 2022, 7:47 AM
20 points
61 comments · 1 min read · LW link

[Question] What is the contrast to counterfactual reasoning?

Dominic Roser · Jul 1, 2022, 7:39 AM
5 points
10 comments · 1 min read · LW link

Meiosis is all you need

Metacelsus · Jul 1, 2022, 7:39 AM
41 points
3 comments · 2 min read · LW link
(denovo.substack.com)

[Question] How to Navigate Evaluating Politicized Research?

Davis_Kingsley · Jul 1, 2022, 5:59 AM
11 points
1 comment · 1 min read · LW link

One is (almost) normal in base π

Adam Scherlis · Jul 1, 2022, 4:05 AM
14 points
0 comments · 1 min read · LW link
(adam.scherlis.com)

AI safety university groups: a promising opportunity to reduce existential risk

mic · Jul 1, 2022, 3:59 AM
14 points
0 comments · 11 min read · LW link

Looking back on my alignment PhD

TurnTrout · Jul 1, 2022, 3:19 AM
334 points
66 comments · 11 min read · LW link

Selection processes for subagents

Ryan Kidd · Jun 30, 2022, 11:57 PM
36 points
2 comments · 9 min read · LW link

[Question] Cryonics-adjacent question

Flaglandbase · Jun 30, 2022, 11:03 PM
12 points
3 comments · 1 min read · LW link

Forecasts are not enough

Ege Erdil · Jun 30, 2022, 10:00 PM
44 points
5 comments · 5 min read · LW link

Murphyjitsu: an Inner Simulator algorithm

CFAR!Duncan · Jun 30, 2022, 9:50 PM
67 points
24 comments · 11 min read · LW link · 2 reviews

GPT-3 Catching Fish in Morse Code

Megan Kinniment · Jun 30, 2022, 9:22 PM
117 points
27 comments · 8 min read · LW link

Metacognition in the Rat

Jacob Falkovich · Jun 30, 2022, 8:53 PM
19 points
0 comments · 6 min read · LW link

On viewquakes

Dalton Mabery · Jun 30, 2022, 8:08 PM
8 points
0 comments · 2 min read · LW link

The Track Record of Futurists Seems … Fine

HoldenKarnofsky · Jun 30, 2022, 7:40 PM
91 points
25 comments · 12 min read · LW link
(www.cold-takes.com)

Quick survey on AI alignment resources

frances_lorenz · Jun 30, 2022, 7:09 PM
14 points
0 comments · 1 min read · LW link

[Linkpost] Solving Quantitative Reasoning Problems with Language Models

Yitz · Jun 30, 2022, 6:58 PM
76 points
15 comments · 2 min read · LW link
(storage.googleapis.com)

Failing to fix a dangerous intersection

alyssavance · Jun 30, 2022, 6:09 PM
110 points
17 comments · 2 min read · LW link

Most Functions Have Undesirable Global Extrema

En Kepeig · Jun 30, 2022, 5:10 PM
8 points
5 comments · 3 min read · LW link

Hedonistic Isotopes:

Trozxzr · Jun 30, 2022, 4:49 PM
1 point
0 comments · 1 min read · LW link

Abadarian Trades

David Udell · Jun 30, 2022, 4:41 PM
17 points
22 comments · 2 min read · LW link

Covid 6/30/22: Vaccine Update Update

Zvi · Jun 30, 2022, 2:00 PM
32 points
6 comments · 12 min read · LW link
(thezvi.wordpress.com)

[Question] How should I talk about optimal but not subgame-optimal play?

JamesFaville · Jun 30, 2022, 1:58 PM
5 points
1 comment · 3 min read · LW link

Formal Philosophy and Alignment Possible Projects

Daniel Herrmann · Jun 30, 2022, 10:42 AM
34 points
5 comments · 8 min read · LW link

Bangalore LW/ACX Meetup in person

Aditya · Jun 30, 2022, 7:21 AM
5 points
2 comments · 1 min read · LW link

Cultivating And Destroying Agency

hath · Jun 30, 2022, 3:59 AM
105 points
11 comments · 9 min read · LW link

$500 bounty for alignment contest ideas

Orpheus16 · Jun 30, 2022, 1:56 AM
29 points
5 comments · 2 min read · LW link

any good rationalist guides to nutrition / healthy eating?

Ben A · Jun 30, 2022, 12:50 AM
7 points
15 comments · 1 min read · LW link

A summary of every Replacing Guilt post

Orpheus16 · Jun 30, 2022, 12:46 AM
35 points
3 comments · 10 min read · LW link
(forum.effectivealtruism.org)

Gradient hacking: definitions and examples

Richard_Ngo · Jun 29, 2022, 9:35 PM
38 points
2 comments · 5 min read · LW link

Progress links and tweets, 2022-06-29

jasoncrawford · Jun 29, 2022, 9:33 PM
9 points
0 comments · 1 min read · LW link
(rootsofprogress.org)

[Question] Correcting human error vs doing exactly what you’re told—is there literature on this in context of general system design?

Jan Czechowski · Jun 29, 2022, 9:30 PM
6 points
0 comments · 1 min read · LW link

Latent Adversarial Training

Adam Jermyn · Jun 29, 2022, 8:04 PM
52 points
13 comments · 5 min read · LW link

Game Review: This Merchant Life

Zvi · Jun 29, 2022, 6:30 PM
20 points
0 comments · 13 min read · LW link
(thezvi.wordpress.com)

Limits to Legibility

Jan_Kulveit · Jun 29, 2022, 5:42 PM
157 points
11 comments · 5 min read · LW link · 1 review

Will Capabilities Generalise More?

Ramana Kumar · 29 Jun 2022 17:12 UTC
133 points
39 comments · 4 min read · LW link

Kevin Kelly’s “103 Bits of Advice,” Expanded

Dalton Mabery · 29 Jun 2022 13:36 UTC
19 points
0 comments · 5 min read · LW link

The table of different sampling assumptions in anthropics

avturchin · 29 Jun 2022 10:41 UTC
39 points
5 comments · 12 min read · LW link

Can We Align AI by Having It Learn Human Preferences? I’m Scared (summary of last third of Human Compatible)

apollonianblues · 29 Jun 2022 4:09 UTC
19 points
3 comments · 6 min read · LW link

Kurzgesagt – The Last Human (Youtube)

habryka · 29 Jun 2022 3:28 UTC
54 points
7 comments · 1 min read · LW link
(www.youtube.com)

[Question] Literature on How to Maximize Preferences

josh · 28 Jun 2022 22:41 UTC
1 point
0 comments · 1 min read · LW link

Challenge: A Much More Alien Message

kman · 28 Jun 2022 21:50 UTC
24 points
7 comments · 1 min read · LW link

It’s Probably Not Lithium

Natália · 28 Jun 2022 21:24 UTC
444 points
187 comments · 28 min read · LW link · 1 review

Reflections on Living in “Guess Culture”

Dalton Mabery · 28 Jun 2022 21:00 UTC
13 points
1 comment · 3 min read · LW link

[Question] What is the LessWrong Logo(?) Supposed to Represent?

DragonGod · 28 Jun 2022 20:20 UTC
8 points
6 comments · 1 min read · LW link

What Are You Tracking In Your Head?

johnswentworth · 28 Jun 2022 19:30 UTC
289 points
83 comments · 4 min read · LW link · 1 review

Why is so much political commentary misleading?

contrarianbrit · 28 Jun 2022 17:10 UTC
−2 points
5 comments · 6 min read · LW link
(thomasprosser.substack.com)

CFAR Handbook: Introduction

CFAR!Duncan · 28 Jun 2022 16:53 UTC
119 points
12 comments · 1 min read · LW link