Three reasons to cooperate

paulfchristiano · Dec 24, 2022, 5:40 PM
86 points
14 comments · 10 min read · LW link
(sideways-view.com)

A hundredth of a bit of extra entropy

Adam Scherlis · Dec 24, 2022, 9:12 PM
83 points
4 comments · 3 min read · LW link

Reflections on my 5-month alignment upskilling grant

Jay Bailey · Dec 27, 2022, 10:51 AM
82 points
4 comments · 8 min read · LW link

An Open Agency Architecture for Safe Transformative AI

davidad · Dec 20, 2022, 1:04 PM
80 points
22 comments · 4 min read · LW link

Proper scoring rules don’t guarantee predicting fixed points

Dec 16, 2022, 6:22 PM
79 points
8 comments · 21 min read · LW link

Results from a survey on tool use and workflows in alignment research

Dec 19, 2022, 3:19 PM
79 points
2 comments · 19 min read · LW link

Probably good projects for the AI safety ecosystem

Ryan Kidd · Dec 5, 2022, 2:26 AM
78 points
40 comments · 2 min read · LW link

On sincerity

Joe Carlsmith · Dec 23, 2022, 5:13 PM
76 points
6 comments · 42 min read · LW link

MrBeast’s Squid Game Tricked Me

lsusr · Dec 3, 2022, 5:50 AM
75 points
1 comment · 2 min read · LW link

10 Years of LessWrong

SebastianG · Dec 30, 2022, 5:15 PM
73 points
2 comments · 4 min read · LW link

Verification Is Not Easier Than Generation In General

johnswentworth · Dec 6, 2022, 5:20 AM
73 points
27 comments · 1 min read · LW link

«Boundaries», Part 3b: Alignment problems in terms of boundaries

Andrew_Critch · Dec 14, 2022, 10:34 PM
72 points
7 comments · 13 min read · LW link

[Question] Who are some prominent reasonable people who are confident that AI won’t kill everyone?

Optimization Process · Dec 5, 2022, 9:12 AM
72 points
54 comments · 1 min read · LW link

AI Safety Seems Hard to Measure

HoldenKarnofsky · Dec 8, 2022, 7:50 PM
71 points
6 comments · 14 min read · LW link
(www.cold-takes.com)

It’s time to worry about online privacy again

Malmesbury · Dec 25, 2022, 9:05 PM
69 points
23 comments · 6 min read · LW link

The True Spirit of Solstice?

Raemon · Dec 19, 2022, 8:00 AM
69 points
31 comments · 9 min read · LW link

Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic)

LawrenceC · Dec 16, 2022, 10:12 PM
68 points
11 comments · 1 min read · LW link
(www.anthropic.com)

AI Neorealism: a threat model & success criterion for existential safety

davidad · Dec 15, 2022, 1:42 PM
67 points
1 comment · 3 min read · LW link

AGI Timelines in Governance: Different Strategies for Different Timeframes

Dec 19, 2022, 9:31 PM
65 points
28 comments · 10 min read · LW link

Can we efficiently explain model behaviors?

paulfchristiano · Dec 16, 2022, 7:40 PM
64 points
3 comments · 9 min read · LW link
(ai-alignment.com)

Systems of Survival

Vaniver · Dec 9, 2022, 5:13 AM
63 points
5 comments · 5 min read · LW link

Key Mostly Outward-Facing Facts From the Story of VaccinateCA

Zvi · Dec 14, 2022, 1:30 PM
61 points
2 comments · 23 min read · LW link
(thezvi.wordpress.com)

Notice when you stop reading right before you understand

just_browsing · Dec 20, 2022, 5:09 AM
61 points
6 comments · 1 min read · LW link

Summary of a new study on out-group hate (and how to fix it)

DirectedEvolution · Dec 4, 2022, 1:53 AM
60 points
30 comments · 3 min read · LW link
(www.pnas.org)

Predicting GPU performance

Dec 14, 2022, 4:27 PM
60 points
26 comments · 1 min read · LW link
(epochai.org)

Update on Harvard AI Safety Team and MIT AI Alignment

Dec 2, 2022, 12:56 AM
60 points
4 comments · 8 min read · LW link

The Meditation on Winter

Raemon · Dec 25, 2022, 4:12 PM
59 points
3 comments · 3 min read · LW link

MIRI’s “Death with Dignity” in 60 seconds.

Cleo Nardo · Dec 6, 2022, 5:18 PM
59 points
4 comments · 1 min read · LW link

CIRL Corrigibility is Fragile

Dec 21, 2022, 1:40 AM
58 points
8 comments · 12 min read · LW link

High-level hopes for AI alignment

HoldenKarnofsky · Dec 15, 2022, 6:00 PM
58 points
3 comments · 19 min read · LW link
(www.cold-takes.com)

Concrete Steps to Get Started in Transformer Mechanistic Interpretability

Neel Nanda · Dec 25, 2022, 10:21 PM
57 points
7 comments · 12 min read · LW link
(www.neelnanda.io)

YCombinator fraud rates

Xodarap · Dec 25, 2022, 7:21 PM
56 points
3 comments · LW link

In defense of probably wrong mechanistic models

evhub · Dec 6, 2022, 11:24 PM
55 points
10 comments · 2 min read · LW link

My thoughts on OpenAI’s alignment plan

Orpheus16 · Dec 30, 2022, 7:33 PM
55 points
3 comments · 20 min read · LW link

Formalization as suspension of intuition

adamShimi · Dec 11, 2022, 3:16 PM
54 points
18 comments · 1 min read · LW link
(epistemologicalvigilance.substack.com)

Take 13: RLHF bad, conditioning good.

Charlie Steiner · Dec 22, 2022, 10:44 AM
54 points
4 comments · 2 min read · LW link

Nook Nature

Duncan Sabien (Inactive) · Dec 5, 2022, 4:10 AM
54 points
18 comments · 10 min read · LW link

Reframing inner alignment

davidad · Dec 11, 2022, 1:53 PM
53 points
13 comments · 4 min read · LW link

The “Minimal Latents” Approach to Natural Abstractions

johnswentworth · Dec 20, 2022, 1:22 AM
53 points
24 comments · 12 min read · LW link

Announcing: The Independent AI Safety Registry

Shoshannah Tekofsky · Dec 26, 2022, 9:22 PM
53 points
9 comments · 1 min read · LW link

Air-gapping evaluation and support

Ryan Kidd · Dec 26, 2022, 10:52 PM
53 points
1 comment · 2 min read · LW link

Positive values seem more robust and lasting than prohibitions

TurnTrout · Dec 17, 2022, 9:43 PM
52 points
13 comments · 2 min read · LW link

My AGI safety research—2022 review, ’23 plans

Steven Byrnes · Dec 14, 2022, 3:15 PM
51 points
10 comments · 7 min read · LW link

Looking Back on Posts From 2022

Zvi · Dec 26, 2022, 1:20 PM
50 points
8 comments · 17 min read · LW link
(thezvi.wordpress.com)

China Covid #4

Zvi · Dec 22, 2022, 4:30 PM
50 points
2 comments · 11 min read · LW link
(thezvi.wordpress.com)

Take 7: You should talk about “the human’s utility function” less.

Charlie Steiner · Dec 8, 2022, 8:14 AM
50 points
22 comments · 2 min read · LW link

Next Level Seinfeld

Zvi · Dec 19, 2022, 1:30 PM
50 points
8 comments · 1 min read · LW link
(thezvi.wordpress.com)

My Reservations about Discovering Latent Knowledge (Burns, Ye, et al)

Robert_AIZI · Dec 27, 2022, 5:27 PM
50 points
0 comments · 4 min read · LW link
(aizi.substack.com)

Applications open for AGI Safety Fundamentals: Alignment Course

Richard_Ngo · Dec 13, 2022, 6:31 PM
49 points
0 comments · 2 min read · LW link

Basic building blocks of dependent type theory

Thomas Kehrenberg · Dec 15, 2022, 2:54 PM
49 points
9 comments · 13 min read · LW link