The Filan Cabinet Podcast with Oliver Habryka—Transcript

Feb 14, 2023, 2:38 AM
101 points
9 comments · 72 min read · LW link

Latent variables for prediction markets: motivation, technical guide, and design considerations

tailcalled · Feb 12, 2023, 5:54 PM
100 points
25 comments · 23 min read · LW link · 2 reviews

Don't accelerate problems you're trying to solve

Feb 15, 2023, 6:11 PM
100 points
27 comments · 4 min read · LW link

Basic facts about language models during training

beren · Feb 21, 2023, 11:46 AM
98 points
15 comments · 18 min read · LW link

A circuit for Python docstrings in a 4-layer attention-only transformer

Feb 20, 2023, 7:35 PM
96 points
8 comments · 21 min read · LW link

Research agenda: Formalizing abstractions of computations

Erik Jenner · Feb 2, 2023, 4:29 AM
93 points
10 comments · 31 min read · LW link

Covid 2/23/23: Your Best Possible Situation

Zvi · Feb 23, 2023, 1:10 PM
92 points
9 comments · 5 min read · LW link
(thezvi.wordpress.com)

Exercise is Good, Actually

Gordon Seidoh Worley · Feb 2, 2023, 12:09 AM
91 points
27 comments · 3 min read · LW link

SolidGoldMagikarp III: Glitch token archaeology

Feb 14, 2023, 10:17 AM
91 points
35 comments · 16 min read · LW link

Retrospective on the 2022 Conjecture AI Discussions

Andrea_Miotti · Feb 24, 2023, 10:41 PM
90 points
5 comments · 2 min read · LW link

Deceptive Alignment is <1% Likely by Default

DavidW · Feb 21, 2023, 3:09 PM
89 points
31 comments · 14 min read · LW link · 1 review

Conditioning Predictive Models: Large language models as predictors

Feb 2, 2023, 8:28 PM
88 points
4 comments · 13 min read · LW link

Qualities that alignment mentors value in junior researchers

Orpheus16 · Feb 14, 2023, 11:27 PM
88 points
14 comments · 3 min read · LW link

Podcast with Oli Habryka on LessWrong / Lightcone Infrastructure

DanielFilan · Feb 5, 2023, 2:52 AM
88 points
20 comments · 1 min read · LW link
(thefilancabinet.com)

The Cave Allegory Revisited: Understanding GPT's Worldview

Jan_Kulveit · Feb 14, 2023, 4:00 PM
86 points
5 comments · 3 min read · LW link

Building and Entertaining Couples

Jacob Falkovich · Feb 22, 2023, 7:02 PM
86 points
11 comments · 4 min read · LW link

Decision Transformer Interpretability

Feb 6, 2023, 7:29 AM
85 points
13 comments · 24 min read · LW link

You are probably not a good alignment researcher, and other blatant lies

junk heap homotopy · Feb 2, 2023, 1:55 PM
83 points
16 comments · 2 min read · LW link

LLM Basics: Embedding Spaces—Transformer Token Vectors Are Not Points in Space

NickyP · Feb 13, 2023, 6:52 PM
83 points
11 comments · 15 min read · LW link

Bankless Podcast: 159 - We're All Gonna Die with Eliezer Yudkowsky

bayesed · Feb 20, 2023, 4:42 PM
83 points
54 comments · 1 min read · LW link
(www.youtube.com)

Teleosemantics!

abramdemski · Feb 23, 2023, 11:26 PM
82 points
27 comments · 6 min read · LW link · 1 review

Tools for finding information on the internet

RomanHauksson · Feb 9, 2023, 5:05 PM
79 points
11 comments · 2 min read · LW link
(roman.computer)

OpenAI/Microsoft announce "next generation language model" integrated into Bing/Edge

LawrenceC · Feb 7, 2023, 8:38 PM
79 points
4 comments · 1 min read · LW link
(blogs.microsoft.com)

Two problems with 'Simulators' as a frame

ryan_greenblatt · Feb 17, 2023, 11:34 PM
79 points
13 comments · 5 min read · LW link

[Linkpost] Google invested $300M in Anthropic in late 2022

Orpheus16 · Feb 3, 2023, 7:13 PM
73 points
14 comments · 1 min read · LW link
(www.ft.com)

Review of AI Alignment Progress

PeterMcCluskey · Feb 7, 2023, 6:57 PM
72 points
32 comments · 7 min read · LW link
(bayesianinvestor.com)

Conditioning Predictive Models: Outer alignment via careful conditioning

Feb 2, 2023, 8:28 PM
72 points
15 comments · 57 min read · LW link

Why I'm not working on {debate, RRM, ELK, natural abstractions}

Steven Byrnes · Feb 10, 2023, 7:22 PM
71 points
19 comments · 10 min read · LW link

Prizes for the 2021 Review

Raemon · Feb 10, 2023, 7:47 PM
69 points
2 comments · 4 min read · LW link

Here's Why I'm Hesitant To Respond In More Depth

DirectedEvolution · Feb 6, 2023, 6:36 PM
67 points
10 comments · 4 min read · LW link · 1 review

Voting Results for the 2021 Review

Raemon · Feb 1, 2023, 8:02 AM
66 points
10 comments · 38 min read · LW link

The Preference Fulfillment Hypothesis

Kaj_Sotala · Feb 26, 2023, 10:55 AM
66 points
62 comments · 11 min read · LW link

Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic)

LawrenceC · Feb 16, 2023, 7:47 PM
65 points
9 comments · 1 min read · LW link
(arxiv.org)

On Developing a Mathematical Theory of Interpretability

carboniferous_umbraculum · Feb 9, 2023, 1:45 AM
64 points
8 comments · 6 min read · LW link

Rationality-related things I don't know as of 2023

Adam Zerner · Feb 11, 2023, 6:04 AM
64 points
59 comments · 3 min read · LW link

Emergent Deception and Emergent Optimization

jsteinhardt · Feb 20, 2023, 2:40 AM
64 points
0 comments · 14 min read · LW link
(bounded-regret.ghost.io)

I Am Scared of Posting Negative Takes About Bing's AI

Yitz · Feb 17, 2023, 8:50 PM
63 points
28 comments · 1 min read · LW link

Speedrunning 4 mistakes you make when your alignment strategy is based on formal proof

Quinn · Feb 16, 2023, 1:13 AM
63 points
18 comments · 2 min read · LW link

Learning How to Learn (And 20+ Studies)

maxa · Feb 26, 2023, 10:46 PM
63 points
12 comments · 6 min read · LW link
(max2c.com)

Aiming for Convergence Is Like Discouraging Betting

Zack_M_Davis · Feb 1, 2023, 12:03 AM
62 points
18 comments · 11 min read · LW link · 1 review

Are short timelines actually bad?

joshc · Feb 5, 2023, 9:21 PM
61 points
7 comments · 3 min read · LW link

Christiano (ARC) and GA (Conjecture) Discuss Alignment Cruxes

Feb 24, 2023, 11:03 PM
61 points
7 comments · 47 min read · LW link

Buddhist Psychotechnology for Withstanding Apocalypse Stress

romeostevensit · Feb 25, 2023, 3:11 AM
61 points
10 comments · 5 min read · LW link

A mechanistic explanation for SolidGoldMagikarp-like tokens in GPT2

MadHatter · Feb 26, 2023, 1:10 AM
61 points
14 comments · 6 min read · LW link

Who invented knitting? The plot thickens

eukaryote · Feb 5, 2023, 12:24 AM
60 points
9 comments · 19 min read · LW link
(eukaryotewritesblog.com)

AGI systems & humans will both need to solve the alignment problem

Jeffrey Ladish · Feb 24, 2023, 3:29 AM
59 points
14 comments · 4 min read · LW link

Human beats SOTA Go AI by learning an adversarial policy

Vanessa Kosoy · Feb 19, 2023, 9:38 AM
59 points
32 comments · 1 min read · LW link
(goattack.far.ai)

Respect Chesterton-Schelling Fences

Shmi · Feb 27, 2023, 12:09 AM
58 points
17 comments · 1 min read · LW link

[Question] How seriously should we take the hypothesis that LW is just wrong on how AI will impact the 21st century?

Noosphere89 · Feb 16, 2023, 3:25 PM
58 points
66 comments · 1 min read · LW link

What is it like doing AI safety work?

KatWoods · Feb 21, 2023, 8:12 PM
57 points
2 comments · LW link