My current LK99 questions

Eliezer Yudkowsky · Aug 1, 2023, 10:48 PM
206 points
38 comments · 5 min read · LW link

Spiral Staircase

Michael Samoilov · Aug 1, 2023, 9:51 PM
19 points
2 comments · 2 min read · LW link

Open Mic—August 2023

Adam Zerner · Aug 1, 2023, 7:24 PM
8 points
0 comments · 1 min read · LW link

ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks

Beth Barnes · Aug 1, 2023, 6:30 PM
153 points
12 comments · 5 min read · LW link
(evals.alignment.org)

[Question] When (if ever) are superstimuli good/useful/advantageous?

Perhaps · Aug 1, 2023, 3:50 PM
−7 points
2 comments · 1 min read · LW link

AISN #17: Automatically Circumventing LLM Guardrails, the Frontier Model Forum, and Senate Hearing on AI Oversight

Dan H · Aug 1, 2023, 3:40 PM
8 points
0 comments · 8 min read · LW link
(newsletter.safe.ai)

AISN #16: White House Secures Voluntary Commitments from Leading AI Labs and Lessons from Oppenheimer

Aug 1, 2023, 3:39 PM
3 points
0 comments · 6 min read · LW link
(newsletter.safe.ai)

“Desperate Honesty” by Agnes Callard

David Gross · Aug 1, 2023, 1:34 PM
11 points
0 comments · 2 min read · LW link
(dailynous.com)

Barbieheimer: Across the Dead Reckoning

Zvi · Aug 1, 2023, 1:00 PM
49 points
17 comments · 41 min read · LW link
(thezvi.wordpress.com)

Untangling Infrabayesianism: A redistillation [PDF link; ~12k words + lots of math]

Lorxus · Aug 1, 2023, 12:42 PM
29 points
16 comments · 2 min read · LW link
(docdro.id)

What Is Childhood Supposed To Be?

Sable · Aug 1, 2023, 9:51 AM
21 points
13 comments · 3 min read · LW link
(affablyevil.substack.com)

AI romantic partners will harm society if they go unregulated

Roman Leventov · Aug 1, 2023, 9:32 AM
26 points
76 comments · 13 min read · LW link

What is autonomy, and how does it lead to greater risk from AI?

Davidmanheim · Aug 1, 2023, 7:58 AM
30 points
0 comments · 6 min read · LW link

Evaluating Superhuman Models with Consistency Checks

Aug 1, 2023, 7:51 AM
21 points
2 comments · 9 min read · LW link
(arxiv.org)

[See link to Sept meetup below!] San Francisco ACX Meetup “First Saturday” August 5, 1 pm

guenael · Aug 1, 2023, 3:38 AM
1 point
0 comments · 1 min read · LW link

[Question] Exercise: Solve “Thinking Physics”

Raemon · Aug 1, 2023, 12:44 AM
102 points
30 comments · 5 min read · LW link · 1 review

The “public debate” about AI is confusing for the general public and for policymakers because it is a three-sided debate

Adam David Long · Aug 1, 2023, 12:08 AM
146 points
30 comments · 4 min read · LW link

The “no sandbagging on checkable tasks” hypothesis

Joe Carlsmith · Jul 31, 2023, 11:06 PM
61 points
14 comments · 9 min read · LW link

A Social History of Truth

Vaniver · Jul 31, 2023, 10:49 PM
64 points
2 comments · 14 min read · LW link

Watermarking considered overrated?

DanielFilan · Jul 31, 2023, 9:36 PM
19 points
4 comments · 1 min read · LW link

What The Lord of the Rings Teaches Us About AI Alignment

Jeffrey Heninger · Jul 31, 2023, 8:16 PM
24 points
12 comments · 7 min read · LW link

The “spelling miracle”: GPT-3 spelling abilities and glitch tokens revisited

mwatkins · Jul 31, 2023, 7:47 PM
85 points
29 comments · 20 min read · LW link

“Building a House” Review

jefftk · Jul 31, 2023, 7:20 PM
62 points
6 comments · 1 min read · LW link
(www.jefftk.com)

The Meaning of Shoggoth AI Memes

Dan Smith · Jul 31, 2023, 6:52 PM
−5 points
5 comments · 2 min read · LW link

[Question] Is there any existing term summarizing non-scalable oversight methods in outer alignment?

Allen Shen · Jul 31, 2023, 5:31 PM
1 point
0 comments · 1 min read · LW link

Lack of Social Grace Is an Epistemic Virtue

Zack_M_Davis · Jul 31, 2023, 4:38 PM
41 points
105 comments · 4 min read · LW link · 2 reviews

Thoughts on sharing information about language model capabilities

paulfchristiano · Jul 31, 2023, 4:04 PM
210 points
44 comments · 11 min read · LW link · 1 review

Trading off compute in training and inference (Overview)

Pablo Villalobos · Jul 31, 2023, 4:03 PM
42 points
2 comments · 7 min read · LW link
(epochai.org)

Open Problems and Fundamental Limitations of RLHF

scasper · Jul 31, 2023, 3:31 PM
66 points
6 comments · 2 min read · LW link
(arxiv.org)

“Not Necessarily”

Benjamin Hendricks · Jul 31, 2023, 3:19 PM
24 points
2 comments · 2 min read · LW link

How to find AI alignment researchers to collaborate with?

Florian Dietz · Jul 31, 2023, 9:05 AM
2 points
2 comments · 1 min read · LW link

[Question] Is Kennedy a Nazi?

Pee Doom · Jul 31, 2023, 8:51 AM
−12 points
10 comments · 2 min read · LW link

Is Light Drinking Protective?

jefftk · Jul 31, 2023, 3:00 AM
45 points
8 comments · 2 min read · LW link
(www.jefftk.com)

EU’s AI ambitions at risk as US pushes to water down international treaty (linkpost)

mic · Jul 31, 2023, 12:34 AM
10 points
0 comments · 4 min read · LW link
(www.euractiv.com)

The rise of AI in cybercrime

BobyResearcher · Jul 30, 2023, 8:19 PM
−15 points
1 comment · 2 min read · LW link
(riseofAIincybercryme)

SSA vs. SIA: how future population may provide evidence for or against the foundations of political liberalism

j · Jul 30, 2023, 8:18 PM
−6 points
10 comments · 55 min read · LW link

Rationalization Maximizes Expected Value

Kevin Dorst · Jul 30, 2023, 8:11 PM
19 points
10 comments · 7 min read · LW link
(kevindorst.substack.com)

Apollo Neuro Results

Elizabeth · Jul 30, 2023, 6:40 PM
85 points
17 comments · 3 min read · LW link
(acesounderglass.com)

Hilbert’s Triumph, Church and Turing’s failure, and what it means (Post #2)

Noosphere89 · Jul 30, 2023, 2:33 PM
−5 points
16 comments · 15 min read · LW link

[Question] Specific Arguments against open source LLMs?

Iknownothing · Jul 30, 2023, 2:27 PM
4 points
2 comments · 1 min read · LW link

Socialism in large organizations

Adam Zerner · Jul 30, 2023, 7:25 AM
7 points
16 comments · 2 min read · LW link

How to make real-money prediction markets on arbitrary topics (Outdated)

yutaka · Jul 30, 2023, 2:11 AM
57 points
13 comments · 3 min read · LW link

[Question] Does decidability of a theory imply completeness of the theory?

Noosphere89 · Jul 29, 2023, 11:53 PM
6 points
12 comments · 1 min read · LW link

[Question] If I showed the EQ-SQ theory’s findings to be due to measurement bias, would anyone change their minds about it?

tailcalled · Jul 29, 2023, 7:38 PM
23 points
13 comments · 1 min read · LW link

Self-driving car bets

paulfchristiano · Jul 29, 2023, 6:10 PM
236 points
44 comments · 5 min read · LW link
(sideways-view.com)

The Parable of the Dagger—The Animation

Writer · Jul 29, 2023, 2:03 PM
20 points
6 comments · 1 min read · LW link
(youtu.be)

Are Guitars Obsolete?

jefftk · Jul 29, 2023, 1:20 PM
11 points
8 comments · 2 min read · LW link
(www.jefftk.com)

NAMSI: A promising approach to alignment

Georgeo57 · Jul 29, 2023, 7:03 AM
−6 points
6 comments · 1 min read · LW link

Understanding and Aligning a Human-like Inductive Bias with Cognitive Science: a Review of Related Literature

Claire Short · Jul 29, 2023, 6:10 AM
27 points
0 comments · 12 min read · LW link

Why You Should Never Update Your Beliefs

Arjun Panickssery · Jul 29, 2023, 12:27 AM
76 points
18 comments · 4 min read · LW link · 1 review
(arjunpanickssery.substack.com)