The Witness

Richard_Ngo · Dec 3, 2023, 10:27 PM
105 points
5 comments · 14 min read · LW link
(www.narrativeark.xyz)

Does scheming lead to adequate future empowerment? (Section 2.3.1.2 of “Scheming AIs”)

Joe Carlsmith · Dec 3, 2023, 6:32 PM
9 points
0 comments · 17 min read · LW link

[Question] How do you do post mortems?

matto · Dec 3, 2023, 2:46 PM
9 points
2 comments · 1 min read · LW link

The benefits and risks of optimism (about AI safety)

Karl von Wendt · Dec 3, 2023, 12:45 PM
−7 points
6 comments · 5 min read · LW link

Book Review: 1948 by Benny Morris

Yair Halberstadt · Dec 3, 2023, 10:29 AM
41 points
9 comments · 12 min read · LW link

Quick takes on “AI is easy to control”

So8res · Dec 2, 2023, 10:31 PM
26 points
49 comments · 4 min read · LW link

The goal-guarding hypothesis (Section 2.3.1.1 of “Scheming AIs”)

Joe Carlsmith · Dec 2, 2023, 3:20 PM
8 points
1 comment · 15 min read · LW link

The Method of Loci: With some brief remarks, including transformers and evaluating AIs

Bill Benzon · Dec 2, 2023, 2:36 PM
6 points
0 comments · 3 min read · LW link

Taking Into Account Sentient Non-Humans in AI Ambitious Value Learning: Sentientist Coherent Extrapolated Volition

Adrià Moret · Dec 2, 2023, 2:07 PM
26 points
31 comments · 42 min read · LW link

Out-of-distribution Bioattacks

jefftk · Dec 2, 2023, 12:20 PM
66 points
15 comments · 2 min read · LW link
(www.jefftk.com)

After Alignment — Dialogue between RogerDearnaley and Seth Herd

Dec 2, 2023, 6:03 AM
15 points
2 comments · 25 min read · LW link

List of strategies for mitigating deceptive alignment

joshc · Dec 2, 2023, 5:56 AM
38 points
2 comments · 6 min read · LW link

[Question] What is known about invariants in self-modifying systems?

mishka · Dec 2, 2023, 5:04 AM
9 points
2 comments · 1 min read · LW link

2023 Unofficial LessWrong Census/Survey

Screwtape · Dec 2, 2023, 4:41 AM
169 points
81 comments · 1 min read · LW link

Protecting against sudden capability jumps during training

Nikola Jurkovic · Dec 2, 2023, 4:22 AM
15 points
2 comments · 2 min read · LW link

South Bay Pre-Holiday Gathering

IS · Dec 2, 2023, 3:21 AM
10 points
2 comments · 1 min read · LW link

MATS Summer 2023 Retrospective

Dec 1, 2023, 11:29 PM
77 points
34 comments · 26 min read · LW link

Complex systems research as a field (and its relevance to AI Alignment)

Dec 1, 2023, 10:10 PM
65 points
11 comments · 19 min read · LW link

[Question] Could there be “natural impact regularization” or “impact regularization by default”?

tailcalled · Dec 1, 2023, 10:01 PM
24 points
6 comments · 1 min read · LW link

Benchmarking Bowtie2 Threading

jefftk · Dec 1, 2023, 8:20 PM
9 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Please Bet On My Quantified Self Decision Markets

niplav · Dec 1, 2023, 8:07 PM
36 points
6 comments · 6 min read · LW link

Specification Gaming: How AI Can Turn Your Wishes Against You [RA Video]

Writer · Dec 1, 2023, 7:30 PM
19 points
0 comments · 5 min read · LW link
(youtu.be)

Carving up problems at their joints

Jakub Smékal · Dec 1, 2023, 6:48 PM
1 point
0 comments · 2 min read · LW link
(jakubsmekal.com)

Queuing theory: Benefits of operating at 60% capacity

ampdot · Dec 1, 2023, 6:48 PM
43 points
4 comments · 1 min read · LW link
(less.works)

Researchers and writers can apply for proxy access to the GPT-3.5 base model (code-davinci-002)

ampdot · Dec 1, 2023, 6:48 PM
14 points
0 comments · 1 min read · LW link
(airtable.com)

Kolmogorov Complexity Lays Bare the Soul

jakej · Dec 1, 2023, 6:29 PM
5 points
8 comments · 2 min read · LW link

Thoughts on “AI is easy to control” by Pope & Belrose

Steven Byrnes · Dec 1, 2023, 5:30 PM
197 points
63 comments · 14 min read · LW link · 1 review

Why Did NEPA Peak in 2016?

Maxwell Tabarrok · Dec 1, 2023, 4:18 PM
10 points
0 comments · 3 min read · LW link
(maximumprogress.substack.com)

Worlds where I wouldn’t worry about AI risk

adekcz · Dec 1, 2023, 4:06 PM
2 points
0 comments · 4 min read · LW link

How useful for alignment-relevant work are AIs with short-term goals? (Section 2.2.4.3 of “Scheming AIs”)

Joe Carlsmith · Dec 1, 2023, 2:51 PM
10 points
1 comment · 7 min read · LW link

Reality is whatever you can get away with.

sometimesperson · Dec 1, 2023, 7:50 AM
−5 points
0 comments · 1 min read · LW link

Reinforcement Learning using Layered Morphology (RLLM)

MiguelDev · Dec 1, 2023, 5:18 AM
7 points
0 comments · 29 min read · LW link

[Question] Is OpenAI losing money on each request?

thenoviceoof · Dec 1, 2023, 3:27 AM
8 points
8 comments · 5 min read · LW link

How useful is mechanistic interpretability?

Dec 1, 2023, 2:54 AM
167 points
54 comments · 25 min read · LW link

FixDT

abramdemski · Nov 30, 2023, 9:57 PM
64 points
15 comments · 14 min read · LW link · 1 review

Generalization, from thermodynamics to statistical physics

Jesse Hoogland · Nov 30, 2023, 9:28 PM
64 points
9 comments · 28 min read · LW link

What’s next for the field of Agent Foundations?

Nov 30, 2023, 5:55 PM
59 points
23 comments · 10 min read · LW link

A Proposed Cure for Alzheimer’s Disease???

MadHatter · Nov 30, 2023, 5:37 PM
4 points
30 comments · 2 min read · LW link

AI #40: A Vision from Vitalik

Zvi · Nov 30, 2023, 5:30 PM
53 points
12 comments · 42 min read · LW link
(thezvi.wordpress.com)

Is scheming more likely in models trained to have long-term goals? (Sections 2.2.4.1-2.2.4.2 of “Scheming AIs”)

Joe Carlsmith · Nov 30, 2023, 4:43 PM
8 points
0 comments · 6 min read · LW link

A Formula for Violence (and Its Antidote)

MadHatter · Nov 30, 2023, 4:04 PM
−22 points
6 comments · 1 min read · LW link
(blog.simpleheart.org)

Enkrateia: a safe model-based reinforcement learning algorithm

MadHatter · Nov 30, 2023, 3:51 PM
−15 points
4 comments · 2 min read · LW link
(github.com)

Normative Ethics vs Utilitarianism

Logan Zoellner · Nov 30, 2023, 3:36 PM
6 points
0 comments · 2 min read · LW link
(midwitalignment.substack.com)

Information-Theoretic Boxing of Superintelligences

Nov 30, 2023, 2:31 PM
30 points
0 comments · 7 min read · LW link

OpenAI: Altman Returns

Zvi · Nov 30, 2023, 2:10 PM
66 points
12 comments · 11 min read · LW link
(thezvi.wordpress.com)

[Linkpost] Remarks on the Convergence in Distribution of Random Neural Networks to Gaussian Processes in the Infinite Width Limit

carboniferous_umbraculum · Nov 30, 2023, 2:01 PM
9 points
0 comments · 1 min read · LW link
(drive.google.com)

[Question] Buy Nothing Day is a great idea with a terrible app — why has nobody built a killer app for crowdsourced ‘effective communism’ yet?

lillybaeum · Nov 30, 2023, 1:47 PM
8 points
17 comments · 1 min read · LW link

[Question] Comprehensible Input is the only way people learn languages—is it the only way people *learn*?

lillybaeum · Nov 30, 2023, 1:31 PM
8 points
2 comments · 3 min read · LW link

Some Intuitions for the Ethicophysics

Nov 30, 2023, 6:47 AM
2 points
4 comments · 8 min read · LW link

The Alignment Agenda THEY Don’t Want You to Know About

MadHatter · Nov 30, 2023, 4:29 AM
−19 points
16 comments · 1 min read · LW link