Worlds where I wouldn’t worry about AI risk

adekcz · Dec 1, 2023, 4:06 PM
2 points
0 comments · 4 min read · LW link

How useful for alignment-relevant work are AIs with short-term goals? (Section 2.2.4.3 of “Scheming AIs”)

Joe Carlsmith · Dec 1, 2023, 2:51 PM
10 points
1 comment · 7 min read · LW link

Reality is whatever you can get away with.

sometimesperson · Dec 1, 2023, 7:50 AM
−5 points
0 comments · 1 min read · LW link

Reinforcement Learning using Layered Morphology (RLLM)

MiguelDev · Dec 1, 2023, 5:18 AM
7 points
0 comments · 29 min read · LW link

[Question] Is OpenAI losing money on each request?

thenoviceoof · Dec 1, 2023, 3:27 AM
8 points
8 comments · 5 min read · LW link

How useful is mechanistic interpretability?

Dec 1, 2023, 2:54 AM
167 points
54 comments · 25 min read · LW link

FixDT

abramdemski · Nov 30, 2023, 9:57 PM
64 points
15 comments · 14 min read · LW link · 1 review

Generalization, from thermodynamics to statistical physics

Jesse Hoogland · Nov 30, 2023, 9:28 PM
64 points
9 comments · 28 min read · LW link

What’s next for the field of Agent Foundations?

Nov 30, 2023, 5:55 PM
59 points
23 comments · 10 min read · LW link

A Proposed Cure for Alzheimer’s Disease???

MadHatter · Nov 30, 2023, 5:37 PM
4 points
30 comments · 2 min read · LW link

AI #40: A Vision from Vitalik

Zvi · Nov 30, 2023, 5:30 PM
53 points
12 comments · 42 min read · LW link
(thezvi.wordpress.com)

Is scheming more likely in models trained to have long-term goals? (Sections 2.2.4.1-2.2.4.2 of “Scheming AIs”)

Joe Carlsmith · Nov 30, 2023, 4:43 PM
8 points
0 comments · 6 min read · LW link

A Formula for Violence (and Its Antidote)

MadHatter · Nov 30, 2023, 4:04 PM
−22 points
6 comments · 1 min read · LW link
(blog.simpleheart.org)

Enkrateia: a safe model-based reinforcement learning algorithm

MadHatter · Nov 30, 2023, 3:51 PM
−15 points
4 comments · 2 min read · LW link
(github.com)

Normative Ethics vs Utilitarianism

Logan Zoellner · Nov 30, 2023, 3:36 PM
6 points
0 comments · 2 min read · LW link
(midwitalignment.substack.com)

Information-Theoretic Boxing of Superintelligences

Nov 30, 2023, 2:31 PM
30 points
0 comments · 7 min read · LW link

OpenAI: Altman Returns

Zvi · Nov 30, 2023, 2:10 PM
66 points
12 comments · 11 min read · LW link
(thezvi.wordpress.com)

[Linkpost] Remarks on the Convergence in Distribution of Random Neural Networks to Gaussian Processes in the Infinite Width Limit

carboniferous_umbraculum · Nov 30, 2023, 2:01 PM
9 points
0 comments · 1 min read · LW link
(drive.google.com)

[Question] Buy Nothing Day is a great idea with a terrible app— why has nobody built a killer app for crowdsourced ‘effective communism’ yet?

lillybaeum · Nov 30, 2023, 1:47 PM
8 points
17 comments · 1 min read · LW link

[Question] Comprehensible Input is the only way people learn languages—is it the only way people *learn*?

lillybaeum · Nov 30, 2023, 1:31 PM
8 points
2 comments · 3 min read · LW link

Some Intuitions for the Ethicophysics

Nov 30, 2023, 6:47 AM
2 points
4 comments · 8 min read · LW link

The Alignment Agenda THEY Don’t Want You to Know About

MadHatter · Nov 30, 2023, 4:29 AM
−19 points
16 comments · 1 min read · LW link

Cis fragility

[deactivated] · Nov 30, 2023, 4:14 AM
−51 points
9 comments · 3 min read · LW link

Homework Answer: Glicko Ratings for War

MadHatter · Nov 30, 2023, 4:08 AM
−45 points
1 comment · 77 min read · LW link
(gist.github.com)

[Question] Feature Request for LessWrong

MadHatter · Nov 30, 2023, 3:19 AM
11 points
8 comments · 1 min read · LW link

My Alignment Research Agenda (“the Ethicophysics”)

MadHatter · Nov 30, 2023, 2:57 AM
−13 points
0 comments · 1 min read · LW link

[Question] Stupid Question: Why am I getting consistently downvoted?

MadHatter · Nov 30, 2023, 12:21 AM
31 points
138 comments · 1 min read · LW link

Inositol Non-Results

Elizabeth · Nov 29, 2023, 9:40 PM
20 points
2 comments · 1 min read · LW link
(acesounderglass.com)

Losing Metaphors: Zip and Paste

jefftk · Nov 29, 2023, 8:31 PM
26 points
6 comments · 1 min read · LW link
(www.jefftk.com)

Preserving our heritage: Building a movement and a knowledge ark for current and future generations

rnk8 · Nov 29, 2023, 7:20 PM
0 points
5 comments · 12 min read · LW link

AGI Alignment is Absurd

Youssef Mohamed · Nov 29, 2023, 7:11 PM
−9 points
4 comments · 3 min read · LW link

The origins of the steam engine: An essay with interactive animated diagrams

jasoncrawford · Nov 29, 2023, 6:30 PM
30 points
1 comment · 1 min read · LW link
(rootsofprogress.org)

ChatGPT 4 solved all the gotcha problems I posed that tripped ChatGPT 3.5

VipulNaik · Nov 29, 2023, 6:11 PM
33 points
16 comments · 14 min read · LW link

“Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”)

Joe Carlsmith · Nov 29, 2023, 4:32 PM
29 points
1 comment · 11 min read · LW link

Lying Alignment Chart

Zack_M_Davis · Nov 29, 2023, 4:15 PM
77 points
17 comments · 1 min read · LW link

Rethink Priorities: Seeking Expressions of Interest for Special Projects Next Year

kierangreig · Nov 29, 2023, 1:59 PM
4 points
0 comments · 5 min read · LW link

[Question] Thoughts on teletransportation with copies?

titotal · Nov 29, 2023, 12:56 PM
15 points
13 comments · 1 min read · LW link

Interpretability with Sparse Autoencoders (Colab exercises)

CallumMcDougall · Nov 29, 2023, 12:56 PM
76 points
9 comments · 4 min read · LW link

The 101 Space You Will Always Have With You

Screwtape · Nov 29, 2023, 4:56 AM
277 points
23 comments · 6 min read · LW link · 1 review

Trust your intuition—Kahneman’s book misses the forest for the trees

mnvr · Nov 29, 2023, 4:37 AM
−2 points
2 comments · 2 min read · LW link

Process Substitution Without Shell?

jefftk · Nov 29, 2023, 3:20 AM
19 points
18 comments · 2 min read · LW link
(www.jefftk.com)

Deception Chess: Game #2

Zane · Nov 29, 2023, 2:43 AM
29 points
17 comments · 2 min read · LW link

Black Box Biology

GeneSmith · Nov 29, 2023, 2:27 AM
65 points
30 comments · 2 min read · LW link

[Question] What would be the shelf life of nuclear weapon-secrecy if nuclear weapons had not immediately been used in combat?

Gram Stone · Nov 29, 2023, 12:53 AM
7 points
2 comments · 1 min read · LW link

Scaling laws for dominant assurance contracts

jessicata · Nov 28, 2023, 11:11 PM
36 points
5 comments · 7 min read · LW link
(unstableontology.com)

I’m confused about innate smell neuroanatomy

Steven Byrnes · Nov 28, 2023, 8:49 PM
40 points
2 comments · 9 min read · LW link

How to Control an LLM’s Behavior (why my P(DOOM) went down)

RogerDearnaley · Nov 28, 2023, 7:56 PM
65 points
30 comments · 11 min read · LW link

[Question] Is there a word for discrimination against A.I.?

Aaron Bohannon · Nov 28, 2023, 7:03 PM
1 point
4 comments · 1 min read · LW link

Update #2 to “Dominant Assurance Contract Platform”: EnsureDone

moyamo · Nov 28, 2023, 6:02 PM
33 points
2 comments · 1 min read · LW link

Ethicophysics II: Politics is the Mind-Savior

MadHatter · Nov 28, 2023, 4:27 PM
−9 points
9 comments · 4 min read · LW link
(bittertruths.substack.com)