One Minute Every Moment

abramdemski · Sep 1, 2023, 8:23 PM
125 points
23 comments · 3 min read · LW link

Tensor Trust: An online game to uncover prompt injection vulnerabilities

Sep 1, 2023, 7:31 PM
30 points
0 comments · 5 min read · LW link
(tensortrust.ai)

Reproducing ARC Evals’ recent report on language model agents

Thomas Broadley · Sep 1, 2023, 4:52 PM
104 points
17 comments · 3 min read · LW link
(thomasbroadley.com)

[Question] Why aren’t more people in AIS familiar with PDP?

Prometheus · Sep 1, 2023, 3:27 PM
4 points
9 comments · 1 min read · LW link

AGI isn’t just a technology

Seth Herd · Sep 1, 2023, 2:35 PM
18 points
12 comments · 2 min read · LW link

Can an LLM identify ring-composition in a literary text? [ChatGPT]

Bill Benzon · Sep 1, 2023, 2:18 PM
4 points
2 comments · 11 min read · LW link

What is OpenAI’s plan for making AI Safer?

brook · Sep 1, 2023, 11:15 AM
6 points
0 comments · 4 min read · LW link
(aisafetyexplained.substack.com)

Progress links digest, 2023-09-01: How ancient people manipulated water, and more

jasoncrawford · Sep 1, 2023, 4:33 AM
13 points
4 comments · 6 min read · LW link
(rootsofprogress.org)

A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX

Bird Concept · Sep 1, 2023, 4:03 AM
188 points
26 comments · 24 min read · LW link · 1 review

[Question] Would AI experts ever agree that AGI systems have attained “consciousness”?

Super AGI · Sep 1, 2023, 3:57 AM
−16 points
6 comments · 1 min read · LW link

Meta Questions about Metaphilosophy

Wei Dai · Sep 1, 2023, 1:17 AM
161 points
80 comments · 3 min read · LW link

[Linkpost] Michael Nielsen remarks on ‘Oppenheimer’

22tom · Aug 31, 2023, 3:46 PM
78 points
7 comments · 2 min read · LW link
(michaelnotebook.com)

My thoughts on AI and personal future plan after learning about AI Safety for 4 months

Ziyue Wang · Aug 31, 2023, 3:32 PM
7 points
0 comments · 4 min read · LW link

Which Questions Are Anthropic Questions?

dadadarren · Aug 31, 2023, 3:15 PM
16 points
13 comments · 3 min read · LW link

The Tree of Life, and a Note on Job

Bill Benzon · Aug 31, 2023, 2:03 PM
13 points
7 comments · 4 min read · LW link

Cleaning a SoundCraft Mixer

jefftk · Aug 31, 2023, 1:20 PM
11 points
0 comments · 1 min read · LW link
(www.jefftk.com)

AI #27: Portents of Gemini

Zvi · Aug 31, 2023, 12:40 PM
54 points
37 comments · 47 min read · LW link
(thezvi.wordpress.com)

[CANCELLED DUE TO ILLNESS] San Francisco ACX Meetup “First Saturday”

guenael · Aug 31, 2023, 12:34 PM
1 point
0 comments · 1 min read · LW link

Long-Term Future Fund Ask Us Anything (September 2023)

Aug 31, 2023, 12:28 AM
33 points
6 comments · 1 min read · LW link
(forum.effectivealtruism.org)

Responses to apparent rationalist confusions about game / decision theory

Anthony DiGiovanni · Aug 30, 2023, 10:02 PM
142 points
20 comments · 12 min read · LW link · 1 review

Invulnerable Incomplete Preferences: A Formal Statement

SCP · Aug 30, 2023, 9:59 PM
134 points
39 comments · 35 min read · LW link

Report on Frontier Model Training

YafahEdelman · Aug 30, 2023, 8:02 PM
122 points
21 comments · 21 min read · LW link
(docs.google.com)

An adversarial example for Direct Logit Attribution: memory management in gelu-4l

Aug 30, 2023, 5:36 PM
17 points
0 comments · 8 min read · LW link
(arxiv.org)

A Letter to the Editor of MIT Technology Review

Jeffs · Aug 30, 2023, 4:59 PM
0 points
0 comments · 2 min read · LW link

Biosecurity Culture, Computer Security Culture

jefftk · Aug 30, 2023, 4:40 PM
103 points
11 comments · 2 min read · LW link
(www.jefftk.com)

Why I hang out at LessWrong and why you should check-in there every now and then

Bill Benzon · Aug 30, 2023, 3:20 PM
16 points
5 comments · 5 min read · LW link

“Wanting” and “liking”

Mateusz Bagiński · Aug 30, 2023, 2:52 PM
23 points
3 comments · 29 min read · LW link

Open Call for Research Assistants in Developmental Interpretability

Aug 30, 2023, 9:02 AM
55 points
11 comments · 4 min read · LW link

LTFF and EAIF are unusually funding-constrained right now

Aug 30, 2023, 1:03 AM
90 points
24 comments · 15 min read · LW link
(forum.effectivealtruism.org)

Paper Walkthrough: Automated Circuit Discovery with Arthur Conmy

Neel Nanda · Aug 29, 2023, 10:07 PM
36 points
1 comment · 1 min read · LW link
(www.youtube.com)

An OV-Coherent Toy Model of Attention Head Superposition

Aug 29, 2023, 7:44 PM
26 points
2 comments · 6 min read · LW link

The Economics of the Asteroid Deflection Problem (Dominant Assurance Contracts)

moyamo · Aug 29, 2023, 6:28 PM
78 points
71 comments · 15 min read · LW link

Democratic Fine-Tuning

Joe Edelman · Aug 29, 2023, 6:13 PM
22 points
2 comments · 1 min read · LW link
(open.substack.com)

Should rationalists (be seen to) win?

Will_Pearson · Aug 29, 2023, 6:13 PM
6 points
7 comments · 1 min read · LW link

Frankfurt meetup

sultan · Aug 29, 2023, 6:10 PM
2 points
0 comments · 1 min read · LW link

Istanbul meetup

sultan · Aug 29, 2023, 6:10 PM
2 points
0 comments · 1 min read · LW link

Broken Benchmark: MMLU

awg · Aug 29, 2023, 6:09 PM
24 points
5 comments · 1 min read · LW link
(www.youtube.com)

AISN #20: LLM Proliferation, AI Deception, and Continuing Drivers of AI Capabilities

Dan H · Aug 29, 2023, 3:07 PM
12 points
0 comments · 8 min read · LW link
(newsletter.safe.ai)

Loft Bed Fan Guard

jefftk · Aug 29, 2023, 1:30 PM
16 points
3 comments · 1 min read · LW link
(www.jefftk.com)

Dating Roundup #1: This is Why You’re Single

Zvi · Aug 29, 2023, 12:50 PM
87 points
28 comments · 38 min read · LW link
(thezvi.wordpress.com)

Neural Recognizers: Some [old] notes based on a TV tube metaphor [perceptual contact with the world]

Bill Benzon · Aug 29, 2023, 11:33 AM
4 points
0 comments · 5 min read · LW link

Barriers to Mechanistic Interpretability for AGI Safety

Connor Leahy · Aug 29, 2023, 10:56 AM
63 points
13 comments · 1 min read · LW link
(www.youtube.com)

Newcomb Variant

lsusr · Aug 29, 2023, 7:02 AM
25 points
23 comments · 1 min read · LW link

[Question] Incentives affecting alignment-researcher encouragement

Nicholas / Heather Kross · Aug 29, 2023, 5:11 AM
28 points
3 comments · 1 min read · LW link

Anyone want to debate publicly about FDT?

omnizoid · Aug 29, 2023, 3:45 AM
13 points
31 comments · 1 min read · LW link

AI Deception: A Survey of Examples, Risks, and Potential Solutions

Aug 29, 2023, 1:29 AM
54 points
3 comments · 10 min read · LW link

An Interpretability Illusion for Activation Patching of Arbitrary Subspaces

Aug 29, 2023, 1:04 AM
77 points
4 comments · 1 min read · LW link

OpenAI API base models are not sycophantic, at any size

nostalgebraist · Aug 29, 2023, 12:58 AM UTC
183 points
20 comments · 2 min read · LW link
(colab.research.google.com)

Paradigms and Theory Choice in AI: Adaptivity, Economy and Control

particlemania · Aug 28, 2023, 10:19 PM UTC
4 points
0 comments · 16 min read · LW link

[Question] Humanities In A Post-Conscious AI World?

Netcentrica · Aug 28, 2023, 9:59 PM UTC
1 point
1 comment · 2 min read · LW link