[Question] Which rationality posts are begging for further practical development?

LoganStrohl · Jul 23, 2023, 10:22 PM
60 points
17 comments · 1 min read · LW link

Please speak unpredictably

dkl9 · Jul 23, 2023, 10:09 PM
21 points
16 comments · 1 min read · LW link
(dkl9.net)

QAPR 5: grokking is maybe not *that* big a deal?

Quintin Pope · Jul 23, 2023, 8:14 PM
114 points
15 comments · 9 min read · LW link

My favorite AI governance research this year so far

Zach Stein-Perlman · Jul 23, 2023, 4:30 PM
26 points
1 comment · 7 min read · LW link
(blog.aiimpacts.org)

“Justice, Cherryl.”

Zack_M_Davis · Jul 23, 2023, 4:16 PM
91 points
21 comments · 9 min read · LW link · 1 review

Supplementary Alignment Insights Through a Highly Controlled Shutdown Incentive

Justausername · Jul 23, 2023, 4:08 PM
4 points
1 comment · 3 min read · LW link

Autogynephilia discourse is so absurdly bad on all sides

tailcalled · Jul 23, 2023, 1:12 PM
44 points
24 comments · 2 min read · LW link

Examples of Prompts that Make GPT-4 Output Falsehoods

Jul 22, 2023, 8:21 PM
21 points
5 comments · 6 min read · LW link

Think like a consultant not a salesperson

Adam Zerner · Jul 22, 2023, 7:31 PM
16 points
5 comments · 2 min read · LW link

Optimization, loss set at variance in RL

Clairstan · Jul 22, 2023, 6:25 PM
1 point
1 comment · 3 min read · LW link

Compute Thresholds: proposed rules to mitigate risk of a “lab leak” accident during AI training runs

davidad · Jul 22, 2023, 6:09 PM
80 points
2 comments · 2 min read · LW link

Apollo Neuro Follow Up

Elizabeth · Jul 22, 2023, 5:20 PM
28 points
0 comments · 1 min read · LW link
(acesounderglass.com)

Expert trap – Ways out (Part 3 of 3)

Paweł Sysiak · Jul 22, 2023, 1:06 PM
4 points
0 comments · 9 min read · LW link

GPTs’ ability to keep a secret is weirdly prompt-dependent

Jul 22, 2023, 12:21 PM
31 points
0 comments · 9 min read · LW link

Replacing the Big Air Purifier

jefftk · Jul 22, 2023, 12:10 PM
10 points
0 comments · 1 min read · LW link
(www.jefftk.com)

[Question] I’m consistently overwhelmed by basic obligations. Are there any paradigm shifts or other rationality-based tips that would be helpful?

Benjamin Hendricks · Jul 21, 2023, 9:10 PM
71 points
42 comments · 2 min read · LW link

Fundamentally Fuzzy Concepts Can’t Have Crisp Definitions: Cooperation and Alignment vs Math and Physics

VojtaKovarik · Jul 21, 2023, 9:03 PM
12 points
18 comments · 3 min read · LW link

Cooking Air Quality

jefftk · Jul 21, 2023, 7:30 PM
16 points
1 comment · 2 min read · LW link
(www.jefftk.com)

Reward Hacking from a Causal Perspective

Jul 21, 2023, 6:27 PM
29 points
6 comments · 7 min read · LW link

News : Biden-Harris Administration Secures Voluntary Commitments from Leading Artificial Intelligence Companies to Manage the Risks Posed by AI

Jonathan Claybrough · Jul 21, 2023, 6:00 PM
65 points
10 comments · 2 min read · LW link
(www.whitehouse.gov)

The UAP Disclosure Act of 2023 and its implications

andeslodes · Jul 21, 2023, 5:21 PM
36 points
47 comments · 20 min read · LW link
(www.congress.gov)

To use computers well, learn their rules

dkl9 · Jul 21, 2023, 5:00 PM
4 points
6 comments · 4 min read · LW link
(dkl9.net)

BCIs and the ecosystem of modular minds

beren · Jul 21, 2023, 3:58 PM
88 points
14 comments · 11 min read · LW link

Priorities for the UK Foundation Models Taskforce

Andrea_Miotti · Jul 21, 2023, 3:23 PM
105 points
4 comments · 5 min read · LW link
(www.conjecture.dev)

Training Process Transparency through Gradient Interpretability: Early experiments on toy language models

Jul 21, 2023, 2:52 PM
56 points
1 comment · 1 min read · LW link

[Question] Can AI Alignment please create a Reddit-like platform that would make it much easier for alignment researchers to find and help each other?

Georgeo57 · Jul 21, 2023, 2:03 PM
−5 points
2 comments · 1 min read · LW link

Case for Foundation Models beyond English

Varshul Gupta · Jul 21, 2023, 1:59 PM
1 point
0 comments · 3 min read · LW link
(dubverseblack.substack.com)

Meta is hiring for LLM red teaming position

Michael Tontchev · Jul 21, 2023, 1:57 PM
7 points
0 comments · 1 min read · LW link
(us.meta.talentnet.community)

[Linkpost] Interpreting Multimodal Video Transformers Using Brain Recordings

Bogdan Ionut Cirstea · Jul 21, 2023, 11:26 AM
5 points
0 comments · 1 min read · LW link

Berlin AI Alignment Open Meetup August 2023

GuyP · Jul 21, 2023, 10:58 AM
1 point
0 comments · 1 min read · LW link

Decoding intermediate activations in llama-2-7b

Nina Panickssery · Jul 21, 2023, 5:35 AM
39 points
3 comments · 4 min read · LW link

GPT-2’s positional embedding matrix is a helix

AdamYedidia · Jul 21, 2023, 4:16 AM
45 points
21 comments · 4 min read · LW link

Problems with predictive history classes

dkl9 · Jul 20, 2023, 11:28 PM
15 points
5 comments · 1 min read · LW link

Announcement: AI Narrations Available for All New LessWrong Posts

Jul 20, 2023, 10:17 PM
71 points
28 comments · 1 min read · LW link

AI #21: The Cup Overfloweth

Zvi · Jul 20, 2023, 9:30 PM
47 points
4 comments · 64 min read · LW link
(thezvi.wordpress.com)

All AGI Safety questions welcome (especially basic ones) [July 2023]

smallsilo · Jul 20, 2023, 8:20 PM
38 points
40 comments · 2 min read · LW link
(forum.effectivealtruism.org)

Growth of Publicly Available Genetic Sequencing Data

jefftk · Jul 20, 2023, 7:50 PM
11 points
2 comments · 1 min read · LW link
(www.jefftk.com)

Progress links and tweets, 2023-07-20: “A goddess enthroned on a car”

jasoncrawford · Jul 20, 2023, 6:28 PM
12 points
4 comments · 2 min read · LW link
(rootsofprogress.org)

Boundary Placement Rebellion

tailcalled · Jul 20, 2023, 5:40 PM
54 points
21 comments · 12 min read · LW link

Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature Connectivity

zhanpeng_zhou · Jul 20, 2023, 5:38 PM
22 points
13 comments · 3 min read · LW link
(openreview.net)

Even Superhuman Go AIs Have Surprising Failure Modes

Jul 20, 2023, 5:31 PM
130 points
22 comments · 10 min read · LW link
(far.ai)

Paper digestion: “May We Have Your Attention Please? Human-Rights NGOs and the Problem of Global Communication”

Klara Helene Nielsen · Jul 20, 2023, 5:08 PM
4 points
1 comment · 2 min read · LW link
(journals.sagepub.com)

The (short) case for predicting what Aliens value

Jim Buhler · Jul 20, 2023, 3:25 PM
14 points
5 comments · 3 min read · LW link

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

Jul 20, 2023, 10:50 AM
44 points
3 comments · 2 min read · LW link
(arxiv.org)

Speculative inferences about path dependence in LLM supervised fine-tuning from results on linear mode connectivity and model souping

RobertKirk · Jul 20, 2023, 9:56 AM
39 points
2 comments · 5 min read · LW link

A case for gamete personhood (reductio ad absurdum)

Ansyn1312 · Jul 20, 2023, 8:25 AM
−1 points
4 comments · 1 min read · LW link

Contra Contra the Social Model of Disability

DirectedEvolution · Jul 20, 2023, 6:59 AM
20 points
22 comments · 16 min read · LW link

[Question] Do you speed up capabilities when you do AI integrations and consume overhangs?

Michael Tontchev · Jul 20, 2023, 6:40 AM
6 points
1 comment · 1 min read · LW link

[Question] How necessary is intuition, for advanced math?

Nicholas / Heather Kross · Jul 20, 2023, 12:18 AM
11 points
8 comments · 1 min read · LW link

Project Lawful Audiobook: An Unofficial Fan Production with ElevenLabs AI

Askwho · Jul 19, 2023, 11:34 PM
22 points
3 comments · 1 min read · LW link
(askwhocastsai.substack.com)