Open problems in activation engineering

24 Jul 2023 19:46 UTC
43 points
2 comments · 1 min read · LW link
(coda.io)

Subdivisions for Useful Distillations?

Sharat Jacob Jacob 24 Jul 2023 18:55 UTC
8 points
2 comments · 2 min read · LW link

Optimizing For Approval And Disapproval

Thoth Hermes 24 Jul 2023 18:46 UTC
−1 points
0 comments · 12 min read · LW link
(thothhermes.substack.com)

An Opinionated Guide to Computability and Complexity (Post #0)

Noosphere89 24 Jul 2023 17:53 UTC
10 points
10 comments · 3 min read · LW link

Slowing down AI progress is an underexplored alignment strategy

Norman Borlaug 24 Jul 2023 16:56 UTC
40 points
27 comments · 5 min read · LW link

Anticipation in LLMs

derek shiller 24 Jul 2023 15:53 UTC
6 points
0 comments · 13 min read · LW link

The cone of freedom (or, freedom might only be instrumentally valuable)

dkl9 24 Jul 2023 15:38 UTC
−10 points
6 comments · 2 min read · LW link
(dkl9.net)

A reformulation of Finite Factored Sets

Matthias G. Mayer 24 Jul 2023 13:02 UTC
74 points
1 comment · 8 min read · LW link

Brain Efficiency Cannell Prize Contest Award Ceremony

Alexander Gietelink Oldenziel 24 Jul 2023 11:30 UTC
145 points
12 comments · 7 min read · LW link

[Crosspost] An AI Pause Is Humanity’s Best Bet For Preventing Extinction (TIME)

otto.barten 24 Jul 2023 10:07 UTC
12 points
0 comments · 7 min read · LW link
(time.com)

Cryonics and Regret

MvB 24 Jul 2023 9:16 UTC
172 points
34 comments · 2 min read · LW link

Rationality !== Winning

Raemon 24 Jul 2023 2:53 UTC
145 points
49 comments · 4 min read · LW link

[Question] Which rationality posts are begging for further practical development?

LoganStrohl 23 Jul 2023 22:22 UTC
60 points
17 comments · 1 min read · LW link

Please speak unpredictably

dkl9 23 Jul 2023 22:09 UTC
10 points
16 comments · 1 min read · LW link
(dkl9.net)

QAPR 5: grokking is maybe not *that* big a deal?

Quintin Pope 23 Jul 2023 20:14 UTC
114 points
15 comments · 9 min read · LW link

My favorite AI governance research this year so far

Zach Stein-Perlman 23 Jul 2023 16:30 UTC
26 points
1 comment · 7 min read · LW link
(blog.aiimpacts.org)

“Justice, Cherryl.”

Zack_M_Davis 23 Jul 2023 16:16 UTC
73 points
20 comments · 9 min read · LW link

Supplementary Alignment Insights Through a Highly Controlled Shutdown Incentive

Justausername 23 Jul 2023 16:08 UTC
4 points
1 comment · 3 min read · LW link

Autogynephilia discourse is so absurdly bad on all sides

tailcalled 23 Jul 2023 13:12 UTC
43 points
24 comments · 2 min read · LW link

Examples of Prompts that Make GPT-4 Output Falsehoods

22 Jul 2023 20:21 UTC
21 points
5 comments · 6 min read · LW link

Think like a consultant not a salesperson

Adam Zerner 22 Jul 2023 19:31 UTC
16 points
5 comments · 2 min read · LW link

Optimization, loss set at variance in RL

Clairstan 22 Jul 2023 18:25 UTC
1 point
1 comment · 3 min read · LW link

Compute Thresholds: proposed rules to mitigate risk of a “lab leak” accident during AI training runs

davidad 22 Jul 2023 18:09 UTC
80 points
2 comments · 2 min read · LW link

Apollo Neuro Follow Up

Elizabeth 22 Jul 2023 17:20 UTC
28 points
0 comments · 1 min read · LW link
(acesounderglass.com)

Expert trap – Ways out (Part 3 of 3)

Paweł Sysiak 22 Jul 2023 13:06 UTC
4 points
0 comments · 9 min read · LW link

GPTs’ ability to keep a secret is weirdly prompt-dependent

22 Jul 2023 12:21 UTC
31 points
0 comments · 9 min read · LW link

Replacing the Big Air Purifier

jefftk 22 Jul 2023 12:10 UTC
10 points
0 comments · 1 min read · LW link
(www.jefftk.com)

[Question] I’m consistently overwhelmed by basic obligations. Are there any paradigm shifts or other rationality-based tips that would be helpful?

Benjamin Hendricks 21 Jul 2023 21:10 UTC
60 points
37 comments · 2 min read · LW link

Fundamentally Fuzzy Concepts Can’t Have Crisp Definitions: Cooperation and Alignment vs Math and Physics

VojtaKovarik 21 Jul 2023 21:03 UTC
12 points
18 comments · 3 min read · LW link

Cooking Air Quality

jefftk 21 Jul 2023 19:30 UTC
16 points
1 comment · 2 min read · LW link
(www.jefftk.com)

Reward Hacking from a Causal Perspective

21 Jul 2023 18:27 UTC
29 points
5 comments · 7 min read · LW link

News : Biden-Harris Administration Secures Voluntary Commitments from Leading Artificial Intelligence Companies to Manage the Risks Posed by AI

Jonathan Claybrough 21 Jul 2023 18:00 UTC
65 points
9 comments · 2 min read · LW link
(www.whitehouse.gov)

The UAP Disclosure Act of 2023 and its implications

andeslodes 21 Jul 2023 17:21 UTC
36 points
47 comments · 20 min read · LW link
(www.congress.gov)

To use computers well, learn their rules

dkl9 21 Jul 2023 17:00 UTC
4 points
6 comments · 4 min read · LW link
(dkl9.net)

BCIs and the ecosystem of modular minds

beren 21 Jul 2023 15:58 UTC
84 points
14 comments · 11 min read · LW link

Priorities for the UK Foundation Models Taskforce

Andrea_Miotti 21 Jul 2023 15:23 UTC
105 points
4 comments · 5 min read · LW link
(www.conjecture.dev)

Training Process Transparency through Gradient Interpretability: Early experiments on toy language models

21 Jul 2023 14:52 UTC
56 points
1 comment · 1 min read · LW link

[Question] Can AI Alignment please create a Reddit-like platform that would make it much easier for alignment researchers to find and help each other?

Georgeo57 21 Jul 2023 14:03 UTC
−5 points
2 comments · 1 min read · LW link

Case for Foundation Models beyond English

Varshul Gupta 21 Jul 2023 13:59 UTC
1 point
0 comments · 3 min read · LW link
(dubverseblack.substack.com)

Meta is hiring for LLM red teaming position

Michael Tontchev 21 Jul 2023 13:57 UTC
7 points
0 comments · 1 min read · LW link
(us.meta.talentnet.community)

[Linkpost] Interpreting Multimodal Video Transformers Using Brain Recordings

Bogdan Ionut Cirstea 21 Jul 2023 11:26 UTC
5 points
0 comments · 1 min read · LW link

Berlin AI Alignment Open Meetup August 2023

GuyP 21 Jul 2023 10:58 UTC
1 point
0 comments · 1 min read · LW link

Decoding intermediate activations in llama-2-7b

Nina Rimsky 21 Jul 2023 5:35 UTC
36 points
3 comments · 4 min read · LW link

GPT-2’s positional embedding matrix is a helix

AdamYedidia 21 Jul 2023 4:16 UTC
42 points
18 comments · 4 min read · LW link

Problems with predictive history classes

dkl9 20 Jul 2023 23:28 UTC
15 points
5 comments · 1 min read · LW link

Announcement: AI Narrations Available for All New LessWrong Posts

20 Jul 2023 22:17 UTC
67 points
28 comments · 1 min read · LW link

AI #21: The Cup Overfloweth

Zvi 20 Jul 2023 21:30 UTC
47 points
4 comments · 64 min read · LW link
(thezvi.wordpress.com)

All AGI Safety questions welcome (especially basic ones) [July 2023]

smallsilo 20 Jul 2023 20:20 UTC
38 points
40 comments · 2 min read · LW link
(forum.effectivealtruism.org)

Growth of Publicly Available Genetic Sequencing Data

jefftk 20 Jul 2023 19:50 UTC
11 points
2 comments · 1 min read · LW link
(www.jefftk.com)

Progress links and tweets, 2023-07-20: “A goddess enthroned on a car”

jasoncrawford 20 Jul 2023 18:28 UTC
12 points
4 comments · 2 min read · LW link
(rootsofprogress.org)