Overcoming the MWC

Mark Freed · Jul 25, 2023, 5:31 PM
3 points
0 comments · 3 min read

Russian parliamentarian: let’s ban personal computers and the Internet

RomanS · Jul 25, 2023, 5:30 PM
11 points
6 comments · 2 min read

AISN #16: White House Secures Voluntary Commitments from Leading AI Labs and Lessons from Oppenheimer

Jul 25, 2023, 4:58 PM
6 points
0 comments · 6 min read
(newsletter.safe.ai)

“The Universe of Minds”—call for reviewers (Seeds of Science)

rogersbacon · Jul 25, 2023, 4:53 PM
7 points
0 comments · 1 min read

Thoughts on Loss Landscapes and why Deep Learning works

beren · Jul 25, 2023, 4:41 PM
53 points
4 comments · 18 min read

Should you work at a leading AI lab? (including in non-safety roles)

Benjamin Hilton · Jul 25, 2023, 4:29 PM
7 points
0 comments · 12 min read

Whisper’s Word-Level Timestamps are Out

Varshul Gupta · Jul 25, 2023, 2:32 PM
−18 points
2 comments · 2 min read
(dubverseblack.substack.com)

AIS 101: Task decomposition for scalable oversight

Charbel-Raphaël · Jul 25, 2023, 1:34 PM
35 points
0 comments · 19 min read
(docs.google.com)

Anthropic Observations

Zvi · Jul 25, 2023, 12:50 PM
104 points
1 comment · 10 min read
(thezvi.wordpress.com)

Autonomous Alignment Oversight Framework (AAOF)

Justausername · Jul 25, 2023, 10:25 AM
−9 points
0 comments · 4 min read

How LLMs are and are not myopic

janus · Jul 25, 2023, 2:19 AM
135 points
16 comments · 8 min read

Secure Hand Holding

jefftk · Jul 25, 2023, 1:40 AM
28 points
43 comments · 1 min read
(www.jefftk.com)

Open problems in activation engineering

Jul 24, 2023, 7:46 PM
51 points
2 comments · 1 min read
(coda.io)

Subdivisions for Useful Distillations?

Sharat Jacob Jacob · Jul 24, 2023, 6:55 PM
9 points
2 comments · 2 min read

Optimizing For Approval And Disapproval

Thoth Hermes · Jul 24, 2023, 6:46 PM
−1 points
0 comments · 12 min read
(thothhermes.substack.com)

An Opinionated Guide to Computability and Complexity (Post #0)

Noosphere89 · Jul 24, 2023, 5:53 PM
10 points
10 comments · 3 min read

Slowing down AI progress is an underexplored alignment strategy

Norman Borlaug · Jul 24, 2023, 4:56 PM
42 points
27 comments · 5 min read

Anticipation in LLMs

derek shiller · Jul 24, 2023, 3:53 PM
6 points
0 comments · 13 min read

The cone of freedom (or, freedom might only be instrumentally valuable)

dkl9 · Jul 24, 2023, 3:38 PM
−10 points
6 comments · 2 min read
(dkl9.net)

A reformulation of Finite Factored Sets

Matthias G. Mayer · Jul 24, 2023, 1:02 PM
76 points
1 comment · 8 min read

Brain Efficiency Cannell Prize Contest Award Ceremony

Alexander Gietelink Oldenziel · Jul 24, 2023, 11:30 AM
149 points
12 comments · 7 min read

[Crosspost] An AI Pause Is Humanity’s Best Bet For Preventing Extinction (TIME)

otto.barten · Jul 24, 2023, 10:07 AM
12 points
0 comments · 7 min read
(time.com)

Cryonics and Regret

MvB · Jul 24, 2023, 9:16 AM
192 points
35 comments · 2 min read · 1 review

Rationality !== Winning

Raemon · Jul 24, 2023, 2:53 AM
170 points
51 comments · 4 min read

[Question] Which rationality posts are begging for further practical development?

LoganStrohl · Jul 23, 2023, 10:22 PM
60 points
17 comments · 1 min read

Please speak unpredictably

dkl9 · Jul 23, 2023, 10:09 PM
21 points
16 comments · 1 min read
(dkl9.net)

QAPR 5: grokking is maybe not *that* big a deal?

Quintin Pope · Jul 23, 2023, 8:14 PM
114 points
15 comments · 9 min read

My favorite AI governance research this year so far

Zach Stein-Perlman · Jul 23, 2023, 4:30 PM
26 points
1 comment · 7 min read
(blog.aiimpacts.org)

“Justice, Cherryl.”

Zack_M_Davis · Jul 23, 2023, 4:16 PM
91 points
21 comments · 9 min read · 1 review

Supplementary Alignment Insights Through a Highly Controlled Shutdown Incentive

Justausername · Jul 23, 2023, 4:08 PM
4 points
1 comment · 3 min read

Autogynephilia discourse is so absurdly bad on all sides

tailcalled · Jul 23, 2023, 1:12 PM
44 points
24 comments · 2 min read

Examples of Prompts that Make GPT-4 Output Falsehoods

Jul 22, 2023, 8:21 PM
21 points
5 comments · 6 min read

Think like a consultant not a salesperson

Adam Zerner · Jul 22, 2023, 7:31 PM
16 points
5 comments · 2 min read

Optimization, loss set at variance in RL

Clairstan · Jul 22, 2023, 6:25 PM
1 point
1 comment · 3 min read

Compute Thresholds: proposed rules to mitigate risk of a “lab leak” accident during AI training runs

davidad · Jul 22, 2023, 6:09 PM
80 points
2 comments · 2 min read

Apollo Neuro Follow Up

Elizabeth · Jul 22, 2023, 5:20 PM
28 points
0 comments · 1 min read
(acesounderglass.com)

Expert trap – Ways out (Part 3 of 3)

Paweł Sysiak · Jul 22, 2023, 1:06 PM
4 points
0 comments · 9 min read

GPTs’ ability to keep a secret is weirdly prompt-dependent

Jul 22, 2023, 12:21 PM
31 points
0 comments · 9 min read

Replacing the Big Air Purifier

jefftk · Jul 22, 2023, 12:10 PM
10 points
0 comments · 1 min read
(www.jefftk.com)

[Question] I’m consistently overwhelmed by basic obligations. Are there any paradigm shifts or other rationality-based tips that would be helpful?

Benjamin Hendricks · Jul 21, 2023, 9:10 PM
71 points
42 comments · 2 min read

Fundamentally Fuzzy Concepts Can’t Have Crisp Definitions: Cooperation and Alignment vs Math and Physics

VojtaKovarik · Jul 21, 2023, 9:03 PM
12 points
18 comments · 3 min read

Cooking Air Quality

jefftk · Jul 21, 2023, 7:30 PM
16 points
1 comment · 2 min read
(www.jefftk.com)

Reward Hacking from a Causal Perspective

Jul 21, 2023, 6:27 PM
29 points
6 comments · 7 min read

News : Biden-Harris Administration Secures Voluntary Commitments from Leading Artificial Intelligence Companies to Manage the Risks Posed by AI

Jonathan Claybrough · Jul 21, 2023, 6:00 PM
65 points
10 comments · 2 min read
(www.whitehouse.gov)

The UAP Disclosure Act of 2023 and its implications

andeslodes · Jul 21, 2023, 5:21 PM
36 points
47 comments · 20 min read
(www.congress.gov)

To use computers well, learn their rules

dkl9 · Jul 21, 2023, 5:00 PM
4 points
6 comments · 4 min read
(dkl9.net)

BCIs and the ecosystem of modular minds

beren · Jul 21, 2023, 3:58 PM
88 points
14 comments · 11 min read

Priorities for the UK Foundation Models Taskforce

Andrea_Miotti · Jul 21, 2023, 3:23 PM
105 points
4 comments · 5 min read
(www.conjecture.dev)

Training Process Transparency through Gradient Interpretability: Early experiments on toy language models

Jul 21, 2023, 2:52 PM
56 points
1 comment · 1 min read

[Question] Can AI Alignment please create a Reddit-like platform that would make it much easier for alignment researchers to find and help each other?

Georgeo57 · Jul 21, 2023, 2:03 PM
−5 points
2 comments · 1 min read