A Short Memo on AI Interpretability Rainbows

scasper · Jul 27, 2023, 11:05 PM
18 points
0 comments · 2 min read · LW link

Pulling the Rope Sideways: Empirical Test Results

Daniel Kokotajlo · Jul 27, 2023, 10:18 PM
61 points
18 comments · 1 min read · LW link

A $10k retroactive grant for VaccinateCA

Austin Chen · Jul 27, 2023, 6:14 PM
82 points
0 comments · LW link
(manifund.org)

Preference Aggregation as Bayesian Inference

beren · Jul 27, 2023, 5:59 PM
14 points
1 comment · 1 min read · LW link

AI #22: Into the Weeds

Zvi · Jul 27, 2023, 5:40 PM
49 points
8 comments · 84 min read · LW link
(thezvi.wordpress.com)

SSA rejects anthropic shadow, too

jessicata · Jul 27, 2023, 5:25 PM
74 points
38 comments · 11 min read · LW link
(unstableontology.com)

[Question] What are examples of someone doing a lot of work to find the best of something?

chanamessinger · Jul 27, 2023, 3:58 PM
29 points
16 comments · 1 min read · LW link

AI-Plans.com 10-day Critique-a-Thon

Iknownothing · Jul 27, 2023, 11:44 AM
8 points
2 comments · 2 min read · LW link
(manifund.org)

Privacy in a Digital World

Faustify · Jul 27, 2023, 10:46 AM
2 points
0 comments · 5 min read · LW link

Cultivating a state of mind where new ideas are born

Henrik Karlsson · Jul 27, 2023, 9:16 AM
244 points
21 comments · 14 min read · LW link · 2 reviews
(www.henrikkarlsson.xyz)

Partial Transcript of Recent Senate Hearing Discussing AI X-Risk

Daniel_Eth · Jul 27, 2023, 9:16 AM
55 points
0 comments · LW link
(medium.com)

AXRP Episode 24 - Superalignment with Jan Leike

DanielFilan · Jul 27, 2023, 4:00 AM
55 points
3 comments · 69 min read · LW link

[Question] Have you ever considered taking the ‘Turing Test’ yourself?

Super AGI · Jul 27, 2023, 3:48 AM
2 points
6 comments · 1 min read · LW link

AXRP Episode 23 - Mechanistic Anomaly Detection with Mark Xu

DanielFilan · Jul 27, 2023, 1:50 AM
22 points
0 comments · 72 min read · LW link

GPT-4 can catch subtle cross-language translation mistakes

Michael Tontchev · Jul 27, 2023, 1:39 AM
7 points
1 comment · 1 min read · LW link

Social Balance through Embracing Social Credit

dhruvv · Jul 26, 2023, 8:07 PM
−39 points
9 comments · 3 min read · LW link

Why no Roman Industrial Revolution?

jasoncrawford · Jul 26, 2023, 7:34 PM
62 points
30 comments · 3 min read · LW link
(rootsofprogress.org)

Why you can’t treat decidability and complexity as a constant (Post #1)

Noosphere89 · Jul 26, 2023, 5:54 PM
6 points
13 comments · 5 min read · LW link

A response to the Richards et al.’s “The Illusion of AI’s Existential Risk”

Harrison Fell · Jul 26, 2023, 5:34 PM
1 point
0 comments · 10 min read · LW link

Meta-level adversarial evaluation of oversight techniques might allow robust measurement of their adequacy

Jul 26, 2023, 5:02 PM
100 points
19 comments · 1 min read · LW link · 1 review

Neuronpedia

Johnny Lin · Jul 26, 2023, 4:29 PM
135 points
51 comments · 2 min read · LW link
(neuronpedia.org)

Frontier Model Forum

Zach Stein-Perlman · Jul 26, 2023, 2:30 PM
27 points
0 comments · 4 min read · LW link
(blog.google)

Podcasts: Future of Life Institute, Breakthrough Science Summit panel

jasoncrawford · Jul 26, 2023, 2:28 PM
8 points
0 comments · 1 min read · LW link
(rootsofprogress.org)

Llama We Doing This Again?

Zvi · Jul 26, 2023, 1:00 PM
48 points
3 comments · 16 min read · LW link
(thezvi.wordpress.com)

Frontier Model Security

Vaniver · Jul 26, 2023, 4:48 AM
32 points
1 comment · 3 min read · LW link
(www.anthropic.com)

The First Room-Temperature Ambient-Pressure Superconductor

Annapurna · Jul 26, 2023, 2:27 AM
35 points
28 comments · 1 min read · LW link
(arxiv.org)

Underwater Torture Chambers: The Horror Of Fish Farming

Bentham's Bulldog · Jul 26, 2023, 12:27 AM
83 points
50 comments · 10 min read · LW link · 1 review

Contra Alexander on the Bitter Lesson and IQ

Andrew Keenan Richardson · Jul 26, 2023, 12:07 AM
9 points
1 comment · 4 min read · LW link
(mechanisticmind.com)

Overcoming the MWC

Mark Freed · Jul 25, 2023, 5:31 PM
3 points
0 comments · 3 min read · LW link

Russian parliamentarian: let’s ban personal computers and the Internet

RomanS · Jul 25, 2023, 5:30 PM
11 points
6 comments · 2 min read · LW link

AISN #16: White House Secures Voluntary Commitments from Leading AI Labs and Lessons from Oppenheimer

Jul 25, 2023, 4:58 PM
6 points
0 comments · 6 min read · LW link
(newsletter.safe.ai)

“The Universe of Minds”—call for reviewers (Seeds of Science)

rogersbacon · Jul 25, 2023, 4:53 PM
7 points
0 comments · 1 min read · LW link

Thoughts on Loss Landscapes and why Deep Learning works

beren · Jul 25, 2023, 4:41 PM
53 points
4 comments · 18 min read · LW link

Should you work at a leading AI lab? (including in non-safety roles)

Benjamin Hilton · Jul 25, 2023, 4:29 PM
7 points
0 comments · 12 min read · LW link

Whisper’s Word-Level Timestamps are Out

Varshul Gupta · Jul 25, 2023, 2:32 PM
−18 points
2 comments · 2 min read · LW link
(dubverseblack.substack.com)

AIS 101: Task decomposition for scalable oversight

Charbel-Raphaël · Jul 25, 2023, 1:34 PM
35 points
0 comments · 19 min read · LW link
(docs.google.com)

Anthropic Observations

Zvi · Jul 25, 2023, 12:50 PM
104 points
1 comment · 10 min read · LW link
(thezvi.wordpress.com)

Autonomous Alignment Oversight Framework (AAOF)

Justausername · Jul 25, 2023, 10:25 AM
−9 points
0 comments · 4 min read · LW link

How LLMs are and are not myopic

janus · Jul 25, 2023, 2:19 AM
135 points
16 comments · 8 min read · LW link

Secure Hand Holding

jefftk · Jul 25, 2023, 1:40 AM
28 points
43 comments · 1 min read · LW link
(www.jefftk.com)

Open problems in activation engineering

Jul 24, 2023, 7:46 PM
51 points
2 comments · 1 min read · LW link
(coda.io)

Subdivisions for Useful Distillations?

Sharat Jacob Jacob · Jul 24, 2023, 6:55 PM
9 points
2 comments · 2 min read · LW link

Optimizing For Approval And Disapproval

Thoth Hermes · Jul 24, 2023, 6:46 PM
−1 points
0 comments · 12 min read · LW link
(thothhermes.substack.com)

An Opinionated Guide to Computability and Complexity (Post #0)

Noosphere89 · Jul 24, 2023, 5:53 PM
10 points
10 comments · 3 min read · LW link

Slowing down AI progress is an underexplored alignment strategy

Norman Borlaug · Jul 24, 2023, 4:56 PM
42 points
27 comments · 5 min read · LW link

Anticipation in LLMs

derek shiller · Jul 24, 2023, 3:53 PM
6 points
0 comments · 13 min read · LW link

The cone of freedom (or, freedom might only be instrumentally valuable)

dkl9 · Jul 24, 2023, 3:38 PM
−10 points
6 comments · 2 min read · LW link
(dkl9.net)

A reformulation of Finite Factored Sets

Matthias G. Mayer · Jul 24, 2023, 1:02 PM
76 points
1 comment · 8 min read · LW link

Brain Efficiency Cannell Prize Contest Award Ceremony

Alexander Gietelink Oldenziel · 24 Jul 2023 11:30 UTC
149 points
12 comments · 7 min read · LW link

[Crosspost] An AI Pause Is Humanity’s Best Bet For Preventing Extinction (TIME)

otto.barten · 24 Jul 2023 10:07 UTC
12 points
0 comments · 7 min read · LW link
(time.com)