Adversarial training, importance sampling, and anti-adversarial training for AI whistleblowing

Buck · Jun 2, 2022, 11:48 PM
42 points
0 comments · 3 min read · LW link

Tao, Kontsevich & others on HLAI in Math

interstice · Jun 10, 2022, 2:25 AM
41 points
5 comments · 2 min read · LW link
(www.youtube.com)

Linkpost: Robin Hanson—Why Not Wait On AI Risk?

Yair Halberstadt · Jun 24, 2022, 2:23 PM
41 points
14 comments · 1 min read · LW link
(www.overcomingbias.com)

Blake Richards on Why he is Skeptical of Existential Risk from AI

Michaël Trazzi · Jun 14, 2022, 7:09 PM
41 points
12 comments · 4 min read · LW link
(theinsideview.ai)

Georgism, in theory

Stuart_Armstrong · Jun 15, 2022, 3:20 PM
40 points
22 comments · 4 min read · LW link

Key Papers in Language Model Safety

aog · Jun 20, 2022, 3:00 PM
40 points
1 comment · 22 min read · LW link

D&D.Sci June 2022: A Goddess Tried To Reincarnate Me Into A Fantasy World, But I Insisted On Using Data Science To Select An Optimal Combination Of Cheat Skills!

abstractapplic · Jun 4, 2022, 1:28 AM
40 points
22 comments · 3 min read · LW link

A Litany Missing from the Canon

benwr · Jun 17, 2022, 1:39 AM
39 points
3 comments · 1 min read · LW link
(www.benwr.net)

Four reasons I find AI safety emotionally compelling

Jun 28, 2022, 2:10 PM
39 points
3 comments · 4 min read · LW link

Another Calming Example

jefftk · Jun 3, 2022, 2:20 AM
39 points
13 comments · 2 min read · LW link
(www.jefftk.com)

The table of different sampling assumptions in anthropics

avturchin · Jun 29, 2022, 10:41 AM
39 points
5 comments · 12 min read · LW link

[Yann Lecun] A Path Towards Autonomous Machine Intelligence

DragonGod · Jun 27, 2022, 7:24 PM
38 points
14 comments · 1 min read · LW link
(openreview.net)

Grokking “Forecasting TAI with biological anchors”

anson.ho · Jun 6, 2022, 6:58 PM
38 points
0 comments · 14 min read · LW link

Beauty and the Beast

Tomás B. · Jun 11, 2022, 6:59 PM
38 points
8 comments · 6 min read · LW link

Gradient hacking: definitions and examples

Richard_Ngo · Jun 29, 2022, 9:35 PM
38 points
2 comments · 5 min read · LW link

Vael Gates: Risks from Advanced AI (June 2022)

Vael Gates · Jun 14, 2022, 12:54 AM
38 points
2 comments · 30 min read · LW link

[Question] What’s the “This AI is of moral concern.” fire alarm?

Quintin Pope · Jun 13, 2022, 8:05 AM
37 points
56 comments · 2 min read · LW link

Quick Look: Asymptomatic Herpes Shedding

Elizabeth · Jun 4, 2022, 9:40 PM
37 points
4 comments · 2 min read · LW link
(acesounderglass.com)

Scott Aaronson and Steven Pinker Debate AI Scaling

Liron · Jun 28, 2022, 4:04 PM
37 points
7 comments · 1 min read · LW link
(scottaaronson.blog)

Why agents are powerful

Daniel Kokotajlo · Jun 6, 2022, 1:37 AM
37 points
7 comments · 7 min read · LW link

Announcing the Clearer Thinking Regrants program

spencerg · Jun 17, 2022, 1:14 PM
36 points
1 comment · 1 min read · LW link

[Link] Adversarially trained neural representations may already be as robust as corresponding biological neural representations

Gunnar_Zarncke · Jun 24, 2022, 8:51 PM
35 points
9 comments · 1 min read · LW link

Optimization and Adequacy in Five Bullets

james.lucassen · Jun 6, 2022, 5:48 AM
35 points
2 comments · 4 min read · LW link
(jlucassen.com)

Alignment Risk Doesn’t Require Superintelligence

JustisMills · Jun 15, 2022, 3:12 AM
35 points
4 comments · 2 min read · LW link

D&D.Sci June 2022 Evaluation and Ruleset

abstractapplic · Jun 13, 2022, 10:31 AM
34 points
11 comments · 4 min read · LW link

Steganography and the CycleGAN—alignment failure case study

Jan Czechowski · Jun 11, 2022, 9:41 AM
34 points
0 comments · 4 min read · LW link

[Question] Are long-form dating profiles productive?

AABoyles · Jun 27, 2022, 5:03 PM
34 points
32 comments · 1 min read · LW link

[Question] How much does cybersecurity reduce AI risk?

Darmani · Jun 12, 2022, 10:13 PM
34 points
23 comments · 1 min read · LW link

[Question] Why don’t you introduce really impressive people you personally know to AI alignment (more often)?

Verden · Jun 11, 2022, 3:59 PM
33 points
14 comments · 1 min read · LW link

To what extent have ideas and scientific discoveries gotten harder to find?

lsusr · Jun 18, 2022, 7:15 AM
33 points
10 comments · 6 min read · LW link

Reflection Mechanisms as an Alignment target: A survey

Jun 22, 2022, 3:05 PM
32 points
1 comment · 14 min read · LW link

Google’s new text-to-image model—Parti, a demonstration of scaling benefits

Kayden · Jun 22, 2022, 8:00 PM
32 points
4 comments · 1 min read · LW link

A claim that Google’s LaMDA is sentient

Ben Livengood · Jun 12, 2022, 4:18 AM
31 points
133 comments · 1 min read · LW link

[Question] How are compute assets distributed in the world?

Chris van Merwijk · Jun 12, 2022, 10:13 PM
30 points
7 comments · 1 min read · LW link

[Question] Why don’t we think we’re in the simplest universe with intelligent life?

ADifferentAnonymous · Jun 18, 2022, 3:05 AM
30 points
33 comments · 1 min read · LW link

Assessing AlephAlphas Multimodal Model

p.b. · Jun 28, 2022, 9:28 AM
30 points
5 comments · 3 min read · LW link

Common but neglected risk factors that may let you get Paxlovid

DirectedEvolution · Jun 21, 2022, 7:34 AM
29 points
8 comments · 4 min read · LW link

Covid 6/16/22: Do Not Hand it to Them

Zvi · Jun 16, 2022, 2:40 PM
29 points
5 comments · 7 min read · LW link
(thezvi.wordpress.com)

Entitlement as a major amplifier of unhappiness

VipulNaik · Jun 8, 2022, 10:08 PM
29 points
6 comments · 7 min read · LW link

Forecasting Fusion Power

Daniel Kokotajlo · Jun 18, 2022, 12:04 AM
29 points
8 comments · 1 min read · LW link
(astralcodexten.substack.com)

Juneberry Cake

jefftk · Jun 19, 2022, 1:40 AM
29 points
0 comments · 1 min read · LW link
(www.jefftk.com)

A Butterfly’s View of Probability

Gabriel Wu · Jun 15, 2022, 2:14 AM
29 points
17 comments · 11 min read · LW link

Why it’s bad to kill Grandma

dynomight · Jun 9, 2022, 6:12 PM
29 points
14 comments · 8 min read · LW link
(dynomight.substack.com)

Was the Industrial Revolution The Industrial Revolution?

Davis Kedrosky · Jun 14, 2022, 2:48 PM
29 points
0 comments · 12 min read · LW link
(daviskedrosky.substack.com)

Wielding civilization

dominicq · Jun 1, 2022, 7:11 AM
29 points
2 comments · 2 min read · LW link

[Link-post] On Deference and Yudkowsky’s AI Risk Estimates

bmg · Jun 19, 2022, 5:25 PM
29 points
8 comments · 1 min read · LW link

Investigating causal understanding in LLMs

Jun 14, 2022, 1:57 PM
28 points
6 comments · 13 min read · LW link

[Question] Is CIRL a promising agenda?

Chris_Leong · Jun 23, 2022, 5:12 PM
28 points
16 comments · 1 min read · LW link

Intelligence in Commitment Races

David Udell · Jun 24, 2022, 2:30 PM
28 points
8 comments · 5 min read · LW link

Limits of Bodily Autonomy

jefftk · Jun 27, 2022, 7:50 PM
28 points
18 comments · 1 min read · LW link
(www.jefftk.com)