OpenAI: GPT-based LLMs show ability to discriminate between its own wrong answers, but inability to explain how/why it makes that discrimination, even as model scales

Aditya Jain · 13 Jun 2022 23:33 UTC
14 points
5 comments · 1 min read
(openai.com)

[Question] Who said something like “The fact that putting 2 apples next to 2 other apples leads to there being 4 apples there has nothing to do with the fact that 2 + 2 = 4”?

hunterglenn · 13 Jun 2022 22:23 UTC
1 point
2 comments · 1 min read

Continuity Assumptions

Jan_Kulveit · 13 Jun 2022 21:31 UTC
35 points
13 comments · 4 min read

Crypto-fed Computation

aaguirre · 13 Jun 2022 21:20 UTC
23 points
7 comments · 7 min read

A Modest Pivotal Act

anonymousaisafety · 13 Jun 2022 19:24 UTC
−16 points
1 comment · 5 min read

Contra EY: Can AGI destroy us without trial & error?

Nikita Sokolsky · 13 Jun 2022 18:26 UTC
136 points
72 comments · 15 min read

What are some smaller-but-concrete challenges related to AI safety that are impacting people today?

nonzerosum · 13 Jun 2022 17:36 UTC
4 points
3 comments · 1 min read

[Link] New SEP article on Bayesian Epistemology

Aryeh Englander · 13 Jun 2022 15:03 UTC
6 points
0 comments · 1 min read

Training Trace Priors

Adam Jermyn · 13 Jun 2022 14:22 UTC
12 points
17 comments · 4 min read

[Question] Can you MRI a deep learning model?

Yair Halberstadt · 13 Jun 2022 13:43 UTC
3 points
3 comments · 1 min read

On A List of Lethalities

Zvi · 13 Jun 2022 12:30 UTC
161 points
49 comments · 54 min read · 1 review
(thezvi.wordpress.com)

D&D.Sci June 2022 Evaluation and Ruleset

abstractapplic · 13 Jun 2022 10:31 UTC
30 points
10 comments · 4 min read

[Question] What’s the “This AI is of moral concern.” fire alarm?

Quintin Pope · 13 Jun 2022 8:05 UTC
37 points
56 comments · 2 min read

The beautiful magical enchanted golden Dall-e Mini is underrated

p.b. · 13 Jun 2022 7:58 UTC
14 points
0 comments · 1 min read

Why so little AI risk on rationalist-adjacent blogs?

Grant Demaree · 13 Jun 2022 6:31 UTC
46 points
23 comments · 8 min read

Code Quality and Rule Consequentialism

Adam Zerner · 13 Jun 2022 3:12 UTC
17 points
13 comments · 6 min read

Grokking “Semi-informative priors over AI timelines”

anson.ho · 12 Jun 2022 22:17 UTC
15 points
7 comments · 14 min read

[Question] How much does cybersecurity reduce AI risk?

Darmani · 12 Jun 2022 22:13 UTC
34 points
23 comments · 1 min read

[Question] How are compute assets distributed in the world?

Chris van Merwijk · 12 Jun 2022 22:13 UTC
30 points
7 comments · 1 min read

Intuitive Explanation of AIXI

Thomas Larsen · 12 Jun 2022 21:41 UTC
21 points
0 comments · 5 min read

Why all the fuss about recursive self-improvement?

So8res · 12 Jun 2022 20:53 UTC
158 points
62 comments · 7 min read · 1 review

Why the Kaldor-Hicks criterion can be non-transitive

Rupert · 12 Jun 2022 17:26 UTC
4 points
10 comments · 2 min read

[Question] How do you post links here?

skybrian · 12 Jun 2022 16:23 UTC
1 point
1 comment · 1 min read

Pure Altruism

UtilityMonster · 12 Jun 2022 15:53 UTC
2 points
4 comments · 4 min read

[Question] Filter out tags from the front page?

jaspax · 12 Jun 2022 10:59 UTC
9 points
2 comments · 1 min read

How To: A Workshop (or anything)

[DEACTIVATED] Duncan Sabien · 12 Jun 2022 8:00 UTC
53 points
13 comments · 37 min read · 1 review

A claim that Google’s LaMDA is sentient

Ben Livengood · 12 Jun 2022 4:18 UTC
31 points
133 comments · 1 min read

[Question] How much stupider than humans can AI be and still kill us all through sheer numbers and resource access?

shminux · 12 Jun 2022 1:01 UTC
11 points
11 comments · 1 min read

ELK Proposal—Make the Reporter care about the Predictor’s beliefs

11 Jun 2022 22:53 UTC
8 points
0 comments · 6 min read

[Question] Why has no person / group ever taken over the world?

Aryeh Englander · 11 Jun 2022 20:51 UTC
25 points
19 comments · 1 min read

[Question] Are there English-speaking meetups in Frankfurt/Munich/Zurich?

Grant Demaree · 11 Jun 2022 20:02 UTC
6 points
2 comments · 1 min read

Beauty and the Beast

Tomás B. · 11 Jun 2022 18:59 UTC
32 points
8 comments · 6 min read

Poorly-Aimed Death Rays

Thane Ruthenis · 11 Jun 2022 18:29 UTC
48 points
5 comments · 4 min read

AGI Safety Communications Initiative

ines · 11 Jun 2022 17:34 UTC
7 points
0 comments · 1 min read

A gaming group for rationality-aware people

dhatas · 11 Jun 2022 16:04 UTC
7 points
0 comments · 1 min read

[Question] Why don’t you introduce really impressive people you personally know to AI alignment (more often)?

Verden · 11 Jun 2022 15:59 UTC
33 points
14 comments · 1 min read

Godzilla Strategies

johnswentworth · 11 Jun 2022 15:44 UTC
146 points
71 comments · 3 min read

Steganography and the CycleGAN—alignment failure case study

Jan Czechowski · 11 Jun 2022 9:41 UTC
33 points
0 comments · 4 min read

The Mountain Troll

lsusr · 11 Jun 2022 9:14 UTC
95 points
25 comments · 2 min read

Show LW: YodaTimer.com

Adam Zerner · 11 Jun 2022 8:52 UTC
27 points
4 comments · 1 min read

How fast can we perform a forward pass?

jsteinhardt · 10 Jun 2022 23:30 UTC
53 points
9 comments · 15 min read
(bounded-regret.ghost.io)

Summary of “AGI Ruin: A List of Lethalities”

Stephen McAleese · 10 Jun 2022 22:35 UTC
44 points
2 comments · 8 min read

How dangerous is human-level AI?

Alex_Altair · 10 Jun 2022 17:38 UTC
21 points
4 comments · 8 min read

Another plausible scenario of AI risk: AI builds military infrastructure while collaborating with humans, defects later.

avturchin · 10 Jun 2022 17:24 UTC
10 points
2 comments · 1 min read

Leaving Google, Joining the Nucleic Acid Observatory

jefftk · 10 Jun 2022 17:00 UTC
114 points
4 comments · 3 min read
(www.jefftk.com)

On The Spectrum, On The Guest List: (v) The Fleur Room

party girl · 10 Jun 2022 14:50 UTC
8 points
1 comment · 14 min read
(onthespectrumontheguestlist.substack.com)

Progress Report 6: get the tool working

Nathan Helm-Burger · 10 Jun 2022 11:18 UTC
4 points
0 comments · 2 min read

[Question] Is AI Alignment Impossible?

Heighn · 10 Jun 2022 10:08 UTC
3 points
3 comments · 1 min read

I No Longer Believe Intelligence to be “Magical”

DragonGod · 10 Jun 2022 8:58 UTC
27 points
34 comments · 6 min read

[linkpost] The final AI benchmark: BIG-bench

RomanS · 10 Jun 2022 8:53 UTC
25 points
21 comments · 1 min read