Remote AI Alignment Overhang?

tryactions Feb 19, 2023, 10:30 PM
37 points
5 comments · 4 min read · LW link

A Neural Network undergoing Gradient-based Training as a Complex System

carboniferous_umbraculum Feb 19, 2023, 10:08 PM
22 points
1 comment · 19 min read · LW link

Another Way to Be Okay

Gretta Duleba Feb 19, 2023, 8:49 PM
107 points
15 comments · 6 min read · LW link

A Way To Be Okay

Duncan Sabien (Inactive) Feb 19, 2023, 8:27 PM
109 points
38 comments · 10 min read · LW link · 1 review

Exploring Lily’s world with ChatGPT [things an AI won’t do]

Bill Benzon Feb 19, 2023, 4:39 PM
5 points
0 comments · 20 min read · LW link

EIS VIII: An Engineer’s Understanding of Deceptive Alignment

scasper Feb 19, 2023, 3:25 PM
30 points
5 comments · 4 min read · LW link

Does novel understanding imply novel agency / values?

TsviBT Feb 19, 2023, 2:41 PM
18 points
0 comments · 7 min read · LW link

There are (probably) no superhuman Go AIs: strong human players beat the strongest AIs

Taran Feb 19, 2023, 12:25 PM
125 points
34 comments · 4 min read · LW link

Navigating public AI x-risk hype while pursuing technical solutions

Dan Braun Feb 19, 2023, 12:22 PM
18 points
0 comments · 2 min read · LW link

Somewhat against “just update all the way”

tailcalled Feb 19, 2023, 10:49 AM
31 points
10 comments · 2 min read · LW link

Human beats SOTA Go AI by learning an adversarial policy

Vanessa Kosoy Feb 19, 2023, 9:38 AM
59 points
32 comments · 1 min read · LW link
(goattack.far.ai)

Degamification

Nate Showell Feb 19, 2023, 5:35 AM
23 points
2 comments · 2 min read · LW link

Stop posting prompt injections on Twitter and calling it “misalignment”

lc Feb 19, 2023, 2:21 AM
144 points
9 comments · 1 min read · LW link

AGI in sight: our look at the game board

Feb 18, 2023, 10:17 PM
227 points
135 comments · 6 min read · LW link
(andreamiotti.substack.com)

We should be signal-boosting anti Bing chat content

mbrooks Feb 18, 2023, 6:52 PM
−4 points
13 comments · 2 min read · LW link

Can talk, can think, can suffer.

Ilio Feb 18, 2023, 6:43 PM
1 point
8 comments · 3 min read · LW link

Parametrically retargetable decision-makers tend to seek power

TurnTrout Feb 18, 2023, 6:41 PM
172 points
10 comments · 2 min read · LW link
(arxiv.org)

Near-Term Risks of an Obedient Artificial Intelligence

ymeskhout Feb 18, 2023, 6:30 PM
20 points
1 comment · 6 min read · LW link

EIS VII: A Challenge for Mechanists

scasper Feb 18, 2023, 6:27 PM
36 points
4 comments · 3 min read · LW link

Reading Speed Exists!

Johannes C. Mayer Feb 18, 2023, 3:30 PM
12 points
9 comments · 1 min read · LW link

The Practitioner’s Path 2.0: the Meditative Archetype

Evenflair Feb 18, 2023, 3:23 PM
14 points
1 comment · 2 min read · LW link
(guildoftherose.org)

Should we cry “wolf”?

Tapatakt Feb 18, 2023, 11:24 AM
24 points
5 comments · 1 min read · LW link

[Question] Name of the fallacy of assuming an extreme value (e.g. 0) with the illusion of ‘avoiding to have to make an assumption’?

FlorianH Feb 18, 2023, 8:11 AM
4 points
1 comment · 1 min read · LW link

I Think We’re Approaching The Bitter Lesson’s Asymptote

SomeoneYouOnceKnew Feb 18, 2023, 5:33 AM
−3 points
9 comments · 5 min read · LW link

Bus-Only Bus Lane Enforcement

jefftk Feb 18, 2023, 2:50 AM
19 points
15 comments · 1 min read · LW link
(www.jefftk.com)

Run Head on Towards the Falling Tears

Johannes C. Mayer Feb 18, 2023, 1:33 AM
6 points
0 comments · 2 min read · LW link

Two problems with ‘Simulators’ as a frame

ryan_greenblatt Feb 17, 2023, 11:34 PM
79 points
13 comments · 5 min read · LW link

GPT-4 Predictions

Stephen McAleese Feb 17, 2023, 11:20 PM
110 points
27 comments · 11 min read · LW link

On Board Vision, Hollow Words, and the End of the World

Marcello Feb 17, 2023, 11:18 PM
52 points
27 comments · 5 min read · LW link

PICT: A Zero-Shot Prompt Template to Automate Evaluation

Quentin FEUILLADE--MONTIXI Feb 17, 2023, 11:16 PM
17 points
1 comment · 11 min read · LW link

Hunch seeds: Info bio

the gears to ascension Feb 17, 2023, 9:25 PM
12 points
0 comments · 9 min read · LW link

Why Do We Believe

Screwtape Feb 17, 2023, 8:58 PM
9 points
3 comments · 3 min read · LW link

I Am Scared of Posting Negative Takes About Bing’s AI

Yitz Feb 17, 2023, 8:50 PM
63 points
28 comments · 1 min read · LW link

EIS VI: Critiques of Mechanistic Interpretability Work in AI Safety

scasper Feb 17, 2023, 8:48 PM
49 points
9 comments · 12 min read · LW link

Tinker Bell Theory and LLMs

Fergus Fettes Feb 17, 2023, 8:23 PM
1 point
11 comments · 1 min read · LW link

Recommendation: Bug Bounties and Responsible Disclosure for Advanced ML Systems

Vaniver Feb 17, 2023, 8:11 PM
125 points
12 comments · 2 min read · LW link

Microsoft and OpenAI, stop telling chatbots to roleplay as AI

hold_my_fish Feb 17, 2023, 7:55 PM
50 points
10 comments · 1 min read · LW link

A warm-up for the AI governance project

jacek Feb 17, 2023, 6:06 PM
10 points
2 comments · 3 min read · LW link

Link Post > Blog Post

party girl Feb 17, 2023, 5:59 PM
4 points
6 comments · 1 min read · LW link
(onthespectrumontheguestlist.substack.com)

One-layer transformers aren’t equivalent to a set of skip-trigrams

Buck Feb 17, 2023, 5:26 PM
127 points
11 comments · 7 min read · LW link

[Question] Should we be kind and polite to emerging AIs?

David Gross Feb 17, 2023, 4:58 PM
9 points
13 comments · 1 min read · LW link

Follow-up Posting on Cyborg Psychologist

Hopkins Stanley Feb 17, 2023, 4:56 PM
0 points
2 comments · 1 min read · LW link
(www.lesswrong.com)

A “slow takeoff” might still look fast

MichaelDickens Feb 17, 2023, 4:51 PM
5 points
3 comments · 1 min read · LW link

AI Safety Info Distillation Fellowship

Feb 17, 2023, 4:16 PM
47 points
3 comments · 3 min read · LW link

Nozick’s Dilemma: A Critique of Game Theory

Edward P. Könings Feb 17, 2023, 4:11 PM
10 points
1 comment · 13 min read · LW link

[Question] Are LLMs sufficient for AI takeoff?

rpglover64 Feb 17, 2023, 3:46 PM
8 points
2 comments · 1 min read · LW link

Sydney’s Secret: A Short Story by Bing Chat

fela Feb 17, 2023, 1:31 PM
36 points
1 comment · 5 min read · LW link

Automating Consistency

Hoagy Feb 17, 2023, 1:24 PM
10 points
0 comments · 1 min read · LW link

Human decision processes are not well factored

Feb 17, 2023, 1:11 PM
33 points
3 comments · 2 min read · LW link

2023 ACX Predictions: Buy/Sell/Hold

Zvi Feb 17, 2023, 1:10 PM
25 points
3 comments · 20 min read · LW link
(thezvi.wordpress.com)