Alignment Megaprojects: You’re Not Even Trying to Have Ideas

NicholasKross · 12 Jul 2023 23:39 UTC
55 points
30 comments · 2 min read

Eric Michaud on the Quantization Model of Neural Scaling, Interpretability and Grokking

Michaël Trazzi · 12 Jul 2023 22:45 UTC
10 points
0 comments · 2 min read
(theinsideview.ai)

[Question] Are there any good, easy-to-understand examples of cases where statistical causal network discovery worked well in practice?

tailcalled · 12 Jul 2023 22:08 UTC
42 points
6 comments · 1 min read

The Opt-In Revolution — My vision of a positive future with ASI (An experiment with LLM storytelling)

Tachikoma · 12 Jul 2023 21:08 UTC
2 points
0 comments · 2 min read

[Question] What does the launch of x.ai mean for AI Safety?

Chris_Leong · 12 Jul 2023 19:42 UTC
35 points
3 comments · 1 min read

Towards Developmental Interpretability

12 Jul 2023 19:33 UTC
172 points
8 comments · 9 min read

Flowchart: How might rogue AIs defeat all humans?

Aryeh Englander · 12 Jul 2023 19:23 UTC
12 points
0 comments · 1 min read

A review of Principia Qualia

jessicata · 12 Jul 2023 18:38 UTC
56 points
6 comments · 10 min read
(unstablerontology.substack.com)

How I Learned To Stop Worrying And Love The Shoggoth

Peter Merel · 12 Jul 2023 17:47 UTC
10 points
9 comments · 5 min read

Goal-Direction for Simulated Agents

Raymond D · 12 Jul 2023 17:06 UTC
33 points
2 comments · 6 min read

AISN#14: OpenAI’s ‘Superalignment’ team, Musk’s xAI launches, and developments in military AI use

Dan H · 12 Jul 2023 16:58 UTC
16 points
0 comments · 1 min read

Report on modeling evidential cooperation in large worlds

Johannes Treutlein · 12 Jul 2023 16:37 UTC
44 points
3 comments · 1 min read
(arxiv.org)

Compression of morbidity

DirectedEvolution · 12 Jul 2023 15:26 UTC
12 points
0 comments · 3 min read

An Overview of the AI Safety Funding Situation

Stephen McAleese · 12 Jul 2023 14:54 UTC
63 points
3 comments · 1 min read

[Question] What is some unnecessarily obscure jargon that people here tend to use?

jchan · 12 Jul 2023 13:52 UTC
17 points
5 comments · 1 min read

Housing and Transit Roundup #5

Zvi · 12 Jul 2023 13:10 UTC
25 points
1 comment · 20 min read
(thezvi.wordpress.com)

A transcript of the TED talk by Eliezer Yudkowsky

Mikhail Samin · 12 Jul 2023 12:12 UTC
103 points
13 comments · 4 min read

Lightweight minimal speech recognition?

jefftk · 12 Jul 2023 12:00 UTC
9 points
6 comments · 1 min read
(www.jefftk.com)

Aging and the geroscience hypothesis

DirectedEvolution · 12 Jul 2023 7:16 UTC
54 points
14 comments · 5 min read

Popularizing vibes vs. models

DirectedEvolution · 12 Jul 2023 5:44 UTC
19 points
0 comments · 2 min read

Announcing the AI Fables Writing Contest!

DaystarEld · 12 Jul 2023 3:04 UTC
36 points
3 comments · 1 min read

Why it’s necessary to shoot yourself in the foot

Jacob G-W · 11 Jul 2023 21:17 UTC
39 points
7 comments · 2 min read
(g-w1.github.io)

How do low level hypotheses constrain high level ones? The mystery of the disappearing diamond.

Christopher King · 11 Jul 2023 19:27 UTC
17 points
11 comments · 2 min read

[Question] Do we automatically accept propositions?

Aaron Graifman · 11 Jul 2023 17:45 UTC
10 points
5 comments · 1 min read

fMRI LIKE APPROACH TO AI ALIGNMENT / DECEPTIVE BEHAVIOUR

Escaque 66 · 11 Jul 2023 17:17 UTC
−1 points
3 comments · 2 min read

Introducing Fatebook: the fastest way to make and track predictions

11 Jul 2023 15:28 UTC
127 points
34 comments · 1 min read
(fatebook.io)

My Weirdest Experience

Bridgett Kay · 11 Jul 2023 14:44 UTC
37 points
19 comments · 1 min read
(dxmrevealed.wordpress.com)

Announcing The Roots of Progress Blog-Building Intensive

jasoncrawford · 11 Jul 2023 14:04 UTC
10 points
0 comments · 1 min read
(rootsofprogress.org)

OpenAI Launches Superalignment Taskforce

Zvi · 11 Jul 2023 13:00 UTC
149 points
40 comments · 49 min read
(thezvi.wordpress.com)

Critiquing Risks From Learned Optimization, and Avoiding Cached Theories

ProofBySonnet · 11 Jul 2023 11:38 UTC
1 point
0 comments · 6 min read

[UPDATE: deadline extended to July 24!] New wind in rationality’s sails: Applications for Epistea Residency 2023 are now open

11 Jul 2023 11:02 UTC
80 points
7 comments · 3 min read

Two Hot Takes about Quine

Charlie Steiner · 11 Jul 2023 6:42 UTC
15 points
0 comments · 2 min read

Disincentivizing deception in mesa optimizers with Model Tampering

martinkunev · 11 Jul 2023 0:44 UTC
3 points
0 comments · 2 min read

Drawn Out: a story

Richard_Ngo · 11 Jul 2023 0:08 UTC
68 points
2 comments · 8 min read

Definitions are about efficiency and consistency with common language.

Nacruno96 · 10 Jul 2023 23:46 UTC
1 point
0 comments · 4 min read

Reframing Evolution—An information wavefront traveling through time

Joshua Clancy · 10 Jul 2023 22:36 UTC
1 point
0 comments · 5 min read
(midflip.org)

GPT-7: The Tale of the Big Computer (An Experimental Story)

Justin Bullock · 10 Jul 2023 20:22 UTC
4 points
4 comments · 5 min read

Cost-effectiveness of professional field-building programs for AI safety research

Dan H · 10 Jul 2023 18:28 UTC
8 points
5 comments · 1 min read

Cost-effectiveness of student programs for AI safety research

Dan H · 10 Jul 2023 18:28 UTC
15 points
2 comments · 1 min read

Modeling the impact of AI safety field-building programs

Dan H · 10 Jul 2023 18:27 UTC
21 points
0 comments · 1 min read

I think Michael Bailey’s dismissal of my autogynephilia questions for Scott Alexander and Aella makes very little sense

tailcalled · 10 Jul 2023 17:39 UTC
45 points
45 comments · 2 min read

Incentives from a causal perspective

10 Jul 2023 17:16 UTC
27 points
0 comments · 6 min read

Is the Endowment Effect Due to Incomparability?

Kevin Dorst · 10 Jul 2023 16:26 UTC
21 points
10 comments · 7 min read
(kevindorst.substack.com)

Frontier AI Regulation

Zach Stein-Perlman · 10 Jul 2023 14:30 UTC
20 points
4 comments · 8 min read
(arxiv.org)

Why is it so hard to change people’s minds? Well, imagine if it wasn’t...

Celarix · 10 Jul 2023 13:55 UTC
6 points
9 comments · 6 min read

Consider Joining the UK Foundation Model Taskforce

Zvi · 10 Jul 2023 13:50 UTC
105 points
12 comments · 1 min read
(thezvi.wordpress.com)

“Reframing Superintelligence” + LLMs + 4 years

Eric Drexler · 10 Jul 2023 13:42 UTC
116 points
8 comments · 12 min read

Open-minded updatelessness

10 Jul 2023 11:08 UTC
65 points
21 comments · 12 min read

Arguments against existential risk from AI, part 2

Nina Rimsky · 10 Jul 2023 8:25 UTC
7 points
0 comments · 5 min read
(ninarimsky.substack.com)

Consciousness as a conflationary alliance term for intrinsically valued internal experiences

Andrew_Critch · 10 Jul 2023 8:09 UTC
190 points
46 comments · 11 min read