AI Safety Movement Builders should help the community to optimise three factors: contributors, contributions and coordination

peterslattery · 15 Dec 2022 22:50 UTC
4 points
0 comments · 6 min read · LW link

Masking to Avoid Missing Things

jefftk · 15 Dec 2022 21:00 UTC
17 points
2 comments · 1 min read · LW link
(www.jefftk.com)

Consider working more hours and taking more stimulants

Arjun Panickssery · 15 Dec 2022 20:38 UTC
36 points
11 comments · 1 min read · LW link

We’ve stepped over the threshold into the Fourth Arena, but don’t recognize it

Bill Benzon · 15 Dec 2022 20:22 UTC
2 points
0 comments · 7 min read · LW link

[Question] How is ARC planning to use ELK?

jacquesthibs · 15 Dec 2022 20:11 UTC
24 points
5 comments · 1 min read · LW link

How “Discovering Latent Knowledge in Language Models Without Supervision” Fits Into a Broader Alignment Scheme

Collin · 15 Dec 2022 18:22 UTC
243 points
39 comments · 16 min read · LW link · 1 review

High-level hopes for AI alignment

HoldenKarnofsky · 15 Dec 2022 18:00 UTC
58 points
3 comments · 19 min read · LW link
(www.cold-takes.com)

Two Dogmas of LessWrong

omnizoid · 15 Dec 2022 17:56 UTC
−6 points
155 comments · 69 min read · LW link

Covid 12/15/22: China’s Wave Begins

Zvi · 15 Dec 2022 16:20 UTC
32 points
7 comments · 10 min read · LW link
(thezvi.wordpress.com)

The next decades might be wild

Marius Hobbhahn · 15 Dec 2022 16:10 UTC
175 points
42 comments · 41 min read · LW link · 1 review

Basic building blocks of dependent type theory

Thomas Kehrenberg · 15 Dec 2022 14:54 UTC
47 points
8 comments · 13 min read · LW link

AI Neorealism: a threat model & success criterion for existential safety

davidad · 15 Dec 2022 13:42 UTC
64 points
1 comment · 3 min read · LW link

Who should write the definitive post on Ziz?

NicholasKross · 15 Dec 2022 6:37 UTC
3 points
45 comments · 3 min read · LW link

[Question] Is Paul Christiano still as optimistic about Approval-Directed Agents as he was in 2018?

Chris_Leong · 14 Dec 2022 23:28 UTC
8 points
0 comments · 1 min read · LW link

«Boundaries», Part 3b: Alignment problems in terms of boundaries

Andrew_Critch · 14 Dec 2022 22:34 UTC
72 points
7 comments · 13 min read · LW link

Aligning alignment with performance

Marv K · 14 Dec 2022 22:19 UTC
2 points
0 comments · 2 min read · LW link

Contrary to List of Lethality’s point 22, alignment’s door number 2

False Name · 14 Dec 2022 22:01 UTC
−2 points
5 comments · 22 min read · LW link

Kolmogorov Complexity and Simulation Hypothesis

False Name · 14 Dec 2022 22:01 UTC
−3 points
0 comments · 7 min read · LW link

[Question] Stanley Meyer’s water fuel cell

mikbp · 14 Dec 2022 21:19 UTC
2 points
6 comments · 1 min read · LW link

all claw, no world — and other thoughts on the universal distribution

Tamsin Leake · 14 Dec 2022 18:55 UTC
15 points
0 comments · 7 min read · LW link
(carado.moe)

[Question] Is the AI timeline too short to have children?

Yoreth · 14 Dec 2022 18:32 UTC
38 points
20 comments · 1 min read · LW link

Predicting GPU performance

14 Dec 2022 16:27 UTC
60 points
26 comments · 1 min read · LW link
(epochai.org)

[Incomplete] What is Computation Anyway?

DragonGod · 14 Dec 2022 16:17 UTC
16 points
1 comment · 13 min read · LW link
(arxiv.org)

Chair Hanging Peg

jefftk · 14 Dec 2022 15:30 UTC
11 points
0 comments · 1 min read · LW link
(www.jefftk.com)

My AGI safety research—2022 review, ’23 plans

Steven Byrnes · 14 Dec 2022 15:15 UTC
51 points
10 comments · 7 min read · LW link

Extracting and Evaluating Causal Direction in LLMs’ Activations

14 Dec 2022 14:33 UTC
29 points
5 comments · 11 min read · LW link

Key Mostly Outward-Facing Facts From the Story of VaccinateCA

Zvi · 14 Dec 2022 13:30 UTC
61 points
2 comments · 23 min read · LW link
(thezvi.wordpress.com)

Discovering Latent Knowledge in Language Models Without Supervision

Xodarap · 14 Dec 2022 12:32 UTC
45 points
1 comment · 1 min read · LW link
(arxiv.org)

[Question] COVID China Personal Advice (No mRNA vax, possible hospital overload, bug-chasing edition)

Lao Mein · 14 Dec 2022 10:31 UTC
20 points
11 comments · 1 min read · LW link

Beyond a better world

Davidmanheim · 14 Dec 2022 10:18 UTC
14 points
7 comments · 4 min read · LW link
(progressforum.org)

Proof as mere strong evidence

adamShimi · 14 Dec 2022 8:56 UTC
28 points
16 comments · 2 min read · LW link
(epistemologicalvigilance.substack.com)

Trying to disambiguate different questions about whether RLHF is “good”

Buck · 14 Dec 2022 4:03 UTC
106 points
47 comments · 7 min read · LW link · 1 review

[Question] How can one literally buy time (from x-risk) with money?

Alex_Altair · 13 Dec 2022 19:24 UTC
24 points
3 comments · 1 min read · LW link

[Question] Best introductory overviews of AGI safety?

JakubK · 13 Dec 2022 19:01 UTC
21 points
9 comments · 2 min read · LW link
(forum.effectivealtruism.org)

Applications open for AGI Safety Fundamentals: Alignment Course

13 Dec 2022 18:31 UTC
48 points
0 comments · 2 min read · LW link

What Does It Mean to Align AI With Human Values?

Algon · 13 Dec 2022 16:56 UTC
8 points
3 comments · 1 min read · LW link
(www.quantamagazine.org)

It Takes Two Paracetamol?

Eli_ · 13 Dec 2022 16:29 UTC
33 points
10 comments · 2 min read · LW link

[Interim research report] Taking features out of superposition with sparse autoencoders

13 Dec 2022 15:41 UTC
137 points
22 comments · 22 min read · LW link · 2 reviews

[Question] Is the ChatGPT-simulated Linux virtual machine real?

Kenoubi · 13 Dec 2022 15:41 UTC
18 points
7 comments · 1 min read · LW link

Existential AI Safety is NOT separate from near-term applications

scasper · 13 Dec 2022 14:47 UTC
37 points
17 comments · 3 min read · LW link

What is the correlation between upvoting and benefit to readers of LW?

banev · 13 Dec 2022 14:26 UTC
8 points
15 comments · 1 min read · LW link

Limits of Superintelligence

Aleksei Petrenko · 13 Dec 2022 12:19 UTC
1 point
5 comments · 1 min read · LW link

Bay 2022 Solstice

Raemon · 13 Dec 2022 8:58 UTC
17 points
0 comments · 1 min read · LW link

Last day to nominate things for the Review. Also, 2019 books still exist.

Raemon · 13 Dec 2022 8:53 UTC
15 points
0 comments · 1 min read · LW link

AI alignment is distinct from its near-term applications

paulfchristiano · 13 Dec 2022 7:10 UTC
254 points
21 comments · 2 min read · LW link
(ai-alignment.com)

Take 10: Fine-tuning with RLHF is aesthetically unsatisfying.

Charlie Steiner · 13 Dec 2022 7:04 UTC
37 points
3 comments · 2 min read · LW link

[Question] Are lawsuits against AGI companies extending AGI timelines?

SlowingAGI · 13 Dec 2022 6:00 UTC
1 point
1 comment · 1 min read · LW link

EA & LW Forums Weekly Summary (5th Dec − 11th Dec 22′)

Zoe Williams · 13 Dec 2022 2:53 UTC
7 points
0 comments · 1 min read · LW link

Alignment with argument-networks and assessment-predictions

Tor Økland Barstad · 13 Dec 2022 2:17 UTC
10 points
5 comments · 45 min read · LW link

Revisiting algorithmic progress

13 Dec 2022 1:39 UTC
94 points
15 comments · 2 min read · LW link · 1 review
(arxiv.org)