Responsible Scaling Policies Are Risk Management Done Wrong

simeon_c · Oct 25, 2023, 11:46 PM
123 points

58 votes

Overall karma indicates overall quality.

35 comments · 22 min read · LW link · 1 review
(www.navigatingrisks.ai)

AI as a science, and three obstacles to alignment strategies

So8res · Oct 25, 2023, 9:00 PM
194 points

101 votes

80 comments · 11 min read · LW link

My hopes for alignment: Singular learning theory and whole brain emulation

Garrett Baker · Oct 25, 2023, 6:31 PM
61 points

27 votes

5 comments · 12 min read · LW link

[Question] Lying to chess players for alignment

Zane · Oct 25, 2023, 5:47 PM
100 points

48 votes

55 comments · 1 min read · LW link

Anthropic, Google, Microsoft & OpenAI announce Executive Director of the Frontier Model Forum & over $10 million for a new AI Safety Fund

Zach Stein-Perlman · Oct 25, 2023, 3:20 PM
31 points

16 votes

8 comments · 4 min read · LW link
(www.frontiermodelforum.org)

“The Economics of Time Travel”—call for reviewers (Seeds of Science)

rogersbacon · Oct 25, 2023, 3:13 PM
4 points

3 votes

2 comments · 1 min read · LW link

Compositional preference models for aligning LMs

Tomek Korbak · Oct 25, 2023, 12:17 PM
18 points

6 votes

2 comments · 5 min read · LW link

[Question] Should the US House of Representatives adopt rank choice voting for leadership positions?

jmh · Oct 25, 2023, 11:16 AM
16 points

4 votes

6 comments · 1 min read · LW link

Researchers believe they have found a way for artists to fight back against AI style capture

vernamcipher · Oct 25, 2023, 10:54 AM
3 points

2 votes

1 comment · 1 min read · LW link
(finance.yahoo.com)

Why We Disagree

zulupineapple · Oct 25, 2023, 10:50 AM
7 points

5 votes

2 comments · 2 min read · LW link

Beyond the Data: Why aid to poor doesn’t work

Lyrongolem · Oct 25, 2023, 5:03 AM
2 points

19 votes

31 comments · 12 min read · LW link

Announcing Epoch’s newly expanded Parameters, Compute and Data Trends in Machine Learning database

Oct 25, 2023, 2:55 AM
18 points

6 votes

0 comments · 1 min read · LW link
(epochai.org)

What is a Sequencing Read?

jefftk · Oct 25, 2023, 2:10 AM
17 points

4 votes

2 comments · 2 min read · LW link
(www.jefftk.com)

Verifiable private execution of machine learning models with Risc0?

mako yass · Oct 25, 2023, 12:44 AM
30 points

8 votes

2 comments · 2 min read · LW link

[Question] How to Resolve Forecasts With No Central Authority?

Nathan Young · Oct 25, 2023, 12:28 AM
17 points

9 votes

6 comments · 1 min read · LW link

Thoughts on responsible scaling policies and regulation

paulfchristiano · Oct 24, 2023, 10:21 PM
220 points

96 votes

34 comments · 6 min read · LW link

The Screenplay Method

Yeshua God · Oct 24, 2023, 5:41 PM
−15 points

4 votes

0 comments · 25 min read · LW link

Blunt Razor

fryolysis · Oct 24, 2023, 5:27 PM
3 points

2 votes

0 comments · 2 min read · LW link

Halloween Problem

Saint Blasphemer · Oct 24, 2023, 4:46 PM
−10 points

5 votes

1 comment · 1 min read · LW link

Who is Harry Potter? Some predictions.

Donald Hobson · Oct 24, 2023, 4:14 PM
23 points

17 votes

7 comments · 2 min read · LW link

Book Review: Going Infinite

Zvi · Oct 24, 2023, 3:00 PM
247 points

139 votes

113 comments · 97 min read · LW link · 1 review
(thezvi.wordpress.com)

[Interview w/ Quintin Pope] Evolution, values, and AI Safety

fowlertm · Oct 24, 2023, 1:53 PM
11 points

5 votes

0 comments · 1 min read · LW link

Lying is Cowardice, not Strategy

Oct 24, 2023, 1:24 PM
30 points

154 votes

73 comments · 5 min read · LW link
(cognition.cafe)

[Question] Anyone Else Using Brilliant?

Sable · Oct 24, 2023, 12:12 PM
19 points

10 votes

0 comments · 1 min read · LW link

Announcing #AISummitTalks featuring Professor Stuart Russell and many others

otto.barten · Oct 24, 2023, 10:11 AM
17 points

4 votes

1 comment · 1 min read · LW link

Linkpost: A Post Mortem on the Gino Case

Linch · Oct 24, 2023, 6:50 AM
89 points

33 votes

7 comments · 2 min read · LW link
(www.theorgplumber.com)

South Bay SSC Meetup, San Jose, November 5th.

David Friedman · Oct 24, 2023, 4:50 AM
2 points

1 vote

1 comment · 1 min read · LW link

AI Pause Will Likely Backfire (Guest Post)

jsteinhardt · Oct 24, 2023, 4:30 AM
47 points

44 votes

6 comments · 15 min read · LW link
(bounded-regret.ghost.io)

Human wanting

TsviBT · Oct 24, 2023, 1:05 AM
53 points

19 votes

1 comment · 10 min read · LW link

Towards Understanding Sycophancy in Language Models

Oct 24, 2023, 12:30 AM
66 points

24 votes

0 comments · 2 min read · LW link
(arxiv.org)

Manifold Halloween Hackathon

Austin Chen · Oct 23, 2023, 10:47 PM
8 points

2 votes

0 comments · 1 min read · LW link

Open Source Replication & Commentary on Anthropic’s Dictionary Learning Paper

Neel Nanda · Oct 23, 2023, 10:38 PM
93 points

42 votes

12 comments · 9 min read · LW link

The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists

EJT · Oct 23, 2023, 9:00 PM
79 points

29 votes

22 comments · 39 min read · LW link
(philpapers.org)

AI Alignment [Incremental Progress Units] this Week (10/22/23)

Logan Zoellner · Oct 23, 2023, 8:32 PM
22 points

10 votes

0 comments · 6 min read · LW link
(midwitalignment.substack.com)

z is not the cause of x

hrbigelow · Oct 23, 2023, 5:43 PM
6 points

5 votes

2 comments · 9 min read · LW link

Some of my predictable updates on AI

Aaron_Scher · Oct 23, 2023, 5:24 PM
32 points

15 votes

8 comments · 9 min read · LW link

Programmatic backdoors: DNNs can use SGD to run arbitrary stateful computation

Oct 23, 2023, 4:37 PM
107 points

46 votes

3 comments · 8 min read · LW link

Machine Unlearning Evaluations as Interpretability Benchmarks

Oct 23, 2023, 4:33 PM
33 points

17 votes

2 comments · 11 min read · LW link

VLM-RM: Specifying Rewards with Natural Language

Oct 23, 2023, 2:11 PM
20 points

6 votes

2 comments · 5 min read · LW link
(far.ai)

Contra Dance Dialect Survey

jefftk · Oct 23, 2023, 1:40 PM
11 points

3 votes

0 comments · 1 min read · LW link
(www.jefftk.com)

[Question] Which LessWrongers are (aspiring) YouTubers?

Mati_Roy · Oct 23, 2023, 1:21 PM
22 points

8 votes

13 comments · 1 min read · LW link

[Question] What is an “anti-Occamian prior”?

Zane · Oct 23, 2023, 2:26 AM
35 points

18 votes

22 comments · 1 min read · LW link

Announcing Timaeus

Oct 22, 2023, 11:59 AM
188 points

83 votes

15 comments · 4 min read · LW link

Into AI Safety—Episode 0

jacobhaimes · Oct 22, 2023, 3:30 AM
5 points

4 votes

1 comment · 1 min read · LW link
(into-ai-safety.github.io)

Thoughts On (Solving) Deep Deception

Jozdien · Oct 21, 2023, 10:40 PM
72 points

34 votes

6 comments · 6 min read · LW link

Best effort beliefs

Adam Zerner · Oct 21, 2023, 10:05 PM
14 points

11 votes

9 comments · 4 min read · LW link

How toy models of ontology changes can be misleading

Stuart_Armstrong · Oct 21, 2023, 9:13 PM
42 points

16 votes

0 comments · 2 min read · LW link

Soups as Spreads

jefftk · Oct 21, 2023, 8:30 PM
22 points

14 votes

0 comments · 1 min read · LW link
(www.jefftk.com)

Which COVID booster to get?

Sameerishere · Oct 21, 2023, 7:43 PM
8 points

3 votes

0 comments · 2 min read · LW link

Alignment Implications of LLM Successes: a Debate in One Act

Zack_M_Davis · Oct 21, 2023, 3:22 PM
266 points

124 votes

56 comments · 13 min read · LW link · 2 reviews