Are AIs like Animals? Perspectives and Strategies from Biology

Jackson Emanuel · May 16, 2023, 11:39 PM
1 point
0 comments · 21 min read · LW link

A Mechanistic Interpretability Analysis of a GridWorld Agent-Simulator (Part 1 of N)

Joseph Bloom · May 16, 2023, 10:59 PM
36 points
2 comments · 16 min read · LW link

A TAI which kills all humans might also doom itself

Jeffrey Heninger · May 16, 2023, 10:36 PM
7 points
3 comments · 3 min read · LW link

Brief notes on the Senate hearing on AI oversight

Diziet · May 16, 2023, 10:29 PM
77 points
2 comments · 2 min read · LW link

$500 Bounty/Prize Problem: Channel Capacity Using “Insensitive” Functions

johnswentworth · May 16, 2023, 9:31 PM
40 points
11 comments · 2 min read · LW link

Progress links and tweets, 2023-05-16

jasoncrawford · May 16, 2023, 8:54 PM
14 points
0 comments · 1 min read · LW link
(rootsofprogress.org)

AI Will Not Want to Self-Improve

petersalib · May 16, 2023, 8:53 PM
28 points
24 comments · 20 min read · LW link

Nice intro video to RSI

Nathan Helm-Burger · May 16, 2023, 6:48 PM
12 points
0 comments · 1 min read · LW link
(youtu.be)

[Interview w/ Zvi Mowshowitz] Should we halt progress in AI?

fowlertm · May 16, 2023, 6:12 PM
18 points
2 comments · 3 min read · LW link

AI Risk & Policy Forecasts from Metaculus & FLI’s AI Pathways Workshop

_will_ · May 16, 2023, 6:06 PM
11 points
4 comments · 8 min read · LW link

[Question] Why doesn’t the presence of log-loss for probabilistic models (e.g. sequence prediction) imply that any utility function capable of producing a “fairly capable” agent will have at least some non-negligible fraction of overlap with human values?

Thoth Hermes · May 16, 2023, 6:02 PM
2 points
0 comments · 1 min read · LW link

Decision Theory with the Magic Parts Highlighted

moridinamael · May 16, 2023, 5:39 PM
175 points
24 comments · 5 min read · LW link

We learn long-lasting strategies to protect ourselves from danger and rejection

Richard_Ngo · May 16, 2023, 4:36 PM
86 points
5 comments · 5 min read · LW link

Proposal: Align Systems Earlier In Training

OneManyNone · May 16, 2023, 4:24 PM
18 points
0 comments · 11 min read · LW link

Procedural Executive Function, Part 2

DaystarEld · May 16, 2023, 4:22 PM
24 points
0 comments · 18 min read · LW link
(daystareld.com)

My current workflow to study the internal mechanisms of LLM

Yulu Pi · May 16, 2023, 3:27 PM
4 points
0 comments · 1 min read · LW link

Proposal: we should start referring to the risk from unaligned AI as a type of *accident risk*

Christopher King · May 16, 2023, 3:18 PM
22 points
6 comments · 2 min read · LW link

AI Safety Newsletter #6: Examples of AI safety progress, Yoshua Bengio proposes a ban on AI agents, and lessons from nuclear arms control

May 16, 2023, 3:14 PM
31 points
0 comments · 6 min read · LW link
(newsletter.safe.ai)

Lazy Baked Mac and Cheese

jefftk · May 16, 2023, 2:40 PM
18 points
2 comments · 1 min read · LW link
(www.jefftk.com)

Tyler Cowen’s challenge to develop an ‘actual mathematical model’ for AI X-Risk

Joe Brenton · May 16, 2023, 11:57 AM
6 points
4 comments · 1 min read · LW link

Evaluating Language Model Behaviours for Shutdown Avoidance in Textual Scenarios

May 16, 2023, 10:53 AM
26 points
0 comments · 13 min read · LW link

[Review] Two People Smoking Behind the Supermarket

lsusr · May 16, 2023, 7:25 AM
32 points
1 comment · 1 min read · LW link

Superposition and Dropout

Edoardo Pona · May 16, 2023, 7:24 AM
21 points
5 comments · 6 min read · LW link

[Question] What is the literature on long term water fasts?

lc · May 16, 2023, 3:23 AM
16 points
4 comments · 1 min read · LW link

Lessons learned from offering in-office nutritional testing

Elizabeth · May 15, 2023, 11:20 PM
80 points
11 comments · 14 min read · LW link
(acesounderglass.com)

Judgments often smuggle in implicit standards

Richard_Ngo · May 15, 2023, 6:50 PM
95 points
4 comments · 3 min read · LW link

Rational retirement plans

Ik · May 15, 2023, 5:49 PM
5 points
17 comments · 1 min read · LW link

[Question] (Crosspost) Asking for online calls on AI s-risks discussions

jackchang110 · May 15, 2023, 5:42 PM
1 point
0 comments · 1 min read · LW link
(forum.effectivealtruism.org)

Simple experiments with deceptive alignment

Andreas_Moe · May 15, 2023, 5:41 PM
7 points
0 comments · 4 min read · LW link

Some Summaries of Agent Foundations Work

mattmacdermott · May 15, 2023, 4:09 PM
62 points
1 comment · 13 min read · LW link

Facebook Increased Visibility

jefftk · May 15, 2023, 3:40 PM
15 points
1 comment · 1 min read · LW link
(www.jefftk.com)

Un-unpluggability—can’t we just unplug it?

Oliver Sourbut · May 15, 2023, 1:23 PM
26 points
10 comments · 12 min read · LW link
(www.oliversourbut.net)

[Question] Can we learn much by studying the behaviour of RL policies?

AidanGoth · May 15, 2023, 12:56 PM
1 point
0 comments · 1 min read · LW link

How I apply (so-called) Non-Violent Communication

Kaj_Sotala · May 15, 2023, 9:56 AM
86 points
28 comments · 3 min read · LW link

Let’s build a fire alarm for AGI

chaosmage · May 15, 2023, 9:16 AM
−1 points
0 comments · 2 min read · LW link

From fear to excitement

Richard_Ngo · May 15, 2023, 6:23 AM
132 points
9 comments · 3 min read · LW link

Reward is the optimization target (of capabilities researchers)

Max H · May 15, 2023, 3:22 AM
32 points
4 comments · 5 min read · LW link

The Lightcone Theorem: A Better Foundation For Natural Abstraction?

johnswentworth · May 15, 2023, 2:24 AM
69 points
25 comments · 6 min read · LW link

GovAI: Towards best practices in AGI safety and governance: A survey of expert opinion

Zach Stein-Perlman · May 15, 2023, 1:42 AM
28 points
11 comments · 1 min read · LW link
(arxiv.org)

[Question] Why don’t quantilizers also cut off the upper end of the distribution?

Alex_Altair · May 15, 2023, 1:40 AM
25 points
2 comments · 1 min read · LW link

Support Structures for Naturalist Study

LoganStrohl · May 15, 2023, 12:25 AM
47 points
6 comments · 10 min read · LW link

Catastrophic Regressional Goodhart: Appendix

May 15, 2023, 12:10 AM
25 points
1 comment · 9 min read · LW link

Helping your Senator Prepare for the Upcoming Sam Altman Hearing

Tiago de Vassal · May 14, 2023, 10:45 PM
69 points
2 comments · 1 min read · LW link
(aisafetytour.com)

Difficulties in making powerful aligned AI

DanielFilan · May 14, 2023, 8:50 PM
41 points
1 comment · 10 min read · LW link
(danielfilan.com)

How much do markets value Open AI?

Xodarap · May 14, 2023, 7:28 PM
21 points
5 comments · LW link

Misaligned AGI Death Match

Nate Reinar Windwood · May 14, 2023, 6:00 PM
1 point
0 comments · 1 min read · LW link

[Question] What new technology, for what institutions?

bhauth · 14 May 2023 17:33 UTC
29 points
6 comments · 3 min read · LW link

A strong mind continues its trajectory of creativity

TsviBT · 14 May 2023 17:24 UTC
22 points
8 comments · 6 min read · LW link

Ontologies Should Be Backwards-Compatible

Thoth Hermes · 14 May 2023 17:21 UTC
3 points
3 comments · 4 min read · LW link
(thothhermes.substack.com)

Jaan Tallinn’s 2022 Philanthropy Overview

jaan · 14 May 2023 15:35 UTC
64 points
2 comments · 1 min read · LW link
(jaan.online)