[Question] AI interpretability could be harmful?

Roman Leventov · May 10, 2023, 8:43 PM
13 points
2 comments · 1 min read · LW link

Athens, Greece – ACX Meetups Everywhere Spring 2023

Spyros Dovas · May 10, 2023, 7:45 PM
1 point
0 comments · 1 min read · LW link

Better debates

TsviBT · May 10, 2023, 7:34 PM
78 points
7 comments · 3 min read · LW link

Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)

May 10, 2023, 7:04 PM
256 points
54 comments · 21 min read · LW link

A Corrigibility Metaphore—Big Gambles

WCargo · May 10, 2023, 6:13 PM
16 points
0 comments · 4 min read · LW link

Roadmap for a collaborative prototype of an Open Agency Architecture

Deger Turan · May 10, 2023, 5:41 PM
31 points
0 comments · 12 min read · LW link

AGI-Automated Interpretability is Suicide

__RicG__ · May 10, 2023, 2:20 PM
25 points
33 comments · 7 min read · LW link

Class-Based Addressing

jefftk · May 10, 2023, 1:40 PM
22 points
6 comments · 1 min read · LW link
(www.jefftk.com)

In defence of epistemic modesty [distillation]

Luise · May 10, 2023, 9:44 AM
17 points
2 comments · 9 min read · LW link

[Question] How much of a concern are open-source LLMs in the short, medium and long terms?

JavierCC · May 10, 2023, 9:14 AM
5 points
0 comments · 1 min read · LW link

10 great reasons why Lex Fridman should invite Eliezer and Robin to re-do the FOOM debate on his podcast

chaosmage · May 10, 2023, 8:27 AM
−7 points
1 comment · 1 min read · LW link
(www.reddit.com)

New OpenAI Paper—Language models can explain neurons in language models

MrThink · May 10, 2023, 7:46 AM
47 points
14 comments · 1 min read · LW link

Naturalist Experimentation

LoganStrohl · May 10, 2023, 4:28 AM
62 points
14 comments · 10 min read · LW link

[Question] Could A Superintelligence Out-Argue A Doomer?

tjaffee · May 10, 2023, 2:40 AM
−16 points
6 comments · 1 min read · LW link

Gradient hacking via actual hacking

Max H · May 10, 2023, 1:57 AM
12 points
7 comments · 3 min read · LW link

Red teaming: challenges and research directions

joshc · May 10, 2023, 1:40 AM
31 points
1 comment · 10 min read · LW link

[Question] Looking for a post I read if anyone recognizes it

SilverFlame · May 10, 2023, 1:24 AM
2 points
2 comments · 1 min read · LW link

Research Report: Incorrectness Cascades (Corrected)

Robert_AIZI · May 9, 2023, 9:54 PM
9 points
0 comments · 9 min read · LW link
(aizi.substack.com)

Stopping dangerous AI: Ideal US behavior

Zach Stein-Perlman · May 9, 2023, 9:00 PM
17 points
0 comments · 3 min read · LW link

Stopping dangerous AI: Ideal lab behavior

Zach Stein-Perlman · May 9, 2023, 9:00 PM
8 points
0 comments · 2 min read · LW link

Progress links and tweets, 2023-05-09

jasoncrawford · May 9, 2023, 8:22 PM
14 points
0 comments · 2 min read · LW link
(rootsofprogress.org)

[Question] Have you heard about MIT’s “liquid neural networks”? What do you think about them?

Ppau · May 9, 2023, 8:16 PM
35 points
14 comments · 1 min read · LW link

Respect for Boundaries as non-arbitrary coordination norms

Jonas Hallgren · May 9, 2023, 7:42 PM
9 points
3 comments · 7 min read · LW link

Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1

May 9, 2023, 7:41 PM
119 points
1 comment · 10 min read · LW link

Forecasting as a tool for teaching the general public to make better judgements?

Dominik Hajduk | České priority · May 9, 2023, 5:35 PM
3 points
0 comments · 3 min read · LW link

Language models can explain neurons in language models

nz · May 9, 2023, 5:29 PM
23 points
0 comments · 1 min read · LW link
(openai.com)

Asimov on building robots without the First Law

rossry · May 9, 2023, 4:44 PM
4 points
1 comment · 2 min read · LW link

Making Up Baby Signs

jefftk · May 9, 2023, 4:40 PM
44 points
6 comments · 2 min read · LW link
(www.jefftk.com)

Exciting New Interpretability Paper!

research_prime_space · May 9, 2023, 4:39 PM
12 points
1 comment · 1 min read · LW link

Result Of The Bounty/Contest To Explain Infra-Bayes In The Language Of Game Theory

johnswentworth · May 9, 2023, 4:35 PM
79 points
0 comments · 1 min read · LW link

The Bleak Harmony of Diets and Survival: A Glimpse into Nature’s Unforgiving Balance

bardstale · May 9, 2023, 4:08 PM
−16 points
0 comments · 1 min read · LW link

Entropic Abyss

bardstale · May 9, 2023, 3:59 PM
−12 points
0 comments · 2 min read · LW link

AI Safety Newsletter #5: Geoffrey Hinton speaks out on AI risk, the White House meets with AI labs, and Trojan attacks on language models

May 9, 2023, 3:26 PM
28 points
1 comment · 4 min read · LW link
(newsletter.safe.ai)

A Search for More ChatGPT / GPT-3.5 / GPT-4 “Unspeakable” Glitch Tokens

Martin Fell · May 9, 2023, 2:36 PM
26 points
9 comments · 6 min read · LW link

How to Interpret Prediction Market Prices as Probabilities

SimonM · May 9, 2023, 2:12 PM
14 points
1 comment · 4 min read · LW link

Stampy’s AI Safety Info—New Distillations #2 [April 2023]

markov · May 9, 2023, 1:31 PM
25 points
1 comment · 1 min read · LW link
(aisafety.info)

Quote quiz answer

jasoncrawford · May 9, 2023, 1:27 PM
19 points
0 comments · 4 min read · LW link
(rootsofprogress.org)

[Question] Does reversible computation let you compute the complexity class PSPACE as efficiently as normal computers compute the complexity class P?

Noosphere89 · May 9, 2023, 1:18 PM
6 points
14 comments · 1 min read · LW link

EconTalk podcast: “Eliezer Yudkowsky on the Dangers of AI”

TekhneMakre · May 9, 2023, 11:14 AM
15 points
1 comment · 1 min read · LW link
(www.econtalk.org)

Most people should probably feel safe most of the time

Kaj_Sotala · May 9, 2023, 9:35 AM
95 points
28 comments · 10 min read · LW link

Summaries of top forum posts (1st to 7th May 2023)

Zoe Williams · May 9, 2023, 9:30 AM
21 points
0 comments · LW link

Focusing on longevity research as a way to avoid the AI apocalypse

Random Trader · May 9, 2023, 4:47 AM
14 points
2 comments · 2 min read · LW link

When is Goodhart catastrophic?

May 9, 2023, 3:59 AM
180 points
29 comments · 8 min read · LW link · 1 review

Chilean AIS Hackathon Retrospective

agucova · May 9, 2023, 1:34 AM
9 points
0 comments · LW link

Announcing “Key Phenomena in AI Risk” (facilitated reading group)

May 9, 2023, 12:31 AM
65 points
4 comments · 2 min read · LW link

Yoshua Bengio argues for tool-AI and to ban “executive-AI”

habryka · May 9, 2023, 12:13 AM
53 points
15 comments · 7 min read · LW link
(yoshuabengio.org)

South Bay ACX/LW Meetup

IS · May 8, 2023, 11:55 PM
2 points
0 comments · 1 min read · LW link

H-JEPA might be technically alignable in a modified form

Roman Leventov · May 8, 2023, 11:04 PM
12 points
2 comments · 7 min read · LW link

All AGI Safety questions welcome (especially basic ones) [May 2023]

steven0461 · May 8, 2023, 10:30 PM
33 points
44 comments · 2 min read · LW link

Predictable updating about AI risk

Joe Carlsmith · May 8, 2023, 9:53 PM
293 points
25 comments · 36 min read · LW link · 1 review