Cyborg Periods: There will be multiple AI transitions

Feb 22, 2023, 4:09 PM
108 points
9 comments · 6 min read · LW link

The Open Agency Model

Eric Drexler · Feb 22, 2023, 10:35 AM
114 points
18 comments · 4 min read · LW link

Intervening in the Residual Stream

MadHatter · Feb 22, 2023, 6:29 AM
30 points
1 comment · 9 min read · LW link

What do language models know about fictional characters?

skybrian · Feb 22, 2023, 5:58 AM
6 points
0 comments · 4 min read · LW link

Power-Seeking = Minimising free energy

Jonas Hallgren · Feb 22, 2023, 4:28 AM
21 points
10 comments · 7 min read · LW link

The shallow reality of ‘deep learning theory’

Jesse Hoogland · Feb 22, 2023, 4:16 AM
34 points
11 comments · 3 min read · LW link
(www.jessehoogland.com)

Candyland is Terrible

jefftk · Feb 22, 2023, 1:50 AM
16 points
2 comments · 1 min read · LW link
(www.jefftk.com)

A proof of inner Löb’s theorem

James Payor · Feb 21, 2023, 9:11 PM
13 points
0 comments · 2 min read · LW link

Fighting For Our Lives—What Ordinary People Can Do

TinkerBird · Feb 21, 2023, 8:36 PM
14 points
18 comments · 4 min read · LW link

The Emotional Type of a Decision

moridinamael · Feb 21, 2023, 8:35 PM
13 points
0 comments · 4 min read · LW link

What is it like doing AI safety work?

KatWoods · Feb 21, 2023, 8:12 PM
57 points
2 comments · LW link

Pretraining Language Models with Human Preferences

Feb 21, 2023, 5:57 PM
135 points
20 comments · 11 min read · LW link · 2 reviews

A Stranger Priority? Topics at the Outer Reaches of Effective Altruism (my dissertation)

Joe Carlsmith · Feb 21, 2023, 5:26 PM
38 points
16 comments · 1 min read · LW link

EIS X: Continual Learning, Modularity, Compression, and Biological Brains

scasper · Feb 21, 2023, 4:59 PM
14 points
4 comments · 3 min read · LW link

No Room for Political Philosophy

Arturo Macias · Feb 21, 2023, 4:11 PM
−1 points
7 comments · 3 min read · LW link

Deceptive Alignment is <1% Likely by Default

DavidW · Feb 21, 2023, 3:09 PM
89 points
31 comments · 14 min read · LW link · 1 review

AI #1: Sydney and Bing

Zvi · Feb 21, 2023, 2:00 PM
171 points
45 comments · 61 min read · LW link · 1 review
(thezvi.wordpress.com)

You’re not a simulation, ’cause you’re hallucinating

Stuart_Armstrong · Feb 21, 2023, 12:12 PM
25 points
6 comments · 1 min read · LW link

Basic facts about language models during training

beren · Feb 21, 2023, 11:46 AM
98 points
15 comments · 18 min read · LW link

[Preprint] Pretraining Language Models with Human Preferences

Giulio · Feb 21, 2023, 11:44 AM
12 points
0 comments · 1 min read · LW link
(arxiv.org)

Breaking the Optimizer’s Curse, and Consequences for Existential Risks and Value Learning

Roger Dearnaley · Feb 21, 2023, 9:05 AM
10 points
1 comment · 23 min read · LW link

Medlife Crisis: “Why Do People Keep Falling For Things That Don’t Work?”

RomanHauksson · Feb 21, 2023, 6:22 AM
12 points
5 comments · 1 min read · LW link
(www.youtube.com)

A foundation model approach to value inference

sen · Feb 21, 2023, 5:09 AM
6 points
0 comments · 3 min read · LW link

Instrumentality makes agents agenty

porby · Feb 21, 2023, 4:28 AM
20 points
7 comments · 6 min read · LW link

Gamified narrow reverse imitation learning

TekhneMakre · Feb 21, 2023, 4:26 AM
8 points
0 comments · 2 min read · LW link

Feelings are Good, Actually

Gordon Seidoh Worley · Feb 21, 2023, 2:38 AM
18 points
1 comment · 4 min read · LW link

AI alignment researchers don’t (seem to) stack

So8res · Feb 21, 2023, 12:48 AM
193 points
40 comments · 3 min read · LW link

EA & LW Forum Weekly Summary (6th–19th Feb 2023)

Zoe Williams · Feb 21, 2023, 12:26 AM
8 points
0 comments · LW link

What to think when a language model tells you it’s sentient

Robbo · Feb 21, 2023, 12:01 AM
9 points
6 comments · 6 min read · LW link

On second thought, prompt injections are probably examples of misalignment

lc · Feb 20, 2023, 11:56 PM
22 points
5 comments · 1 min read · LW link

Nothing Is Ever Taught Correctly

LVSN · Feb 20, 2023, 10:31 PM
5 points
3 comments · 1 min read · LW link

Behavioral and mechanistic definitions (often confuse AI alignment discussions)

LawrenceC · Feb 20, 2023, 9:33 PM
33 points
5 comments · 6 min read · LW link

Validator models: A simple approach to detecting goodharting

beren · Feb 20, 2023, 9:32 PM
14 points
1 comment · 4 min read · LW link

There are no coherence theorems

Feb 20, 2023, 9:25 PM
149 points
130 comments · 19 min read · LW link · 1 review

[Question] Are there any AI safety relevant fully remote roles suitable for someone with 2-3 years of machine learning engineering industry experience?

Malleable_shape · Feb 20, 2023, 7:57 PM
7 points
2 comments · 1 min read · LW link

A circuit for Python docstrings in a 4-layer attention-only transformer

Feb 20, 2023, 7:35 PM
96 points
8 comments · 21 min read · LW link

Sydney the Bingenator Can’t Think, But It Still Threatens People

Valentin Baltadzhiev · Feb 20, 2023, 6:37 PM
−3 points
2 comments · 8 min read · LW link

EIS IX: Interpretability and Adversaries

scasper · Feb 20, 2023, 6:25 PM
30 points
8 comments · 8 min read · LW link

What AI companies can do today to help with the most important century

HoldenKarnofsky · Feb 20, 2023, 5:00 PM
38 points
3 comments · 9 min read · LW link
(www.cold-takes.com)

Bankless Podcast: 159 - We’re All Gonna Die with Eliezer Yudkowsky

bayesed · Feb 20, 2023, 4:42 PM
83 points
54 comments · 1 min read · LW link
(www.youtube.com)

Speculative Technologies launch and Ben Reinhardt AMA

jasoncrawford · Feb 20, 2023, 4:33 PM
16 points
0 comments · 1 min read · LW link
(rootsofprogress.org)

[MLSN #8] Mechanistic interpretability, using law to inform AI alignment, scaling laws for proxy gaming

Feb 20, 2023, 3:54 PM
20 points
0 comments · 4 min read · LW link
(newsletter.mlsafety.org)

Bing finding ways to bypass Microsoft’s filters without being asked. Is it reproducible?

Christopher King · Feb 20, 2023, 3:11 PM
27 points
15 comments · 1 min read · LW link

Metaculus Introduces New ‘Conditional Pair’ Forecast Questions for Making Conditional Predictions

ChristianWilliams · Feb 20, 2023, 1:36 PM
40 points
0 comments · LW link

On Investigating Conspiracy Theories

Zvi · Feb 20, 2023, 12:50 PM
116 points
38 comments · 5 min read · LW link
(thezvi.wordpress.com)

The Estimation Game: a monthly Fermi estimation web app

Feb 20, 2023, 11:33 AM
20 points
2 comments · 1 min read · LW link

The idea that ChatGPT is simply “predicting” the next word is, at best, misleading

Bill Benzon · Feb 20, 2023, 11:32 AM
55 points
88 comments · 5 min read · LW link

Russell Conjugations list & voting thread

Daniel Kokotajlo · Feb 20, 2023, 6:39 AM
23 points
63 comments · 1 min read · LW link

Emergent Deception and Emergent Optimization

jsteinhardt · Feb 20, 2023, 2:40 AM
64 points
0 comments · 14 min read · LW link
(bounded-regret.ghost.io)

AGI doesn’t need understanding, intention, or consciousness in order to kill us, only intelligence

James Blaha · Feb 20, 2023, 12:55 AM
10 points
2 comments · 18 min read · LW link