D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset

aphyer · Jun 17, 2024, 9:29 PM
61 points

17 votes

Overall karma indicates overall quality.

11 comments · 6 min read · LW link

Questionable Narratives of “Situational Awareness”

fergusq · Jun 17, 2024, 9:01 PM
0 points

5 votes

1 comment · 1 min read · LW link
(forum.effectivealtruism.org)

ZuVillage Georgia – Mission Statement

Burns · Jun 17, 2024, 7:53 PM
3 points

13 votes

3 comments · 9 min read · LW link

Getting 50% (SoTA) on ARC-AGI with GPT-4o

ryan_greenblatt · Jun 17, 2024, 6:44 PM
263 points

127 votes

50 comments · 13 min read · LW link

Sycophancy to subterfuge: Investigating reward tampering in large language models

Jun 17, 2024, 6:41 PM
163 points

54 votes

22 comments · 8 min read · LW link
(arxiv.org)

Labor Participation is a High-Priority AI Alignment Risk

alex · Jun 17, 2024, 6:09 PM
7 points

6 votes

0 comments · 17 min read · LW link

Towards a Less Bullshit Model of Semantics

Jun 17, 2024, 3:51 PM
94 points

37 votes

44 comments · 21 min read · LW link

Analysing Adversarial Attacks with Linear Probing

Jun 17, 2024, 2:16 PM
9 points

5 votes

0 comments · 8 min read · LW link

What’s the future of AI hardware?

Itay Dreyfus · Jun 17, 2024, 1:05 PM
2 points

3 votes

0 comments · 8 min read · LW link
(productidentity.co)

OpenAI #8: The Right to Warn

Zvi · Jun 17, 2024, 12:00 PM
97 points

38 votes

8 comments · 34 min read · LW link
(thezvi.wordpress.com)

Logit Prisms: Decomposing Transformer Outputs for Mechanistic Interpretability

ntt123 · Jun 17, 2024, 11:46 AM
5 points

5 votes

4 comments · 6 min read · LW link
(neuralblog.github.io)

Weak AGIs Kill Us First

yrimon · Jun 17, 2024, 11:13 AM
15 points

9 votes

4 comments · 9 min read · LW link

[Linkpost] Guardian article covering Lightcone Infrastructure, Manifest and CFAR ties to FTX

ROM · Jun 17, 2024, 10:05 AM
8 points

9 votes

9 comments · 1 min read · LW link
(www.theguardian.com)

Fat Tails Discourage Compromise

niplav · Jun 17, 2024, 9:39 AM
53 points

27 votes

5 comments · 1 min read · LW link

Our Intuitions About The Criminal Justice System Are Screwed Up

Bentham's Bulldog · Jun 17, 2024, 6:22 AM
12 points

11 votes

15 comments · 4 min read · LW link

A Case for Cooperation: Dependence in the Prisoner’s Dilemma

grantstenger · Jun 17, 2024, 1:10 AM
10 points

8 votes

3 comments · 23 min read · LW link

Degeneracies are sticky for SGD

Jun 16, 2024, 9:19 PM
56 points

25 votes

1 comment · 16 min read · LW link

YM’s Shortform

YM · Jun 16, 2024, 8:57 PM
3 points

2 votes

1 comment · 1 min read · LW link

“Is-Ought” is Fraught

MiSteR Kittty · Jun 16, 2024, 5:27 PM
−5 points

6 votes

2 comments · 1 min read · LW link

The type of AI humanity has chosen to create so far is unsafe, for soft social reasons and not technical ones.

l8c · Jun 16, 2024, 1:31 PM
−6 points

7 votes

2 comments · 1 min read · LW link

Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller

Henry Cai · Jun 16, 2024, 1:01 PM
7 points

7 votes

0 comments · 7 min read · LW link
(arxiv.org)

CIV: a story

Richard_Ngo · Jun 15, 2024, 10:36 PM
99 points

56 votes

6 comments · 9 min read · LW link
(www.narrativeark.xyz)

Yann LeCun: We only design machines that minimize costs [therefore they are safe]

tailcalled · Jun 15, 2024, 5:25 PM
19 points

13 votes

8 comments · 1 min read · LW link
(twitter.com)

(Appetitive, Consummatory) ≈ (RL, reflex)

Steven Byrnes · Jun 15, 2024, 3:57 PM
38 points

14 votes

1 comment · 3 min read · LW link

Two LessWrong speed friending experiments

Jun 15, 2024, 10:52 AM
52 points

38 votes

3 comments · 4 min read · LW link

Claude’s dark spiritual AI futurism

jessicata · Jun 15, 2024, 12:57 AM
22 points

12 votes

7 comments · 43 min read · LW link
(unstableontology.com)

[Question] When is “unfalsifiable implies false” incorrect?

VojtaKovarik · Jun 15, 2024, 12:28 AM
3 points

9 votes

11 comments · 1 min read · LW link

MIRI’s June 2024 Newsletter

Harlan · Jun 14, 2024, 11:02 PM
74 points

27 votes

20 comments · 2 min read · LW link
(intelligence.org)

Language for Goal Misgeneralization: Some Formalisms from my MSc Thesis

Giulio · Jun 14, 2024, 7:35 PM
10 points

6 votes

0 comments · 8 min read · LW link
(www.giuliostarace.com)

Shard Theory - is it true for humans?

Rishika · Jun 14, 2024, 7:21 PM
71 points

32 votes

7 comments · 15 min read · LW link

When fine-tuning fails to elicit GPT-3.5’s chess abilities

Theodore Chapman · Jun 14, 2024, 6:50 PM
42 points

24 votes

3 comments · 9 min read · LW link

Results from the AI x Democracy Research Sprint

Jun 14, 2024, 4:40 PM
13 points

7 votes

0 comments · 6 min read · LW link

Rational Animations’ intro to mechanistic interpretability

Writer · Jun 14, 2024, 4:10 PM
45 points

20 votes

1 comment · 11 min read · LW link
(youtu.be)

Why keep a diary, and why wish for large language models

DanielFilan · Jun 14, 2024, 4:10 PM
9 points

7 votes

1 comment · 2 min read · LW link
(danielfilan.com)

The Leopold Model: Analysis and Reactions

Zvi · Jun 14, 2024, 3:10 PM
109 points

30 votes

19 comments · 57 min read · LW link
(thezvi.wordpress.com)

[Question] Thoughts on Francois Chollet’s belief that LLMs are far away from AGI?

O O · Jun 14, 2024, 6:32 AM
26 points

12 votes

17 comments · 1 min read · LW link

Research Report: Alternative sparsity methods for sparse autoencoders with OthelloGPT.

Andrew Quaisley · Jun 14, 2024, 12:57 AM
17 points

9 votes

5 comments · 12 min read · LW link

Slowed ASI - a possible technical strategy for alignment

Lester Leong · Jun 14, 2024, 12:57 AM
5 points

8 votes

2 comments · 3 min read · LW link

Conceptual Typography Example

milanrosko · Jun 14, 2024, 12:39 AM
15 points

15 votes

0 comments · 1 min read · LW link

Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety)

Andrew_Critch · Jun 14, 2024, 12:16 AM
357 points

170 votes

38 comments · 4 min read · LW link

OpenAI appoints Retired U.S. Army General Paul M. Nakasone to Board of Directors

Joel Burget · Jun 13, 2024, 9:28 PM
35 points

20 votes

10 comments · 1 min read · LW link
(openai.com)

AI #68: Remarkably Reasonable Reactions

Zvi · Jun 13, 2024, 4:30 PM
46 points

27 votes

11 comments · 50 min read · LW link
(thezvi.wordpress.com)

Four Futures For Cognitive Labor

Maxwell Tabarrok · Jun 13, 2024, 12:56 PM
14 points

11 votes

11 comments · 4 min read · LW link
(www.maximum-progress.com)

Underrated Proverbs

Arjun Panickssery · Jun 13, 2024, 12:30 PM
13 points

14 votes

9 comments · 1 min read · LW link
(arjunpanickssery.substack.com)

[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations

Jun 13, 2024, 10:04 AM
84 points

35 votes

10 comments · 2 min read · LW link
(arxiv.org)

Probably Not a Ghost Story

George Ingebretsen · Jun 12, 2024, 10:55 PM
27 points

13 votes

4 comments · 3 min read · LW link

AiPhone

Zvi · Jun 12, 2024, 10:20 PM
63 points

23 votes

4 comments · 14 min read · LW link
(thezvi.wordpress.com)

microwave drilling is impractical

bhauth · Jun 12, 2024, 10:16 PM
59 points

32 votes

19 comments · 4 min read · LW link
(www.bhauth.com)

Phonosemantic Duplication

bitcoinssg · Jun 12, 2024, 8:19 PM
5 points

3 votes

0 comments · 1 min read · LW link

My AI Model Delta Compared To Christiano

johnswentworth · Jun 12, 2024, 6:19 PM
191 points

92 votes

74 comments · 4 min read · LW link