Debate, Oracles, and Obfuscated Arguments

Jun 20, 2024, 11:14 PM
44 points
4 comments · 21 min read · LW link

Evaporation of improvements

Viliam · Jun 20, 2024, 6:34 PM
29 points
27 comments · 2 min read · LW link

Interpreting and Steering Features in Images

Gytis Daujotas · Jun 20, 2024, 6:33 PM
66 points
6 comments · 5 min read · LW link

Claude 3.5 Sonnet

Zach Stein-Perlman · Jun 20, 2024, 6:00 PM
75 points
41 comments · 1 min read · LW link
(www.anthropic.com)

[Question] What is going to happen in a case of an AGI era where humans are out of the game?

Cipolla · Jun 20, 2024, 5:44 PM
−2 points
1 comment · 1 min read · LW link

Jailbreak steering generalization

Jun 20, 2024, 5:25 PM
41 points
4 comments · 2 min read · LW link
(arxiv.org)

Case studies on social-welfare-based standards in various industries

HoldenKarnofsky · Jun 20, 2024, 1:33 PM
42 points
0 comments · LW link

AI #69: Nice

Zvi · Jun 20, 2024, 12:40 PM
65 points
9 comments · 51 min read · LW link
(thezvi.wordpress.com)

Niche product design

Itay Dreyfus · Jun 20, 2024, 6:34 AM
2 points
1 comment · 3 min read · LW link
(productidentity.co)

Data on AI

Jun 20, 2024, 6:31 AM
1 point
0 comments · 1 min read · LW link
(epochai.org)

Actually, Power Plants May Be an AI Training Bottleneck.

Lao Mein · Jun 20, 2024, 4:41 AM
83 points
13 comments · 2 min read · LW link

Proposing the Post-Singularity Symbiotic Researches

Hiroshi Yamakawa · Jun 20, 2024, 4:05 AM
6 points
1 comment · 12 min read · LW link

Week One of Studying Transformers Architecture

JustisMills · Jun 20, 2024, 3:47 AM
3 points
0 comments · 15 min read · LW link
(justismills.substack.com)

[Question] What are things you’re allowed to do as a startup?

Elizabeth · Jun 20, 2024, 12:01 AM
30 points
9 comments · 1 min read · LW link

LessWrong/ACX meetup Transilvanya tour—Alba Iulia

Marius Adrian Nicoară · Jun 19, 2024, 7:56 PM
1 point
1 comment · 1 min read · LW link

Chronic perfectionism through the eyes of school reports

Stuart Johnson · Jun 19, 2024, 5:46 PM
13 points
3 comments · 1 min read · LW link

Ilya Sutskever created a new AGI startup

harfe · Jun 19, 2024, 5:17 PM
95 points
35 comments · 1 min read · LW link
(ssi.inc)

Beyond the Board: Exploring AI Robustness Through Go

AdamGleave · Jun 19, 2024, 4:40 PM
41 points
2 comments · 1 min read · LW link
(far.ai)

A study on cults and non-cults—answer questions about a group and get a cult score

spencerg · Jun 19, 2024, 2:30 PM
1 point
8 comments · 1 min read · LW link
(www.guidedtrack.com)

Workshop: data analysis for software engineers

Derek M. Jones · Jun 19, 2024, 2:20 PM
2 points
0 comments · 1 min read · LW link

FLEXIBLE AND ADAPTABLE LLM’s WITH CONTINUOUS SELF TRAINING

Escaque 66 · Jun 19, 2024, 2:17 PM
−11 points
0 comments · 3 min read · LW link

Surviving Seveneves

Yair Halberstadt · Jun 19, 2024, 1:11 PM
41 points
4 comments · 11 min read · LW link

Self responsibility

Elo · Jun 19, 2024, 10:17 AM
17 points
3 comments · 2 min read · LW link

Gizmo Watch Review

jefftk · Jun 18, 2024, 8:00 PM
22 points
4 comments · 6 min read · LW link
(www.jefftk.com)

Boycott OpenAI

PeterMcCluskey · Jun 18, 2024, 7:52 PM
164 points
26 comments · 1 min read · LW link
(bayesianinvestor.com)

Loving a world you don’t trust

Joe Carlsmith · Jun 18, 2024, 7:31 PM
135 points
13 comments · 33 min read · LW link

Book review: the Iliad

philh · Jun 18, 2024, 6:50 PM
31 points
2 comments · 14 min read · LW link
(reasonableapproximation.net)

AI Safety Newsletter #37: US Launches Antitrust Investigations Plus, recent criticisms of OpenAI and Anthropic, and a summary of Situational Awareness

Jun 18, 2024, 6:07 PM
8 points
0 comments · 5 min read · LW link
(newsletter.safe.ai)

Suffering Is Not Pain

jbkjr · Jun 18, 2024, 6:04 PM
34 points
45 comments · 5 min read · LW link
(jbkjr.me)

Lamini’s Targeted Hallucination Reduction May Be a Big Deal for Job Automation

sweenesm · Jun 18, 2024, 3:29 PM
3 points
0 comments · 1 min read · LW link

On DeepMind’s Frontier Safety Framework

Zvi · Jun 18, 2024, 1:30 PM
37 points
4 comments · 8 min read · LW link
(thezvi.wordpress.com)

[Linkpost] Transcendence: Generative Models Can Outperform The Experts That Train Them

Bogdan Ionut Cirstea · Jun 18, 2024, 11:00 AM
19 points
3 comments · 1 min read · LW link
(arxiv.org)

I would have shit in that alley, too

Declan Molony · Jun 18, 2024, 4:41 AM
462 points
134 comments · 4 min read · LW link

[Question] The thing I don’t understand about AGI

Jeremy Kalfus · Jun 18, 2024, 4:25 AM
7 points
12 comments · 1 min read · LW link

Calling My Second Family Dance

jefftk · Jun 18, 2024, 2:20 AM
11 points
0 comments · 1 min read · LW link
(www.jefftk.com)

LLM-Secured Systems: A General-Purpose Tool For Structured Transparency

ozziegooen · Jun 18, 2024, 12:21 AM
10 points
1 comment · LW link

D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset

aphyer · Jun 17, 2024, 9:29 PM
51 points
11 comments · 6 min read · LW link

Questionable Narratives of “Situational Awareness”

fergusq · Jun 17, 2024, 9:01 PM
0 points
1 comment · 1 min read · LW link
(forum.effectivealtruism.org)

ZuVillage Georgia – Mission Statement

Burns · Jun 17, 2024, 7:53 PM
3 points
3 comments · 9 min read · LW link

Getting 50% (SoTA) on ARC-AGI with GPT-4o

ryan_greenblatt · Jun 17, 2024, 6:44 PM
263 points
50 comments · 13 min read · LW link

Sycophancy to subterfuge: Investigating reward tampering in large language models

Jun 17, 2024, 6:41 PM
161 points
22 comments · 8 min read · LW link
(arxiv.org)

Labor Participation is a High-Priority AI Alignment Risk

alex · Jun 17, 2024, 6:09 PM
6 points
0 comments · 17 min read · LW link

Towards a Less Bullshit Model of Semantics

Jun 17, 2024, 3:51 PM
94 points
44 comments · 21 min read · LW link

Analysing Adversarial Attacks with Linear Probing

Jun 17, 2024, 2:16 PM
9 points
0 comments · 8 min read · LW link

What’s the future of AI hardware?

Itay Dreyfus · Jun 17, 2024, 1:05 PM
2 points
0 comments · 8 min read · LW link
(productidentity.co)

OpenAI #8: The Right to Warn

Zvi · Jun 17, 2024, 12:00 PM
97 points
8 comments · 34 min read · LW link
(thezvi.wordpress.com)

Logit Prisms: Decomposing Transformer Outputs for Mechanistic Interpretability

ntt123 · Jun 17, 2024, 11:46 AM
5 points
4 comments · 6 min read · LW link
(neuralblog.github.io)

Weak AGIs Kill Us First

yrimon · Jun 17, 2024, 11:13 AM
15 points
4 comments · 9 min read · LW link

[Linkpost] Guardian article covering Lightcone Infrastructure, Manifest and CFAR ties to FTX

ROM · Jun 17, 2024, 10:05 AM
8 points
9 comments · 1 min read · LW link
(www.theguardian.com)

Fat Tails Discourage Compromise

niplav · Jun 17, 2024, 9:39 AM
53 points
5 comments · 1 min read · LW link