Higher-effort summer solstice: What if we used AI (i.e., Angel Island)?

Rachel Shu
Jun 25, 2024, 1:35 AM
46 points
9 comments
3 min read
LW link

Rational Animations’ intro to mechanistic interpretability

Writer
Jun 14, 2024, 4:10 PM
45 points
1 comment
11 min read
LW link
(youtu.be)

AI governance needs a theory of victory

Jun 21, 2024, 4:15 PM
45 points
8 comments
LW link
(www.convergenceanalysis.org)

Sci-Fi books micro-reviews

Yair Halberstadt
Jun 24, 2024, 9:49 AM
44 points
27 comments
4 min read
LW link

Debate, Oracles, and Obfuscated Arguments

Jun 20, 2024, 11:14 PM
44 points
4 comments
21 min read
LW link

Soviet comedy film recommendations

Nina Panickssery
Jun 9, 2024, 11:40 PM
42 points
11 comments
2 min read
LW link
(open.substack.com)

D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues

aphyer
Jun 7, 2024, 7:02 PM
42 points
16 comments
3 min read
LW link

Case studies on social-welfare-based standards in various industries

HoldenKarnofsky
Jun 20, 2024, 1:33 PM
42 points
0 comments
LW link

When fine-tuning fails to elicit GPT-3.5’s chess abilities

Theodore Chapman
Jun 14, 2024, 6:50 PM
42 points
3 comments
9 min read
LW link

Jailbreak steering generalization

Jun 20, 2024, 5:25 PM
41 points
4 comments
2 min read
LW link
(arxiv.org)

Book review: The Quincunx

cousin_it
Jun 5, 2024, 9:13 PM
41 points
12 comments
2 min read
LW link

Surviving Seveneves

Yair Halberstadt
Jun 19, 2024, 1:11 PM
41 points
4 comments
11 min read
LW link

Applying Force to the Wrong End of a Causal Chain

silentbob
Jun 22, 2024, 6:06 PM
41 points
0 comments
9 min read
LW link

Beyond the Board: Exploring AI Robustness Through Go

AdamGleave
Jun 19, 2024, 4:40 PM
41 points
2 comments
1 min read
LW link
(far.ai)

Long-Term Future Fund: May 2023 to March 2024 Payout recommendations

Linch
Jun 12, 2024, 1:46 PM
40 points
0 comments
LW link

Progress Conference 2024: Toward Abundant Futures

jasoncrawford
Jun 26, 2024, 3:39 PM
40 points
2 comments
1 min read
LW link
(rootsofprogress.org)

The Data Wall is Important

JustisMills
Jun 9, 2024, 10:54 PM
40 points
20 comments
2 min read
LW link
(justismills.substack.com)

Is This Lie Detector Really Just a Lie Detector? An Investigation of LLM Probe Specificity.

Josh Levy
Jun 4, 2024, 3:45 PM
39 points
0 comments
18 min read
LW link

AI #70: A Beautiful Sonnet

Zvi
Jun 27, 2024, 2:40 PM
38 points
0 comments
44 min read
LW link
(thezvi.wordpress.com)

(Appetitive, Consummatory) ≈ (RL, reflex)

Steven Byrnes
Jun 15, 2024, 3:57 PM
38 points
1 comment
3 min read
LW link

On DeepMind’s Frontier Safety Framework

Zvi
Jun 18, 2024, 1:30 PM
37 points
4 comments
8 min read
LW link
(thezvi.wordpress.com)

Searching for the Root of the Tree of Evil

Ivan Vendrov
Jun 8, 2024, 5:05 PM
36 points
14 comments
5 min read
LW link
(nothinghuman.substack.com)

Representation Tuning

Christopher Ackerman
Jun 27, 2024, 5:44 PM
35 points
9 comments
13 min read
LW link

Empirical vs. Mathematical Joints of Nature

Jun 26, 2024, 1:55 AM
35 points
1 comment
5 min read
LW link

OpenAI appoints Retired U.S. Army General Paul M. Nakasone to Board of Directors

Joel Burget
Jun 13, 2024, 9:28 PM
35 points
10 comments
1 min read
LW link
(openai.com)

Suffering Is Not Pain

jbkjr
Jun 18, 2024, 6:04 PM
34 points
45 comments
5 min read
LW link
(jbkjr.me)

GPT2, Five Years On

Joel Burget
Jun 5, 2024, 5:44 PM
34 points
0 comments
3 min read
LW link
(importai.substack.com)

AXRP Episode 33 - RLHF Problems with Scott Emmons

DanielFilan
Jun 12, 2024, 3:30 AM
34 points
0 comments
56 min read
LW link

Attention Output SAEs Improve Circuit Analysis

Jun 21, 2024, 12:56 PM
33 points
3 comments
19 min read
LW link

Book review: the Iliad

philh
Jun 18, 2024, 6:50 PM
31 points
2 comments
14 min read
LW link
(reasonableapproximation.net)

Incentive Learning vs Dead Sea Salt Experiment

Steven Byrnes
Jun 25, 2024, 5:49 PM
30 points
1 comment
28 min read
LW link

5. Open Corrigibility Questions

Max Harms
Jun 10, 2024, 2:09 PM
30 points
0 comments
7 min read
LW link

“Full Automation” is a Slippery Metric

ozziegooen
Jun 11, 2024, 7:56 PM
30 points
1 comment
LW link

[Question] What are things you’re allowed to do as a startup?

Elizabeth
Jun 20, 2024, 12:01 AM
30 points
9 comments
1 min read
LW link

A Case for Superhuman Governance, using AI

ozziegooen
Jun 7, 2024, 12:10 AM
30 points
0 comments
LW link

DPO/PPO-RLHF on LLMs incentivizes sycophancy, exaggeration and deceptive hallucination, but not misaligned powerseeking

tailcalled
Jun 10, 2024, 9:20 PM
29 points
13 comments
2 min read
LW link

Aggregative Principles of Social Justice

Cleo Nardo
Jun 5, 2024, 1:44 PM
29 points
10 comments
37 min read
LW link

Offering Completion

jefftk
Jun 7, 2024, 1:40 AM
29 points
6 comments
1 min read
LW link
(www.jefftk.com)

Evaporation of improvements

Viliam
Jun 20, 2024, 6:34 PM
29 points
27 comments
2 min read
LW link

Childhood and Education Roundup #6: College Edition

Zvi
Jun 26, 2024, 11:40 AM
28 points
8 comments
23 min read
LW link
(thezvi.wordpress.com)

Aggregative principles approximate utilitarian principles

Cleo Nardo
Jun 12, 2024, 4:27 PM
28 points
3 comments
23 min read
LW link

Monthly Roundup #19: June 2024

Zvi
Jun 25, 2024, 12:00 PM
28 points
9 comments
54 min read
LW link
(thezvi.wordpress.com)

Probably Not a Ghost Story

George Ingebretsen
Jun 12, 2024, 10:55 PM
27 points
4 comments
3 min read
LW link

Appraising aggregativism and utilitarianism

Cleo Nardo
Jun 21, 2024, 11:10 PM
27 points
10 comments
19 min read
LW link

An Intuitive Explanation of Sparse Autoencoders for Mechanistic Interpretability of LLMs

Adam Karvonen
Jun 25, 2024, 3:57 PM
27 points
0 comments
9 min read
LW link
(adamkarvonen.github.io)

Sticker Shortcut Fallacy — The Real Worst Argument in the World

ymeskhout
Jun 12, 2024, 2:52 PM
27 points
15 comments
4 min read
LW link
(www.ymeskhout.com)

my favourite Scott Sumner blog posts

DMMF
Jun 11, 2024, 2:40 PM
26 points
0 comments
3 min read
LW link
(danfrank.ca)

[Question] Thoughts on Francois Chollet’s belief that LLMs are far away from AGI?

O O
Jun 14, 2024, 6:32 AM
26 points
17 comments
1 min read
LW link

Talk: AI safety fieldbuilding at MATS

Ryan Kidd
Jun 23, 2024, 11:06 PM
26 points
2 comments
10 min read
LW link

3b. Formal (Faux) Corrigibility

Max Harms
Jun 9, 2024, 5:18 PM
26 points
13 comments
17 min read
LW link