Interpreting Preference Models w/ Sparse Autoencoders

Jul 1, 2024, 9:35 PM
74 points
12 comments · 9 min read · LW link

Honest science is spirituality

pchvykov · Jul 1, 2024, 8:33 PM
−1 points
10 comments · 4 min read · LW link

New Executive Team & Board — PIBBSS

Nora_Ammann · Jul 1, 2024, 7:30 PM
43 points
1 comment · 1 min read · LW link

Uncursing Civilization

Lorec · Jul 1, 2024, 6:44 PM
−5 points
2 comments · 5 min read · LW link

[Question] Self-censoring on AI x-risk discussions?

Decaeneus · Jul 1, 2024, 6:24 PM
17 points
2 comments · 1 min read · LW link

Rationalists As People Who Build Piles Of Rocks

Sable · Jul 1, 2024, 10:32 AM
9 points
0 comments · 5 min read · LW link
(affablyevil.substack.com)

How good are LLMs at doing ML on an unknown dataset?

Håvard Tveit Ihle · Jul 1, 2024, 9:04 AM
33 points
4 comments · 13 min read · LW link

Whirlwind Tour of Chain of Thought Literature Relevant to Automating Alignment Research.

sevdeawesome · Jul 1, 2024, 5:50 AM
25 points
0 comments · 17 min read · LW link

Probabilistic Logic ⇔ Oracles?

Yudhister Kumar · Jul 1, 2024, 5:36 AM
15 points
0 comments · 4 min read · LW link

Important open problems in voting

Closed Limelike Curves · Jul 1, 2024, 2:53 AM
33 points
1 comment · 1 min read · LW link

Anti-Circumcision Essay 3 of 3: Now That I Think About It, Is There Actually a Space Between “Info” and “Hazard”? Isn’t It Just One Word?

Harry Stevenage · Jul 1, 2024, 2:21 AM
12 points
0 comments · 7 min read · LW link

In Defense of Lawyers Playing Their Part

Isaac King · Jul 1, 2024, 1:32 AM
32 points
9 comments · 9 min read · LW link

Anti-circumcision Essay 2 of 3: Physical and Psychological Realities

Harry Stevenage · Jun 30, 2024, 10:13 PM
12 points
5 comments · 9 min read · LW link

Review of METR’s public evaluation protocol

Jun 30, 2024, 10:03 PM
10 points
0 comments · 5 min read · LW link

Superposition, Self-Modeling, and the Path to AGI: A New Perspective

Peterpiper · Jun 30, 2024, 5:20 PM
−13 points
0 comments · 2 min read · LW link

Anti-Circumcision Essay 1 of 3: According To Their Critics, Intactivists Are The Best-Behaved Protest Movement In History

Harry Stevenage · Jun 30, 2024, 5:17 PM
12 points
6 comments · 5 min read · LW link

The Xerox Parc/ARPA version of the intellectual Turing test: Class 1 vs Class 2 disagreement

hamishtodd1 · Jun 30, 2024, 3:34 PM
6 points
3 comments · 1 min read · LW link

LLMs Universally Learn a Feature Representing Token Frequency / Rarity

Sean Osier · Jun 30, 2024, 2:48 AM
12 points
5 comments · 6 min read · LW link
(github.com)

My 5-step program for losing weight

nsokolsky · Jun 30, 2024, 1:05 AM
22 points
20 comments · 5 min read · LW link
(nsokolsky.substack.com)

Datasets that change the odds you exist

dynomight · Jun 29, 2024, 6:45 PM
56 points
4 comments · 6 min read · LW link
(dynomight.net)

A “Scaling Monosemanticity” Explainer

Jun 29, 2024, 5:50 PM
10 points
0 comments · 3 min read · LW link

Analysis of key AI analogies

Kevin Kohler · Jun 29, 2024, 10:55 AM
10 points
2 comments · 15 min read · LW link

Georgism Crash Course

Zero Contradictions · Jun 29, 2024, 6:18 AM
9 points
5 comments · 1 min read · LW link
(zerocontradictions.net)

Activation Pattern SVD: A proposal for SAE Interpretability

Daniel Tan · Jun 28, 2024, 10:12 PM
15 points
2 comments · 2 min read · LW link

Podcast: Elizabeth & Austin on “What Manifold was allowed to do”

Austin Chen · Jun 28, 2024, 10:10 PM
20 points
0 comments · LW link
(share.descript.com)

The Incredible Fentanyl-Detecting Machine

sarahconstantin · Jun 28, 2024, 10:10 PM
156 points
26 comments · 7 min read · LW link
(sarahconstantin.substack.com)

Saving Lives Reduces Over-Population—A Counter-Intuitive Non-Zero-Sum Game

James Stephen Brown · Jun 28, 2024, 7:29 PM
6 points
0 comments · 5 min read · LW link
(nonzerosum.games)

Mentorship in AGI Safety: Applications for mentorship are open!

Jun 28, 2024, 2:49 PM
5 points
0 comments · 1 min read · LW link

Contra Acemoglu on AI

Maxwell Tabarrok · Jun 28, 2024, 1:13 PM
48 points
0 comments · 5 min read · LW link
(www.maximum-progress.com)

Five toy worlds to think about heritability

David Hugh-Jones · Jun 28, 2024, 1:11 PM
13 points
0 comments · 9 min read · LW link
(wyclif.substack.com)

[Question] How do natural sciences prove causation?

Kongo Landwalker · Jun 28, 2024, 11:58 AM
1 point
3 comments · 1 min read · LW link

LessWrong/ACX meetup Transilvanya tour—Sibiu

Marius Adrian Nicoară · Jun 28, 2024, 11:41 AM
1 point
1 comment · 1 min read · LW link

Bayes’ Theorem: In Search of Gold (Lesson 1)

bayesyatina · Jun 28, 2024, 8:39 AM
3 points
0 comments · 3 min read · LW link

How a chip is designed

YM · Jun 28, 2024, 8:04 AM
65 points
4 comments · 5 min read · LW link

The Wisdom of Living for 200 Years

Martin Sustrik · Jun 28, 2024, 4:44 AM
25 points
3 comments · 4 min read · LW link

A Generally Intelligent Game

snerx · Jun 28, 2024, 1:31 AM
−1 points
1 comment · 4 min read · LW link

Corrigibility = Tool-ness?

Jun 28, 2024, 1:19 AM
78 points
8 comments · 9 min read · LW link

Situational Awareness

PeterMcCluskey · Jun 28, 2024, 1:08 AM
11 points
0 comments · 12 min read · LW link
(bayesianinvestor.com)

Toward a taxonomy of cognitive benchmarks for agentic AGIs

Ben Smith · Jun 27, 2024, 11:50 PM
15 points
0 comments · 5 min read · LW link

How Big a Deal are MatMul-Free Transformers?

JustisMills · Jun 27, 2024, 10:28 PM
19 points
6 comments · 5 min read · LW link
(justismills.substack.com)

Secondary forces of debt

KatjaGrace · Jun 27, 2024, 9:10 PM
81 points
18 comments · 2 min read · LW link
(worldspiritsockpuppet.com)

Distillation of ‘Do language models plan for future tokens’

TheManxLoiner · Jun 27, 2024, 8:57 PM
26 points
2 comments · 6 min read · LW link

how birds sense magnetic fields

bhauth · Jun 27, 2024, 6:59 PM
51 points
4 comments · 5 min read · LW link
(www.bhauth.com)

Representation Tuning

Christopher Ackerman · Jun 27, 2024, 5:44 PM
35 points
9 comments · 13 min read · LW link

An issue with training schemers with supervised fine-tuning

Fabien Roger · Jun 27, 2024, 3:37 PM
49 points
12 comments · 6 min read · LW link

AI #70: A Beautiful Sonnet

Zvi · Jun 27, 2024, 2:40 PM
38 points
0 comments · 44 min read · LW link
(thezvi.wordpress.com)

Detecting Genetically Engineered Viruses With Metagenomic Sequencing

jefftk · Jun 27, 2024, 2:01 PM
87 points
10 comments · LW link
(naobservatory.org)

Cross Robin

jefftk · Jun 27, 2024, 3:10 AM
11 points
2 comments · 1 min read · LW link
(www.jefftk.com)

Live Theory Part 0: Taking Intelligence Seriously

Sahil · Jun 26, 2024, 9:37 PM
101 points
3 comments · 8 min read · LW link

Instrumental vs Terminal Desiderata

Max Harms · Jun 26, 2024, 8:57 PM
21 points
0 comments · 3 min read · LW link