HDBSCAN is Surprisingly Effective at Finding Interpretable Clusters of the SAE Decoder Matrix

Oct 11, 2024, 11:06 PM
8 points
2 comments · 10 min read

Changing the Mind of an LLM

testingthewaters · Oct 11, 2024, 10:25 PM
2 points
0 comments · 5 min read

EIS XIV: Is mechanistic interpretability about to be practically useful?

scasper · Oct 11, 2024, 10:13 PM
68 points
4 comments · 7 min read

Dario Amodei — Machines of Loving Grace

Matrice Jacobine · Oct 11, 2024, 9:43 PM
63 points
26 comments · 1 min read
(darioamodei.com)

“Deep Galactic Chillout” a space to relax during SF tech week & meet wholesome, fun people

Jared M. · Oct 11, 2024, 7:50 PM
1 point
0 comments · 1 min read

Open letter to young EAs

Leif Wenar · Oct 11, 2024, 7:49 PM
9 points
10 comments · 1 min read

The Great Bootstrap

KristianRonn · Oct 11, 2024, 7:46 PM
12 points
0 comments · 15 min read

Embracing complexity when developing and evaluating AI responsibly

Aliya Amirova · Oct 11, 2024, 5:46 PM
2 points
9 comments · 9 min read

How much I’m paying for AI productivity software (and the future of AI use)

jacquesthibs · Oct 11, 2024, 5:11 PM
59 points
18 comments · 8 min read
(jacquesthibodeau.com)

AI: The Philosopher’s Stone of the 21st Century

HNX · Oct 11, 2024, 4:55 PM
0 points
2 comments · 29 min read

[Question] Who created the Less Wrong Gather Town?

Arepo · Oct 11, 2024, 8:53 AM
2 points
1 comment · 1 min read

A Heuristic Proof of Practical Aligned Superintelligence

Roko · Oct 11, 2024, 5:05 AM
7 points
6 comments · 1 min read
(transhumanaxiology.substack.com)

An AI crash is our best bet for restricting AI

Remmelt · Oct 11, 2024, 2:12 AM
26 points
3 comments

A Triple Decker for Elfland

jefftk · Oct 11, 2024, 1:50 AM
25 points
0 comments · 1 min read
(www.jefftk.com)

OODA your OODA Loop

Raemon · Oct 11, 2024, 12:50 AM
38 points
3 comments · 3 min read

Scaling prediction markets with meta-markets

Dentosal · Oct 10, 2024, 9:17 PM
1 point
0 comments · 2 min read

Startup Success Rates Are So Low Because the Rewards Are So Large

AppliedDivinityStudies · Oct 10, 2024, 8:22 PM
42 points
6 comments · 2 min read

Can AI Outpredict Humans? Results From Metaculus’s Q3 AI Forecasting Benchmark

ChristianWilliams · Oct 10, 2024, 6:58 PM
53 points
2 comments
(www.metaculus.com)

Rationality Quotes—Fall 2024

Screwtape · Oct 10, 2024, 6:37 PM
79 points
27 comments · 1 min read

[Question] why won’t this alignment plan work?

KvmanThinking · Oct 10, 2024, 3:44 PM
8 points
7 comments · 1 min read

AI #85: AI Wins the Nobel Prize

Zvi · Oct 10, 2024, 1:40 PM
30 points
6 comments · 31 min read
(thezvi.wordpress.com)

Behavioral red-teaming is unlikely to produce clear, strong evidence that models aren’t scheming

Buck · Oct 10, 2024, 1:36 PM
100 points
4 comments · 13 min read

Joshua Achiam Public Statement Analysis

Zvi · Oct 10, 2024, 12:50 PM
73 points
14 comments · 21 min read
(thezvi.wordpress.com)

Do you want to do a debate on youtube? I’m looking for polite, truth-seeking participants.

Nathan Young · Oct 10, 2024, 9:32 AM
12 points
0 comments · 1 min read

Rationalist Gnosticism

tailcalled · Oct 10, 2024, 9:06 AM
11 points
10 comments · 3 min read

Values Are Real Like Harry Potter

Oct 9, 2024, 11:42 PM
86 points
21 comments · 5 min read

Momentum of Light in Glass

Ben · Oct 9, 2024, 8:19 PM
143 points
44 comments · 11 min read

vgillioz’s Shortform

vgillioz · Oct 9, 2024, 7:31 PM
1 point
2 comments · 1 min read

Triangulating My Interpretation of Methods: Black Boxes by Marco J. Nathan

adamShimi · Oct 9, 2024, 7:13 PM
8 points
0 comments · 6 min read
(formethods.substack.com)

Scaffolding for “Noticing Metacognition”

Raemon · Oct 9, 2024, 5:54 PM
88 points
4 comments · 17 min read

Safe Predictive Agents with Joint Scoring Rules

Rubi J. Hudson · Oct 9, 2024, 4:38 PM
55 points
10 comments · 17 min read

Demis Hassabis and Geoffrey Hinton Awarded Nobel Prizes

Anna Gajdova · Oct 9, 2024, 12:56 PM
48 points
14 comments · 1 min read

Humans are (mostly) metarational

Yair Halberstadt · Oct 9, 2024, 5:51 AM
14 points
6 comments · 3 min read

[Job Ad] MATS is hiring!

Oct 9, 2024, 2:17 AM
10 points
0 comments · 5 min read

Palisade is hiring: Exec Assistant, Content Lead, Ops Lead, and Policy Lead

Charlie Rogers-Smith · Oct 9, 2024, 12:04 AM
11 points
0 comments · 4 min read

AGI & Consciousness—Joscha Bach

Rahul Chand · Oct 8, 2024, 10:51 PM
1 point
1 comment · 10 min read

Video and transcript of presentation on Otherness and control in the age of AGI

Joe Carlsmith · Oct 8, 2024, 10:30 PM
35 points
1 comment · 27 min read

From seeded complexity to consciousness—yes, it’s all the same.

eschatail · Oct 8, 2024, 9:31 PM
−23 points
0 comments · 2 min read

Limits of safe and aligned AI

Shivam · Oct 8, 2024, 9:30 PM
2 points
0 comments · 4 min read

[Question] What constitutes an infohazard?

K1r4d4rk.v1 · Oct 8, 2024, 9:29 PM
−4 points
8 comments · 1 min read

[Question] What makes one a “rationalist”?

mathyouf · Oct 8, 2024, 8:25 PM
7 points
5 comments · 3 min read

[Intuitive self-models] 4. Trance

Steven Byrnes · Oct 8, 2024, 1:30 PM
82 points
7 comments · 24 min read

Schelling game evaluations for AI control

Olli Järviniemi · Oct 8, 2024, 12:01 PM
71 points
5 comments · 11 min read

Thinking About a Pedalboard

jefftk · Oct 8, 2024, 11:50 AM
9 points
2 comments · 1 min read
(www.jefftk.com)

Overview of strong human intelligence amplification methods

TsviBT · Oct 8, 2024, 8:37 AM
280 points
144 comments · 10 min read

Near-death experiences

Declan Molony · Oct 8, 2024, 6:34 AM
3 points
1 comment · 2 min read

The unreasonable effectiveness of plasmid sequencing as a service

Abhishaike Mahajan · Oct 8, 2024, 2:02 AM
23 points
2 comments · 13 min read
(www.owlposting.com)

There is a globe in your LLM

jacob_drori · Oct 8, 2024, 12:43 AM
89 points
4 comments · 1 min read

MATS AI Safety Strategy Curriculum v2

Oct 7, 2024, 10:44 PM
43 points
6 comments · 13 min read

2025 Color Trends

sarahconstantin · Oct 7, 2024, 9:20 PM
40 points
7 comments · 6 min read
(sarahconstantin.substack.com)