Monosemanticity & Quantization

Rahul Chand · Oct 22, 2024, 10:57 PM
1 point
0 comments · 9 min read · LW link

[Question] What is the alpha in one bit of evidence?

J Bostock · Oct 22, 2024, 9:57 PM
20 points
13 comments · 1 min read · LW link

Catastrophic sabotage as a major threat model for human-level AI systems

evhub · Oct 22, 2024, 8:57 PM
92 points
13 comments · 15 min read · LW link

Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)

Elizabeth · Oct 22, 2024, 6:20 PM
76 points
82 comments · 1 min read · LW link
(acesounderglass.com)

Decision-Making Under Uncertainty: Lessons From AI

Jonasb · Oct 22, 2024, 5:54 PM
−1 points
0 comments · 5 min read · LW link
(www.denominations.io)

Testing Genetic Engineering Detection with Spike-Ins

jefftk · Oct 22, 2024, 5:20 PM
9 points
0 comments · LW link
(naobservatory.org)

Predictions as Public Works Project — What Metaculus Is Building Next

ChristianWilliams · Oct 22, 2024, 4:35 PM
5 points
0 comments · LW link
(www.metaculus.com)

Gorges of gender on a terrain of traits

dkl9 · Oct 22, 2024, 4:18 PM
−7 points
1 comment · 3 min read · LW link
(dkl9.net)

A Defense of Peer Review

Oct 22, 2024, 4:16 PM
23 points
1 comment · 22 min read · LW link
(www.asimov.press)

BIG-Bench Canary Contamination in GPT-4

Jozdien · Oct 22, 2024, 3:40 PM
125 points
14 comments · 4 min read · LW link

[Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF

Leon Lang · Oct 22, 2024, 1:57 PM
51 points
2 comments · 18 min read · LW link
(arxiv.org)

[Intuitive self-models] 6. Awakening / Enlightenment / PNSE

Steven Byrnes · Oct 22, 2024, 1:23 PM
64 points
8 comments · 21 min read · LW link

Resolving von Neumann-Morgenstern Inconsistent Preferences

niplav · Oct 22, 2024, 11:45 AM
38 points
5 comments · 58 min read · LW link

Lenses of Control

WillPetillo · Oct 22, 2024, 7:51 AM
14 points
0 comments · 9 min read · LW link

A Brief Explanation of AI Control

Aaron_Scher · Oct 22, 2024, 7:00 AM
8 points
1 comment · 6 min read · LW link

Longevity, AI, and Cognitive Research Hackathon @ MIT

ekkolápto · Oct 22, 2024, 6:19 AM
1 point
0 comments · 1 min read · LW link

Conversational Signposts—How to stop having boring social interactions

Declan Molony · Oct 22, 2024, 5:37 AM
11 points
6 comments · 2 min read · LW link

I got dysentery so you don’t have to

eukaryote · Oct 22, 2024, 4:55 AM
321 points
6 comments · 17 min read · LW link
(eukaryotewritesblog.com)

Transformers Explained (Again)

RohanS · Oct 22, 2024, 4:06 AM
4 points
0 comments · 18 min read · LW link

Sleeping on Stage

jefftk · Oct 22, 2024, 12:50 AM
26 points
3 comments · 1 min read · LW link
(www.jefftk.com)

The Mask Comes Off: At What Price?

Zvi · Oct 21, 2024, 11:50 PM
72 points
16 comments · 8 min read · LW link
(thezvi.wordpress.com)

Distinguishing ways AI can be “concentrated”

Matthew Barnett · Oct 21, 2024, 10:21 PM
28 points
2 comments · LW link

Jailbreaking ChatGPT and Claude using Web API Context Injection

Jaehyuk Lim · Oct 21, 2024, 9:34 PM
4 points
0 comments · 3 min read · LW link

How to Teach Your Brain to Hate Procrastination

10xyz · Oct 21, 2024, 8:12 PM
3 points
0 comments · 2 min read · LW link

Pausing for what?

MountainPath · Oct 21, 2024, 8:12 PM
0 points
1 comment · 1 min read · LW link

What is autonomy? Why boundaries are necessary.

Chipmonk · Oct 21, 2024, 5:56 PM
8 points
1 comment · 1 min read · LW link
(chrislakin.blog)

Could randomly choosing people to serve as representatives lead to better government?

John Huang · Oct 21, 2024, 5:10 PM
75 points
13 comments · 10 min read · LW link

There aren’t enough smart people in biology doing something boring

Abhishaike Mahajan · Oct 21, 2024, 3:52 PM
27 points
13 comments · 10 min read · LW link

Automation collapse

Oct 21, 2024, 2:50 PM
72 points
9 comments · 7 min read · LW link

What AI companies should do: Some rough ideas

Zach Stein-Perlman · Oct 21, 2024, 2:00 PM
33 points
10 comments · 5 min read · LW link

[Question] What should OpenAI do that it hasn’t already done, to stop their vacancies from being advertised on the 80k Job Board?

WitheringWeights · Oct 21, 2024, 1:57 PM
22 points
0 comments · 1 min read · LW link

A Rocket–Interpretability Analogy

plex · Oct 21, 2024, 1:55 PM
155 points
31 comments · 1 min read · LW link

Tokyo AI Safety 2025: Call For Papers

Blaine · Oct 21, 2024, 8:43 AM
24 points
0 comments · 3 min read · LW link
(www.tais2025.cc)

OpenAI defected, but we can take honest actions

Remmelt · Oct 21, 2024, 8:41 AM
17 points
16 comments · LW link

Slightly More Than You Wanted To Know: Pregnancy Length Effects

JustisMills · Oct 21, 2024, 1:26 AM
63 points
4 comments · 5 min read · LW link
(justismills.substack.com)

Information vs Assurance

johnswentworth · Oct 20, 2024, 11:16 PM
187 points
17 comments · 2 min read · LW link

Liquid vs Illiquid Careers

vaishnav92 · Oct 20, 2024, 11:03 PM
35 points
7 comments · 7 min read · LW link
(vaishnavsunil.substack.com)

AI Can be “Gradient Aware” Without Doing Gradient hacking.

Sodium · Oct 20, 2024, 9:02 PM
20 points
0 comments · 2 min read · LW link

A brief theory of why we think things are good or bad

David Johnston · Oct 20, 2024, 8:31 PM
7 points
10 comments · LW link

Thinking in 2D

sarahconstantin · Oct 20, 2024, 7:30 PM
27 points
0 comments · 8 min read · LW link
(sarahconstantin.substack.com)

Podcast discussing Hanson’s Cultural Drift Argument

Oct 20, 2024, 5:58 PM
3 points
0 comments · 1 min read · LW link
(moralmayhem.substack.com)

Advice on Communicating Concisely

EvolutionByDesign · Oct 20, 2024, 4:45 PM
3 points
9 comments · 1 min read · LW link

Ambiguities or the issues we face with AI in medicine

Thehumanproject.ai · Oct 20, 2024, 4:45 PM
2 points
0 comments · 5 min read · LW link

The Personal Implications of AGI Realism

xizneb · 20 Oct 2024 16:43 UTC
7 points
8 comments · 5 min read · LW link

Safety tax functions

owencb · 20 Oct 2024 14:08 UTC
31 points
0 comments · 6 min read · LW link
(strangecities.substack.com)

Exploring the Platonic Representation Hypothesis Beyond In-Distribution Data

rokosbasilisk · 20 Oct 2024 8:40 UTC
12 points
2 comments · 1 min read · LW link

Electoral Systems

RedFishBlueFish · 20 Oct 2024 3:25 UTC
1 point
0 comments · 14 min read · LW link

Overcoming Bias Anthology

Arjun Panickssery · 20 Oct 2024 2:01 UTC
169 points
14 comments · 2 min read · LW link
(overcoming-bias-anthology.com)

D/acc AI Security Salon

Allison Duettmann · 19 Oct 2024 22:17 UTC
19 points
0 comments · 1 min read · LW link

Who Should Have Been Killed, and Contains Neato? Who Else Could It Be, but that Villain Magneto!

Ace Delgado · 19 Oct 2024 20:39 UTC
−16 points
0 comments · 1 min read · LW link