[Question] Wouldn’t weak AI agents provide warning?

Mandatory Topic Apr 26, 2024, 7:34 PM
5 points
0 comments, 1 min read, LW link

World models

A* Apr 26, 2024, 7:11 PM
1 point
0 comments, 1 min read, LW link

Duct Tape security

Isaac King Apr 26, 2024, 6:57 PM
69 points
11 comments, 5 min read, LW link

Fundamental Uncertainty: Chapter 8 - When does fundamental uncertainty matter?

Gordon Seidoh Worley Apr 26, 2024, 6:10 PM
11 points
2 comments, 32 min read, LW link

Scaling of AI training runs will slow down after GPT-5

Maxime Riché Apr 26, 2024, 4:05 PM
40 points
5 comments, 3 min read, LW link

Spatial attention as a “tell” for empathetic simulation?

Steven Byrnes Apr 26, 2024, 3:10 PM
55 points
12 comments, 8 min read, LW link

Arch-anarchy

Peter lawless Apr 26, 2024, 3:05 PM
−1 points
1 comment, 25 min read, LW link

Breadboarding a Whistle Synth

jefftk Apr 26, 2024, 3:00 PM
9 points
2 comments, 2 min read, LW link
(www.jefftk.com)

An Introduction to AI Sandbagging

Apr 26, 2024, 1:40 PM
46 points
13 comments, 8 min read, LW link

LLMs seem (relatively) safe

JustisMills Apr 25, 2024, 10:13 PM
53 points
24 comments, 7 min read, LW link
(justismills.substack.com)

Losing Faith In Contrarianism

Bentham's Bulldog Apr 25, 2024, 8:53 PM
39 points
44 comments, 5 min read, LW link

Why I stopped being into basin broadness

tailcalled Apr 25, 2024, 8:47 PM
16 points
3 comments, 2 min read, LW link

AXRP Episode 29 - Science of Deep Learning with Vikrant Varma

DanielFilan Apr 25, 2024, 7:10 PM
20 points
1 comment, 63 min read, LW link

Improving Dictionary Learning with Gated Sparse Autoencoders

Apr 25, 2024, 6:43 PM
63 points
38 comments, 1 min read, LW link
(arxiv.org)

“Why I Write” by George Orwell (1946)

Arjun Panickssery Apr 25, 2024, 4:02 PM
59 points
2 comments, 9 min read, LW link
(www.orwellfoundation.com)

Knowledge Base 8: The truth as an attractor in the information space

iwis Apr 25, 2024, 3:28 PM
−8 points
0 comments, 2 min read, LW link

Cybersecurity of Frontier AI Models: A Regulatory Review

Apr 25, 2024, 2:51 PM
8 points
0 comments, 8 min read, LW link

The first future and the best future

KatjaGrace Apr 25, 2024, 6:40 AM
106 points
12 comments, 1 min read, LW link
(worldspiritsockpuppet.com)

NIH Cancer Myths Myths

belkarx Apr 25, 2024, 5:43 AM
15 points
1 comment, 2 min read, LW link

social lemon markets

bhauth Apr 25, 2024, 2:18 AM
22 points
6 comments, 3 min read, LW link
(www.bhauth.com)

Bayesian inference without priors

DanielFilan Apr 24, 2024, 11:50 PM
26 points
8 comments, 8 min read, LW link
(danielfilan.com)

The Inner Ring by C. S. Lewis

Saul Munn Apr 24, 2024, 10:48 PM
69 points
6 comments, 13 min read, LW link
(www.lewissociety.org)

This is Water by David Foster Wallace

Nathan Young Apr 24, 2024, 9:21 PM
60 points
16 comments, 13 min read, LW link
(fs.blog)

Betadine oral rinses for covid and other viral infections

Elizabeth Apr 24, 2024, 5:50 PM
22 points
3 comments, 5 min read, LW link
(acesounderglass.com)

At last! ChatGPT does, shall we say, interesting imitations of “Kubla Khan”

Bill Benzon Apr 24, 2024, 2:56 PM
−3 points
0 comments, 4 min read, LW link

Magic by forgetting

avturchin Apr 24, 2024, 2:32 PM
18 points
39 comments, 4 min read, LW link

Changes in College Admissions

Zvi Apr 24, 2024, 1:50 PM
50 points
11 comments, 39 min read, LW link
(thezvi.wordpress.com)

1-page outline of Carlsmith’s otherness and control series

Nathan Young Apr 24, 2024, 11:25 AM
22 points
3 comments, 3 min read, LW link

How to use and interpret activation patching

Apr 24, 2024, 8:35 AM
13 points
6 comments, 18 min read, LW link

AI Generated Music as a Method of Installing Essential Rationalist Skills

keltan Apr 24, 2024, 7:48 AM
18 points
4 comments, 1 min read, LW link

Electronic Harp Mandolin Prototype

jefftk Apr 24, 2024, 2:20 AM
9 points
0 comments, 1 min read, LW link
(www.jefftk.com)

[Question] Examples of Highly Counterfactual Discoveries?

johnswentworth Apr 23, 2024, 10:19 PM
197 points
108 comments, 1 min read, LW link

[Question] Is there software to practice reading expressions?

lsusr Apr 23, 2024, 9:53 PM
37 points
11 comments, 1 min read, LW link

Let’s Design A School, Part 1

Sable Apr 23, 2024, 9:50 PM
56 points
5 comments, 11 min read, LW link
(affablyevil.substack.com)

WSJ: Inside Amazon’s Secret Operation to Gather Intel on Rivals

trevor Apr 23, 2024, 9:33 PM
37 points
5 comments, 5 min read, LW link
(www.wsj.com)

On Minicircle

Metacelsus Apr 23, 2024, 9:28 PM
10 points
0 comments, 1 min read, LW link
(docs.google.com)

Simple probes can catch sleeper agents

Apr 23, 2024, 9:10 PM
133 points
21 comments, 1 min read, LW link
(www.anthropic.com)

Manifold “exploring real cash prizes”

Rana Dexsin Apr 23, 2024, 9:07 PM
7 points
0 comments, 1 min read, LW link
(manifoldmarkets.notion.site)

[Question] (When) Should you work through the night when inspiration strikes you?

Chi Nguyen Apr 23, 2024, 9:07 PM
21 points
4 comments, 1 min read, LW link

Book review: Deep Utopia

PeterMcCluskey Apr 23, 2024, 7:55 PM
45 points
14 comments, 4 min read, LW link
(bayesianinvestor.com)

On what research policymakers actually need

MondSemmel Apr 23, 2024, 7:50 PM
38 points
0 comments, 3 min read, LW link
(www.slowboring.com)

Dequantifying first-order theories

jessicata Apr 23, 2024, 7:04 PM
40 points
9 comments, 8 min read, LW link
(unstableontology.com)

Vector Planning in a Lattice Graph

Apr 23, 2024, 4:58 PM
20 points
7 comments, 2 min read, LW link

ProLU: A Nonlinearity for Sparse Autoencoders

Glen Taggart Apr 23, 2024, 2:09 PM
44 points
4 comments, 9 min read, LW link

Subjective Questions Require Subjective information

Ben Apr 23, 2024, 1:16 PM
7 points
4 comments, 4 min read, LW link

Rejecting Television

Declan Molony Apr 23, 2024, 4:59 AM
90 points
10 comments, 6 min read, LW link

LW Frontpage Experiments! (aka “Take the wheel, Shoggoth!”)

Apr 23, 2024, 3:58 AM
71 points
27 comments, 5 min read, LW link

Thoughts on Zero Points

depressurize Apr 23, 2024, 2:22 AM
31 points
1 comment, 4 min read, LW link
(sexandchicago.substack.com)

Funny Anecdote of Eliezer From His Sister

Noah Birnbaum Apr 22, 2024, 10:05 PM
207 points
6 comments, 2 min read, LW link

How LLMs Work, in the Style of The Economist

utilistrutil Apr 22, 2024, 7:06 PM
0 points
0 comments, 2 min read, LW link