Proactive ‘If-Then’ Safety Cases

Nathan Helm-Burger
Nov 18, 2024, 9:16 PM
10 points
0 comments
4 min read
LW link

[Question] Will Orion/Gemini 2/Llama-4 outperform o1

LuigiPagani
Nov 18, 2024, 9:15 PM
2 points
3 comments
1 min read
LW link

How to use bright light to improve your life.

Nat Martin
Nov 18, 2024, 7:32 PM
40 points
10 comments
10 min read
LW link

Social events with plausible deniability

Chipmonk
Nov 18, 2024, 6:25 PM
25 points
24 comments
1 min read
LW link
(chrislakin.blog)

How likely is brain preservation to work?

Andy_McKenzie
Nov 18, 2024, 4:58 PM
26 points
3 comments
6 min read
LW link

Why imperfect adversarial robustness doesn’t doom AI control

Nov 18, 2024, 4:05 PM
62 points
25 comments
2 min read
LW link

Ethical Implications of the Quantum Multiverse

Jonah Wilberg
Nov 18, 2024, 4:00 PM
7 points
22 comments
6 min read
LW link

Reducing x-risk might be actively harmful

MountainPath
Nov 18, 2024, 2:25 PM
5 points
5 comments
1 min read
LW link

Monthly Roundup #24: November 2024

Zvi
Nov 18, 2024, 1:20 PM
44 points
14 comments
50 min read
LW link
(thezvi.wordpress.com)

A Straightforward Explanation of the Good Regulator Theorem

Alfred Harwood
Nov 18, 2024, 12:45 PM
36 points
3 comments
14 min read
LW link

The Choice Transition

Nov 18, 2024, 12:30 PM
50 points
4 comments
15 min read
LW link
(strangecities.substack.com)

Chat Bankman-Fried: an Exploration of LLM Alignment in Finance

claudia.biancotti
Nov 18, 2024, 9:38 AM
26 points
4 comments
1 min read
LW link

Proposal to increase fertility: University parent clubs

Fluffnutt
Nov 18, 2024, 4:21 AM
17 points
3 comments
1 min read
LW link

A small improvement to Wikipedia page on Pareto Efficiency

Edwin Evans
Nov 18, 2024, 2:13 AM
8 points
0 comments
1 min read
LW link

[Question] Why is Gemini telling the user to die?

Burny
Nov 18, 2024, 1:44 AM
13 points
1 comment
1 min read
LW link

“It’s a 10% chance which I did 10 times, so it should be 100%”

egor.timatkov
Nov 18, 2024, 1:14 AM
154 points
59 comments
2 min read
LW link

The Catastrophe of Shiny Objects

mindprison
Nov 18, 2024, 12:24 AM
−12 points
0 comments
3 min read
LW link

Do Deep Neural Networks Have Brain-like Representations?: A Summary of Disagreements

Joseph Emerson
Nov 18, 2024, 12:07 AM
9 points
0 comments
26 min read
LW link

Truth Terminal: A reconstruction of events

Nov 17, 2024, 11:51 PM
3 points
1 comment
7 min read
LW link

Which AI Safety Benchmark Do We Need Most in 2025?

Nov 17, 2024, 11:50 PM
2 points
2 comments
8 min read
LW link

“The Solomonoff Prior is Malign” is a special case of a simpler argument

David Matolcsi
Nov 17, 2024, 9:32 PM
130 points
44 comments
12 min read
LW link

Chess As The Model Game

criticalpoints
Nov 17, 2024, 7:45 PM
19 points
0 comments
8 min read
LW link
(eregis.github.io)

The grass is always greener in the environment that shaped your values

Karl Faulks
Nov 17, 2024, 6:00 PM
8 points
0 comments
3 min read
LW link

Announcing turntrout.com, my new digital home

TurnTrout
Nov 17, 2024, 5:42 PM
108 points
33 comments
1 min read
LW link
(turntrout.com)

Secular Solstice Songbook Update

jefftk
Nov 17, 2024, 5:30 PM
14 points
2 comments
1 min read
LW link
(www.jefftk.com)

Germany-wide ACX Meetup

Fernand0
Nov 17, 2024, 10:08 AM
4 points
0 comments
1 min read
LW link

Project Adequate: Seeking Cofounders/Funders

Lorec
Nov 17, 2024, 3:12 AM
5 points
7 comments
8 min read
LW link

Trying Bluesky

jefftk
Nov 17, 2024, 2:50 AM
26 points
16 comments
1 min read
LW link
(www.jefftk.com)

AXRP Episode 38.1 - Alan Chan on Agent Infrastructure

DanielFilan
Nov 16, 2024, 11:30 PM
12 points
0 comments
14 min read
LW link

Cross-context abduction: LLMs make inferences about procedural training data leveraging declarative facts in earlier training data

Sohaib Imran
Nov 16, 2024, 11:22 PM
36 points
11 comments
14 min read
LW link

Why We Wouldn’t Build Aligned AI Even If We Could

Snowyiu
Nov 16, 2024, 8:19 PM
10 points
7 comments
10 min read
LW link

Gwerns

Tomás B.
Nov 16, 2024, 2:31 PM
24 points
2 comments
1 min read
LW link

Which evals resources would be good?

Marius Hobbhahn
Nov 16, 2024, 2:24 PM
51 points
4 comments
5 min read
LW link

OpenAI Email Archives (from Musk v. Altman and OpenAI blog)

habryka
Nov 16, 2024, 6:38 AM
531 points
80 comments
51 min read
LW link

Using Dangerous AI, But Safely?

habryka
Nov 16, 2024, 4:29 AM
17 points
2 comments
43 min read
LW link

Ayn Rand’s model of “living money”; and an upside of burnout

AnnaSalamon
Nov 16, 2024, 2:59 AM
228 points
58 comments
5 min read
LW link

Fundamental Uncertainty: Epilogue

Gordon Seidoh Worley
Nov 16, 2024, 12:57 AM
10 points
0 comments
1 min read
LW link

Making a conservative case for alignment

Nov 15, 2024, 6:55 PM
208 points
67 comments
7 min read
LW link

The Case For Giving To The Shrimp Welfare Project

omnizoid
Nov 15, 2024, 4:03 PM
−5 points
14 comments
7 min read
LW link

Win/continue/lose scenarios and execute/replace/audit protocols

Buck
Nov 15, 2024, 3:47 PM
64 points
2 comments
7 min read
LW link

Antonym Heads Predict Semantic Opposites in Language Models

Jake Ward
Nov 15, 2024, 3:32 PM
3 points
0 comments
5 min read
LW link

Proposing the Conditional AI Safety Treaty (linkpost TIME)

otto.barten
Nov 15, 2024, 1:59 PM
10 points
8 comments
3 min read
LW link
(time.com)

A Theory of Equilibrium in the Offense-Defense Balance

Maxwell Tabarrok
Nov 15, 2024, 1:51 PM
25 points
6 comments
2 min read
LW link
(www.maximum-progress.com)

Boston Secular Solstice 2024: Call for Singers and Musicans

jefftk
Nov 15, 2024, 1:50 PM
22 points
0 comments
1 min read
LW link
(www.jefftk.com)

An Uncanny Moat

Adam Newgas
Nov 15, 2024, 11:39 AM
8 points
0 comments
4 min read
LW link
(www.boristhebrave.com)

If I care about measure, choices have additional burden (+AI generated LW-comments)

avturchin
Nov 15, 2024, 10:27 AM
5 points
11 comments
2 min read
LW link

What are Emotions?

Myles H
Nov 15, 2024, 4:20 AM
4 points
13 comments
8 min read
LW link

The Third Fundamental Question

Screwtape
Nov 15, 2024, 4:01 AM
66 points
7 comments
6 min read
LW link

Dance Differentiation

jefftk
Nov 15, 2024, 2:30 AM
14 points
0 comments
1 min read
LW link
(www.jefftk.com)

Breaking beliefs about saving the world

Oxidize
Nov 15, 2024, 12:46 AM
−1 points
3 comments
9 min read
LW link