Extinction-level Goodhart’s Law as a Property of the Environment

Feb 21, 2024, 5:56 PM
23 points
0 comments 10 min read LW link

Dynamics Crucial to AI Risk Seem to Make for Complicated Models

Feb 21, 2024, 5:54 PM
19 points
0 comments 9 min read LW link

Which Model Properties are Necessary for Evaluating an Argument?

Feb 21, 2024, 5:52 PM
18 points
2 comments 7 min read LW link

Weak vs Quantitative Extinction-level Goodhart’s Law

Feb 21, 2024, 5:38 PM
27 points
1 comment 2 min read LW link

Dual Wielding Kindle Scribes

mesaoptimizer Feb 21, 2024, 5:17 PM
57 points
18 comments 6 min read LW link

A Tale of Two Restaurant Types

Zvi Feb 21, 2024, 1:50 PM
15 points
0 comments 6 min read LW link
(thezvi.wordpress.com)

Less Wrong automated systems are inadvertently Censoring me

Roko Feb 21, 2024, 12:57 PM
6 points
52 comments 1 min read LW link

[Question] What is the research speed multiplier of the most advanced current LLMs?

wunan Feb 21, 2024, 12:39 PM
6 points
2 comments 1 min read LW link

Jailbreaking GPT-4 with the tool API

mishajw Feb 21, 2024, 11:16 AM
20 points
2 comments 4 min read LW link

Gut Renovating Another Bathroom

jefftk Feb 21, 2024, 3:00 AM
22 points
0 comments 2 min read LW link
(www.jefftk.com)

Thoughts for and against an ASI figuring out ethics for itself

sweenesm Feb 20, 2024, 11:40 PM
6 points
10 comments 3 min read LW link

AI #51: Altman’s Ambition

Zvi Feb 20, 2024, 7:50 PM
83 points
5 comments 38 min read LW link
(thezvi.wordpress.com)

The Third Gemini

Zvi Feb 20, 2024, 7:50 PM
30 points
2 comments 9 min read LW link
(thezvi.wordpress.com)

Why does generalization work?

Martín Soto Feb 20, 2024, 5:51 PM
43 points
16 comments 4 min read LW link

ChatGPT refuses to accept a challenge where it would get shot between the eyes [game theory]

Bill Benzon Feb 20, 2024, 4:55 PM
4 points
6 comments 4 min read LW link

Inducing human-like biases in moral reasoning LMs

Feb 20, 2024, 4:28 PM
23 points
3 comments 14 min read LW link

Monthly Roundup #15: February 2024

Zvi Feb 20, 2024, 1:10 PM
22 points
7 comments 32 min read LW link
(thezvi.wordpress.com)

Selections From “The Trouble With Being Born”

Arjun Panickssery Feb 20, 2024, 10:07 AM
23 points
2 comments 2 min read LW link
(arjunpanickssery.substack.com)

Difficulty classes for alignment properties

Jozdien Feb 20, 2024, 9:08 AM
34 points
5 comments 2 min read LW link

Lessons from Failed Attempts to Model Sleeping Beauty Problem

Ape in the coat Feb 20, 2024, 6:43 AM
13 points
16 comments 14 min read LW link

flowing like water; hard like stone

Feb 20, 2024, 3:20 AM
27 points
4 comments 4 min read LW link

Theism Isn’t So Crazy

omnizoid Feb 20, 2024, 3:20 AM
−31 points
11 comments 19 min read LW link

[Question] Getting started at distillations: can critique mine?

Joyee Chen Feb 20, 2024, 12:49 AM
2 points
0 comments 1 min read LW link

Auditing LMs with counterfactual search: a tool for control and ELK

Jacob Pfau Feb 20, 2024, 12:02 AM
28 points
6 comments 10 min read LW link

Rationalist Storytelling (French)

Camille Berger Feb 19, 2024, 10:25 PM
3 points
0 comments 1 min read LW link

Abs-E (or, speak only in the positive)

dkl9 Feb 19, 2024, 9:14 PM
29 points
24 comments 2 min read LW link
(dkl9.net)

Retirement Accounts and Short Timelines

jefftk Feb 19, 2024, 6:50 PM
83 points
35 comments 2 min read LW link
(www.jefftk.com)

How Technical AI Safety Researchers Can Help Implement Punitive Damages to Mitigate Catastrophic AI Risk

Gabriel Weil Feb 19, 2024, 6:00 PM
20 points
0 comments 4 min read LW link

Protocol evaluations: good analogies vs control

Fabien Roger Feb 19, 2024, 6:00 PM
42 points
10 comments 11 min read LW link

When Should Copyright Get Shorter?

Maxwell Tabarrok Feb 19, 2024, 4:03 PM
11 points
14 comments 4 min read LW link
(www.maximum-progress.com)

Auto-matching hidden layers in Pytorch LLMs

chanind Feb 19, 2024, 12:40 PM
2 points
0 comments 3 min read LW link

I’d also take $7 trillion

bhauth Feb 19, 2024, 3:31 AM
47 points
12 comments 10 min read LW link
(www.bhauth.com)

On coincidences and Bayesian reasoning, as applied to the origins of COVID-19

viking_math Feb 19, 2024, 1:14 AM
62 points
28 comments 14 min read LW link

Solution to the two envelopes problem for moral weights

MichaelStJules Feb 19, 2024, 12:15 AM
9 points
1 comment LW link

Conspiracy Investigation Done Right

ymeskhout Feb 19, 2024, 12:09 AM
24 points
0 comments 6 min read LW link

Scientific Method

Andrij “Androniq” Ghorbunov Feb 18, 2024, 9:06 PM
24 points
4 comments 30 min read LW link

[Question] Weighing reputational and moral consequences of leaving Russia or staying

spza Feb 18, 2024, 7:36 PM
29 points
24 comments 1 min read LW link

Things I’ve Grieved

Raemon Feb 18, 2024, 7:32 PM
125 points
6 comments 2 min read LW link

Senses of “knowing” a person

dkl9 Feb 18, 2024, 7:13 PM
3 points
0 comments 1 min read LW link
(dkl9.net)

The Jolly Green Giant Chronicles [ChatGPT]

Bill Benzon Feb 18, 2024, 5:28 PM
4 points
0 comments 8 min read LW link

Intuition for 1 + 2 + 3 + … = −1/12

Shankar Sivarajan Feb 18, 2024, 4:46 PM
18 points
28 comments 3 min read LW link

No Clickbait - Misalignment Database

Kabir Kumar Feb 18, 2024, 5:35 AM
6 points
10 comments 1 min read LW link

Idea: NV⁻ Centers for Brain Interpretability

James Camacho Feb 18, 2024, 5:28 AM
6 points
1 comment 3 min read LW link

Celiacs don’t need to live in fear

Jarrah Feb 18, 2024, 2:30 AM
16 points
4 comments 4 min read LW link

“What if we could redesign society from scratch? The promise of charter cities.” [Rational Animations video]

Jackson Wagner Feb 18, 2024, 12:57 AM
40 points
7 comments LW link
(www.youtube.com)

Evaluating Solar

jefftk Feb 17, 2024, 9:50 PM
26 points
5 comments 2 min read LW link
(www.jefftk.com)

Opinions survey 2 (with rationalism score at the end)

tailcalled Feb 17, 2024, 12:03 PM
2 points
11 comments 1 min read LW link
(docs.google.com)

Achieving AI Alignment through Deliberate Uncertainty in Multiagent Systems

Florian_Dietz Feb 17, 2024, 8:45 AM
4 points
0 comments 13 min read LW link

Communication, consciousness, and belief strength measures

Jakub Smékal Feb 17, 2024, 5:45 AM
1 point
0 comments 3 min read LW link

San Fernando Valley Rationality: February 22, 2024

Thomas Broadley Feb 17, 2024, 1:58 AM
3 points
0 comments 1 min read LW link