Job Listing: Managing Editor / Writer

Gretta Duleba · 21 Feb 2024 23:41 UTC
43 points
2 comments · 1 min read · LW link

The Pareto Best and the Curse of Doom

Screwtape · 21 Feb 2024 23:10 UTC
110 points
22 comments · 9 min read · LW link

AISN #31: A New AI Policy Bill in California Plus, Precedents for AI Governance and The EU AI Office

21 Feb 2024 21:58 UTC
17 points
0 comments · 6 min read · LW link
(newsletter.safe.ai)

Analogies between scaling labs and misaligned superintelligent AI

scasper · 21 Feb 2024 19:29 UTC
74 points
5 comments · 4 min read · LW link

Extinction Risks from AI: Invisible to Science?

21 Feb 2024 18:07 UTC
24 points
7 comments · 1 min read · LW link
(arxiv.org)

Extinction-level Goodhart’s Law as a Property of the Environment

21 Feb 2024 17:56 UTC
23 points
0 comments · 10 min read · LW link

Dynamics Crucial to AI Risk Seem to Make for Complicated Models

21 Feb 2024 17:54 UTC
18 points
0 comments · 9 min read · LW link

Which Model Properties are Necessary for Evaluating an Argument?

21 Feb 2024 17:52 UTC
17 points
2 comments · 7 min read · LW link

Weak vs Quantitative Extinction-level Goodhart’s Law

21 Feb 2024 17:38 UTC
17 points
1 comment · 2 min read · LW link

Dual Wielding Kindle Scribes

mesaoptimizer · 21 Feb 2024 17:17 UTC
50 points
18 comments · 6 min read · LW link

A Tale of Two Restaurant Types

Zvi · 21 Feb 2024 13:50 UTC
15 points
0 comments · 6 min read · LW link
(thezvi.wordpress.com)

Less Wrong automated systems are inadvertently Censoring me

Roko · 21 Feb 2024 12:57 UTC
8 points
52 comments · 1 min read · LW link

[Question] What is the research speed multiplier of the most advanced current LLMs?

wunan · 21 Feb 2024 12:39 UTC
6 points
2 comments · 1 min read · LW link

Jailbreaking GPT-4 with the tool API

mishajw · 21 Feb 2024 11:16 UTC
20 points
2 comments · 4 min read · LW link

Gut Renovating Another Bathroom

jefftk · 21 Feb 2024 3:00 UTC
22 points
0 comments · 2 min read · LW link
(www.jefftk.com)

Thoughts for and against an ASI figuring out ethics for itself

sweenesm · 20 Feb 2024 23:40 UTC
6 points
10 comments · 3 min read · LW link

AI #51: Altman’s Ambition

Zvi · 20 Feb 2024 19:50 UTC
83 points
5 comments · 38 min read · LW link
(thezvi.wordpress.com)

The Third Gemini

Zvi · 20 Feb 2024 19:50 UTC
30 points
2 comments · 9 min read · LW link
(thezvi.wordpress.com)

Why does generalization work?

Martín Soto · 20 Feb 2024 17:51 UTC
43 points
16 comments · 4 min read · LW link

Representations of Abstract Relations in Infancy

Bruce W. Lee · 20 Feb 2024 17:40 UTC
2 points
0 comments · 3 min read · LW link
(direct.mit.edu)

ChatGPT refuses to accept a challenge where it would get shot between the eyes [game theory]

Bill Benzon · 20 Feb 2024 16:55 UTC
4 points
6 comments · 4 min read · LW link

Inducing human-like biases in moral reasoning LMs

20 Feb 2024 16:28 UTC
19 points
3 comments · 14 min read · LW link

Monthly Roundup #15: February 2024

Zvi · 20 Feb 2024 13:10 UTC
22 points
7 comments · 32 min read · LW link
(thezvi.wordpress.com)

Selections From “The Trouble With Being Born”

Arjun Panickssery · 20 Feb 2024 10:07 UTC
23 points
2 comments · 2 min read · LW link
(arjunpanickssery.substack.com)

Difficulty classes for alignment properties

Jozdien · 20 Feb 2024 9:08 UTC
33 points
5 comments · 2 min read · LW link

Lessons from Failed Attempts to Model Sleeping Beauty Problem

Ape in the coat · 20 Feb 2024 6:43 UTC
11 points
12 comments · 14 min read · LW link

flowing like water; hard like stone

20 Feb 2024 3:20 UTC
27 points
4 comments · 4 min read · LW link

Theism Isn’t So Crazy

omnizoid · 20 Feb 2024 3:20 UTC
−31 points
11 comments · 19 min read · LW link

[Question] Getting started at distillations: can critique mine?

Joyee Chen · 20 Feb 2024 0:49 UTC
2 points
0 comments · 1 min read · LW link

Auditing LMs with counterfactual search: a tool for control and ELK

Jacob Pfau · 20 Feb 2024 0:02 UTC
28 points
6 comments · 10 min read · LW link

Rationalist Storytelling (French)

Camille Berger · 19 Feb 2024 22:25 UTC
3 points
0 comments · 1 min read · LW link

Abs-E (or, speak only in the positive)

dkl9 · 19 Feb 2024 21:14 UTC
22 points
20 comments · 2 min read · LW link
(dkl9.net)

Retirement Accounts and Short Timelines

jefftk · 19 Feb 2024 18:50 UTC
83 points
35 comments · 2 min read · LW link
(www.jefftk.com)

Relational Thinking in Animals and Humans

Bruce W. Lee · 19 Feb 2024 18:34 UTC
4 points
0 comments · 4 min read · LW link
(psycnet.apa.org)

How Technical AI Safety Researchers Can Help Implement Punitive Damages to Mitigate Catastrophic AI Risk

Gabriel Weil · 19 Feb 2024 18:00 UTC
18 points
0 comments · 4 min read · LW link

Protocol evaluations: good analogies vs control

Fabien Roger · 19 Feb 2024 18:00 UTC
35 points
10 comments · 11 min read · LW link

When Should Copyright Get Shorter?

Maxwell Tabarrok · 19 Feb 2024 16:03 UTC
11 points
14 comments · 4 min read · LW link
(www.maximum-progress.com)

Auto-matching hidden layers in Pytorch LLMs

chanind · 19 Feb 2024 12:40 UTC
2 points
0 comments · 3 min read · LW link

I’d also take $7 trillion

bhauth · 19 Feb 2024 3:31 UTC
45 points
12 comments · 10 min read · LW link
(www.bhauth.com)

On coincidences and Bayesian reasoning, as applied to the origins of COVID-19

viking_math · 19 Feb 2024 1:14 UTC
62 points
28 comments · 14 min read · LW link

Solution to the two envelopes problem for moral weights

MichaelStJules · 19 Feb 2024 0:15 UTC
9 points
1 comment · 1 min read · LW link

Conspiracy Investigation Done Right

ymeskhout · 19 Feb 2024 0:09 UTC
21 points
0 comments · 6 min read · LW link

Scientific Method

Andrij “Androniq” Ghorbunov · 18 Feb 2024 21:06 UTC
20 points
4 comments · 30 min read · LW link

[Question] Weighing reputational and moral consequences of leaving Russia or staying

spza · 18 Feb 2024 19:36 UTC
29 points
24 comments · 1 min read · LW link

Things I’ve Grieved

Raemon · 18 Feb 2024 19:32 UTC
122 points
6 comments · 2 min read · LW link

Senses of “knowing” a person

dkl9 · 18 Feb 2024 19:13 UTC
3 points
0 comments · 1 min read · LW link
(dkl9.net)

The Jolly Green Giant Chronicles [ChatGPT]

Bill Benzon · 18 Feb 2024 17:28 UTC
4 points
0 comments · 8 min read · LW link

Intuition for 1 + 2 + 3 + … = −1/12

Shankar Sivarajan · 18 Feb 2024 16:46 UTC
13 points
28 comments · 3 min read · LW link

No Clickbait - Misalignment Database

Kabir Kumar · 18 Feb 2024 5:35 UTC
5 points
10 comments · 1 min read · LW link

Idea: NV⁻ Centers for Brain Interpretability

James Camacho · 18 Feb 2024 5:28 UTC
10 points
1 comment · 3 min read · LW link