Cheap Model → Big Model design

Maxwell Peterson · 19 Nov 2023 22:50 UTC
15 points
2 comments · 7 min read · LW link

Human-like systematic generalization through a meta-learning neural network

Burny · 19 Nov 2023 21:41 UTC
7 points
0 comments · 2 min read · LW link
(twitter.com)

“Benevolent [ie, Ruler] AI is a bad idea” and a suggested alternative

the gears to ascension · 19 Nov 2023 20:22 UTC
22 points
11 comments · 1 min read · LW link
(www.palladiummag.com)

Alignment is Hard: An Uncomputable Alignment Problem

Alexander Bistagne · 19 Nov 2023 19:38 UTC
−5 points
4 comments · 1 min read · LW link
(github.com)

New paper shows truthfulness & instruction-following don’t generalize by default

joshc · 19 Nov 2023 19:27 UTC
58 points
0 comments · 4 min read · LW link

In favour of a sovereign state of Gaza

Yair Halberstadt · 19 Nov 2023 16:08 UTC
8 points
3 comments · 4 min read · LW link

My Criticism of Singular Learning Theory

Joar Skalse · 19 Nov 2023 15:19 UTC
79 points
56 comments · 12 min read · LW link

“Why can’t you just turn it off?”

Roko · 19 Nov 2023 14:46 UTC
42 points
25 comments · 1 min read · LW link

Spaciousness In Partner Dance: A Naturalism Demo

LoganStrohl · 19 Nov 2023 7:00 UTC
78 points
5 comments · 19 min read · LW link

Altman firing retaliation incoming?

trevor · 19 Nov 2023 0:10 UTC
50 points
23 comments · 5 min read · LW link

When Will AIs Develop Long-Term Planning?

PeterMcCluskey · 19 Nov 2023 0:08 UTC
18 points
5 comments · 4 min read · LW link
(bayesianinvestor.com)

Killswitch

Junio · 18 Nov 2023 22:53 UTC
2 points
0 comments · 3 min read · LW link

Superalignment

Douglas_Reay · 18 Nov 2023 22:37 UTC
−4 points
4 comments · 1 min read · LW link
(openai.com)

Predictable Defect-Cooperate?

quetzal_rainbow · 18 Nov 2023 15:38 UTC
7 points
1 comment · 2 min read · LW link

I think I’m just confused. Once a model exists, how do you “red-team” it to see whether it’s safe. Isn’t it already dangerous?

FTPickle · 18 Nov 2023 14:16 UTC
21 points
13 comments · 1 min read · LW link

AI Safety Camp 2024

Linda Linsefors · 18 Nov 2023 10:37 UTC
15 points
1 comment · 4 min read · LW link
(aisafety.camp)

Post-EAG Music Party

jefftk · 18 Nov 2023 3:00 UTC
14 points
2 comments · 2 min read · LW link
(www.jefftk.com)

Letter to a Sonoma County Jail Cell

MadHatter · 18 Nov 2023 2:24 UTC
11 points
1 comment · 1 min read · LW link
(open.substack.com)

1. A Sense of Fairness: Deconfusing Ethics

RogerDearnaley · 17 Nov 2023 20:55 UTC
15 points
8 comments · 15 min read · LW link

Sam Altman fired from OpenAI

LawrenceC · 17 Nov 2023 20:42 UTC
192 points
75 comments · 1 min read · LW link
(openai.com)

On the lethality of biased human reward ratings

17 Nov 2023 18:59 UTC
48 points
10 comments · 37 min read · LW link

Coup probes: Catching catastrophes with probes trained off-policy

Fabien Roger · 17 Nov 2023 17:58 UTC
85 points
7 comments · 14 min read · LW link

On Lies and Liars

Gabriel Alfour · 17 Nov 2023 17:13 UTC
33 points
4 comments · 14 min read · LW link
(cognition.cafe)

Classifying representations of sparse autoencoders (SAEs)

Annah · 17 Nov 2023 13:54 UTC
15 points
6 comments · 2 min read · LW link

R&D is a Huge Externality, So Why Do Markets Do So Much of it?

Maxwell Tabarrok · 17 Nov 2023 13:14 UTC
15 points
14 comments · 3 min read · LW link
(maximumprogress.substack.com)

On excluding dangerous information from training

ShayBenMoshe · 17 Nov 2023 11:14 UTC
23 points
5 comments · 3 min read · LW link

The dangers of reproducing while old

garymm · 17 Nov 2023 5:55 UTC
23 points
6 comments · 1 min read · LW link
(www.garymm.org)

I put odds on ends with Nathan Young

KatjaGrace · 17 Nov 2023 5:40 UTC
8 points
0 comments · 1 min read · LW link
(worldspiritsockpuppet.com)

Debate helps supervise human experts [Paper]

habryka · 17 Nov 2023 5:25 UTC
29 points
6 comments · 1 min read · LW link
(github.com)

A to Z of things

KatjaGrace · 17 Nov 2023 5:20 UTC
64 points
6 comments · 1 min read · LW link
(worldspiritsockpuppet.com)

On Tapping Out

Screwtape · 17 Nov 2023 3:23 UTC
45 points
13 comments · 8 min read · LW link

Eliciting Latent Knowledge in Comprehensive AI Services Models

acabodi · 17 Nov 2023 2:36 UTC
6 points
0 comments · 5 min read · LW link

Some Rules for an Algebra of Bayes Nets

16 Nov 2023 23:53 UTC
69 points
30 comments · 14 min read · LW link

How much to update on recent AI governance moves?

16 Nov 2023 23:46 UTC
109 points
4 comments · 29 min read · LW link

New LessWrong feature: Dialogue Matching

jacobjacob · 16 Nov 2023 21:27 UTC
106 points
22 comments · 3 min read · LW link

Towards Evaluating AI Systems for Moral Status Using Self-Reports

16 Nov 2023 20:18 UTC
45 points
3 comments · 1 min read · LW link
(arxiv.org)

Social Dark Matter

[DEACTIVATED] Duncan Sabien · 16 Nov 2023 20:00 UTC
282 points
112 comments · 34 min read · LW link

AI #38: Let’s Make a Deal

Zvi · 16 Nov 2023 19:50 UTC
44 points
2 comments · 55 min read · LW link
(thezvi.wordpress.com)

Forecasting AI (Overview)

jsteinhardt · 16 Nov 2023 19:00 UTC
35 points
0 comments · 2 min read · LW link
(bounded-regret.ghost.io)

We Should Talk About This More. Epistemic World Collapse as Imminent Safety Risk of Generative AI.

Joerg Weiss · 16 Nov 2023 18:46 UTC
11 points
2 comments · 29 min read · LW link

Intelligence in systems (human, AI) can be conceptualized as the resolution and throughput at which a system can process and affect Shannon information.

AiresJL · 16 Nov 2023 17:46 UTC
0 points
0 comments · 2 min read · LW link

Life on the Grid (Part 2)

rogersbacon · 16 Nov 2023 17:22 UTC
7 points
0 comments · 15 min read · LW link
(www.secretorum.life)

The impossibility of rationally analyzing partisan news

RationalDino · 16 Nov 2023 16:19 UTC
4 points
4 comments · 1 min read · LW link

We are Peacecraft.ai!

MadHatter · 16 Nov 2023 14:15 UTC
15 points
20 comments · 2 min read · LW link

A dialectical view of the history of AI, Part 1: We’re only in the antithesis phase. [A synthesis is in the future.]

Bill Benzon · 16 Nov 2023 12:34 UTC
6 points
0 comments · 12 min read · LW link

[Question] How much fraud is there in academia?

ChristianKl · 16 Nov 2023 11:50 UTC
23 points
10 comments · 1 min read · LW link

Learning coefficient estimation: the details

Zach Furman · 16 Nov 2023 3:19 UTC
36 points
0 comments · 2 min read · LW link
(colab.research.google.com)

[Question] AI Safety orgs- what’s your biggest bottleneck right now?

Kabir Kumar · 16 Nov 2023 2:02 UTC
1 point
0 comments · 1 min read · LW link
(docs.google.com — see entry below)

My critique of Eliezer’s deeply irrational beliefs

Jorterder · 16 Nov 2023 0:34 UTC
−33 points
1 comment · 9 min read · LW link
(docs.google.com)

Extrapolating from Five Words

Gordon Seidoh Worley · 15 Nov 2023 23:21 UTC
40 points
11 comments · 2 min read · LW link