Acting Normal is Good, Actually

Gordon Seidoh Worley · 10 Feb 2023 23:35 UTC
14 points
5 comments · 3 min read · LW link

[S] D&D.Sci: All the D8a. Allllllll of it.

aphyer · 10 Feb 2023 21:14 UTC
42 points
17 comments · 6 min read · LW link

A Different Kind of Ark: My failed attempt to build a bridge between universes

ChrisM · 10 Feb 2023 20:49 UTC
2 points
2 comments · 6 min read · LW link
(www.vesselproject.io)

Prizes for the 2021 Review

Raemon · 10 Feb 2023 19:47 UTC
69 points
2 comments · 4 min read · LW link

A proposed method for forecasting transformative AI

Matthew Barnett · 10 Feb 2023 19:34 UTC
121 points
21 comments · 10 min read · LW link

The best way so far to explain AI risk: The Precipice (p. 137-149)

trevor · 10 Feb 2023 19:33 UTC
50 points
2 comments · 17 min read · LW link

Is this a weak pivotal act: creating nanobots that eat evil AGIs (but nothing else)?

Christopher King · 10 Feb 2023 19:26 UTC
0 points
3 comments · 1 min read · LW link

Why I’m not working on {debate, RRM, ELK, natural abstractions}

Steven Byrnes · 10 Feb 2023 19:22 UTC
71 points
19 comments · 9 min read · LW link

Conditioning Predictive Models: Open problems, Conclusion, and Appendix

10 Feb 2023 19:21 UTC
36 points
3 comments · 11 min read · LW link

Jobs that can help with the most important century

HoldenKarnofsky · 10 Feb 2023 18:20 UTC
24 points
0 comments · 19 min read · LW link
(www.cold-takes.com)

[Question] Is it a coincidence that GPT-3 requires roughly the same amount of compute as is necessary to emulate the human brain?

RomanS · 10 Feb 2023 16:26 UTC
12 points
10 comments · 1 min read · LW link

Contra: Changing Role Terms

jefftk · 10 Feb 2023 15:00 UTC
8 points
0 comments · 3 min read · LW link
(www.jefftk.com)

Cyborgism

10 Feb 2023 14:47 UTC
333 points
46 comments · 35 min read · LW link

FLI Podcast: Connor Leahy on AI Progress, Chimps, Memes, and Markets (Part 1/3)

10 Feb 2023 13:55 UTC
39 points
0 comments · 43 min read · LW link

Many important technologies start out as science fiction before becoming real

trevor · 10 Feb 2023 9:36 UTC
26 points
2 comments · 2 min read · LW link

[Question] What’s actually going on in the “mind” of the model when we fine-tune GPT-3 to InstructGPT?

rpglover64 · 10 Feb 2023 7:57 UTC
18 points
3 comments · 1 min read · LW link

Mechanism Design for AI Safety—Agenda Creation Retreat

Rubi J. Hudson · 10 Feb 2023 3:05 UTC
24 points
2 comments · 1 min read · LW link

[Question] On utility functions

jodaru · 10 Feb 2023 1:22 UTC
11 points
10 comments · 1 min read · LW link

Security Mindset—Fire Alarms and Trigger Signatures

elspood · 9 Feb 2023 21:15 UTC
23 points
0 comments · 4 min read · LW link

Impostor syndrome: how to cure it with spreadsheets and meditation

KatWoods · 9 Feb 2023 21:04 UTC
29 points
2 comments · 19 min read · LW link

Conditioning Predictive Models: Deployment strategy

9 Feb 2023 20:59 UTC
28 points
0 comments · 10 min read · LW link

Make Conflict of Interest Policies Public

jefftk · 9 Feb 2023 19:30 UTC
33 points
7 comments · 2 min read · LW link
(www.jefftk.com)

Curated blind auction prediction markets and a reputation system as an alternative to editorial review in news publication.

ciaran · 9 Feb 2023 18:48 UTC
2 points
0 comments · 2 min read · LW link

Tools for finding information on the internet

RomanHauksson · 9 Feb 2023 17:05 UTC
78 points
11 comments · 2 min read · LW link
(roman.computer)

Covid 2/9/23: Interferon λ

Zvi · 9 Feb 2023 16:50 UTC
48 points
8 comments · 12 min read · LW link
(thezvi.wordpress.com)

EIS II: What is “Interpretability”?

scasper · 9 Feb 2023 16:48 UTC
28 points
6 comments · 4 min read · LW link

The Engineer’s Interpretability Sequence (EIS) I: Intro

scasper · 9 Feb 2023 16:28 UTC
45 points
24 comments · 3 min read · LW link

[Question] Do the Safety Properties of Powerful AI Systems Need to be Adversarially Robust? Why?

DragonGod · 9 Feb 2023 13:36 UTC
22 points
42 comments · 2 min read · LW link

Which ML skills are useful for finding a new AIS research agenda?

Yonatan Cale · 9 Feb 2023 13:09 UTC
16 points
1 comment · 1 min read · LW link

When To Stop

Alok Singh · 9 Feb 2023 9:10 UTC
31 points
5 comments · 1 min read · LW link
(alok.github.io)

The Pervasive Illusion of Seeing the Complete World

shminux · 9 Feb 2023 6:47 UTC
34 points
1 comment · 2 min read · LW link

Religion is Good, Actually

Gordon Seidoh Worley · 9 Feb 2023 6:34 UTC
−1 points
39 comments · 4 min read · LW link

Using PICT against PastaGPT Jailbreaking

Quentin FEUILLADE--MONTIXI · 9 Feb 2023 4:30 UTC
17 points
0 comments · 9 min read · LW link

Notes on the Mathematics of LLM Architectures

Spencer Becker-Kahn · 9 Feb 2023 1:45 UTC
12 points
2 comments · 1 min read · LW link
(drive.google.com)

On Developing a Mathematical Theory of Interpretability

Spencer Becker-Kahn · 9 Feb 2023 1:45 UTC
64 points
8 comments · 6 min read · LW link

Anomalous tokens reveal the original identities of Instruct models

9 Feb 2023 1:30 UTC
137 points
16 comments · 9 min read · LW link
(generative.ink)

[Question] How would you use video gamey tech to help with AI safety?

porby · 9 Feb 2023 0:20 UTC
9 points
5 comments · 1 min read · LW link

A (EtA: quick) note on terminology: AI Alignment != AI x-safety

David Scott Krueger (formerly: capybaralet) · 8 Feb 2023 22:33 UTC
46 points
20 comments · 1 min read · LW link

GPT-175bee

8 Feb 2023 18:58 UTC
119 points
13 comments · 1 min read · LW link

EigenKarma: trust at scale

Henrik Karlsson · 8 Feb 2023 18:52 UTC
182 points
50 comments · 5 min read · LW link

Conditioning Predictive Models: Interactions with other approaches

8 Feb 2023 18:19 UTC
32 points
2 comments · 11 min read · LW link

Wanted: Technical animator and/or front-end developer for interactive diagrams of invention

jasoncrawford · 8 Feb 2023 17:14 UTC
30 points
3 comments · 1 min read · LW link
(rootsofprogress.org)

A multi-disciplinary view on AI safety research

Roman Leventov · 8 Feb 2023 16:50 UTC
43 points
4 comments · 26 min read · LW link

Community building: Lessons from ten years of facilitation experience

Severin T. Seehrich · 8 Feb 2023 16:26 UTC
17 points
0 comments · 1 min read · LW link

Progress links and tweets, 2023-02-08

jasoncrawford · 8 Feb 2023 15:52 UTC
10 points
0 comments · 1 min read · LW link
(rootsofprogress.org)

A Particular Equilibrium

Algon · 8 Feb 2023 15:16 UTC
13 points
0 comments · 2 min read · LW link
(algon-33.github.io)

Self-Awareness (and possible mode collapse around it) in ChatGPT

Yitz · 8 Feb 2023 9:57 UTC
18 points
2 comments · 2 min read · LW link

Drugs are Sometimes Good, Actually

Gordon Seidoh Worley · 8 Feb 2023 2:24 UTC
12 points
8 comments · 4 min read · LW link

House Covid Infection Retrospective

jefftk · 8 Feb 2023 2:20 UTC
25 points
1 comment · 2 min read · LW link
(www.jefftk.com)

Noting an error in Inadequate Equilibria

Matthew Barnett · 8 Feb 2023 1:33 UTC
359 points
56 comments · 2 min read · LW link