Security Mindset—Fire Alarms and Trigger Signatures

elspood 9 Feb 2023 21:15 UTC
23 points
0 comments, 4 min read

Impostor syndrome: how to cure it with spreadsheets and meditation

KatWoods 9 Feb 2023 21:04 UTC
29 points
2 comments, 19 min read

Conditioning Predictive Models: Deployment strategy

9 Feb 2023 20:59 UTC
28 points
0 comments, 10 min read

Make Conflict of Interest Policies Public

jefftk 9 Feb 2023 19:30 UTC
33 points
7 comments, 2 min read
(www.jefftk.com)

Curated blind auction prediction markets and a reputation system as an alternative to editorial review in news publication.

ciaran 9 Feb 2023 18:48 UTC
2 points
0 comments, 2 min read

Tools for finding information on the internet

RomanHauksson 9 Feb 2023 17:05 UTC
78 points
11 comments, 2 min read
(roman.computer)

Covid 2/9/23: Interferon λ

Zvi 9 Feb 2023 16:50 UTC
48 points
8 comments, 12 min read
(thezvi.wordpress.com)

EIS II: What is “Interpretability”?

scasper 9 Feb 2023 16:48 UTC
28 points
6 comments, 4 min read

The Engineer’s Interpretability Sequence (EIS) I: Intro

scasper 9 Feb 2023 16:28 UTC
45 points
24 comments, 3 min read

[Question] Do the Safety Properties of Powerful AI Systems Need to be Adversarially Robust? Why?

DragonGod 9 Feb 2023 13:36 UTC
22 points
42 comments, 2 min read

Which ML skills are useful for finding a new AIS research agenda?

Yonatan Cale 9 Feb 2023 13:09 UTC
16 points
1 comment, 1 min read

When To Stop

Alok Singh 9 Feb 2023 9:10 UTC
31 points
5 comments, 1 min read
(alok.github.io)

The Pervasive Illusion of Seeing the Complete World

shminux 9 Feb 2023 6:47 UTC
34 points
1 comment, 2 min read

Religion is Good, Actually

Gordon Seidoh Worley 9 Feb 2023 6:34 UTC
−1 points
39 comments, 4 min read

Using PICT against PastaGPT Jailbreaking

Quentin FEUILLADE--MONTIXI 9 Feb 2023 4:30 UTC
17 points
0 comments, 9 min read

Notes on the Mathematics of LLM Architectures

Spencer Becker-Kahn 9 Feb 2023 1:45 UTC
12 points
2 comments, 1 min read
(drive.google.com)

On Developing a Mathematical Theory of Interpretability

Spencer Becker-Kahn 9 Feb 2023 1:45 UTC
64 points
8 comments, 6 min read

Anomalous tokens reveal the original identities of Instruct models

9 Feb 2023 1:30 UTC
137 points
16 comments, 9 min read
(generative.ink)

[Question] How would you use video gamey tech to help with AI safety?

porby 9 Feb 2023 0:20 UTC
9 points
5 comments, 1 min read

A (EtA: quick) note on terminology: AI Alignment != AI x-safety

David Scott Krueger (formerly: capybaralet) 8 Feb 2023 22:33 UTC
46 points
20 comments, 1 min read

GPT-175bee

8 Feb 2023 18:58 UTC
119 points
13 comments, 1 min read

EigenKarma: trust at scale

Henrik Karlsson 8 Feb 2023 18:52 UTC
182 points
50 comments, 5 min read

Conditioning Predictive Models: Interactions with other approaches

8 Feb 2023 18:19 UTC
32 points
2 comments, 11 min read

Wanted: Technical animator and/or front-end developer for interactive diagrams of invention

jasoncrawford 8 Feb 2023 17:14 UTC
30 points
3 comments, 1 min read
(rootsofprogress.org)

A multi-disciplinary view on AI safety research

Roman Leventov 8 Feb 2023 16:50 UTC
43 points
4 comments, 26 min read

Community building: Lessons from ten years of facilitation experience

Severin T. Seehrich 8 Feb 2023 16:26 UTC
17 points
0 comments, 1 min read

Progress links and tweets, 2023-02-08

jasoncrawford 8 Feb 2023 15:52 UTC
10 points
0 comments, 1 min read
(rootsofprogress.org)

A Particular Equilibrium

Algon 8 Feb 2023 15:16 UTC
13 points
0 comments, 2 min read
(algon-33.github.io)

Self-Awareness (and possible mode collapse around it) in ChatGPT

Yitz 8 Feb 2023 9:57 UTC
18 points
2 comments, 2 min read

Drugs are Sometimes Good, Actually

Gordon Seidoh Worley 8 Feb 2023 2:24 UTC
12 points
8 comments, 4 min read

House Covid Infection Retrospective

jefftk 8 Feb 2023 2:20 UTC
25 points
1 comment, 2 min read
(www.jefftk.com)

Noting an error in Inadequate Equilibria

Matthew Barnett 8 Feb 2023 1:33 UTC
359 points
56 comments, 2 min read

Living Nomadically: My 80/20 Guide

KatWoods 8 Feb 2023 1:31 UTC
35 points
18 comments, 1 min read

OpenAI/Microsoft announce “next generation language model” integrated into Bing/Edge

LawrenceC 7 Feb 2023 20:38 UTC
79 points
4 comments, 1 min read
(blogs.microsoft.com)

How evals might (or might not) prevent catastrophic risks from AI

Akash 7 Feb 2023 20:16 UTC
43 points
0 comments, 9 min read

Conditioning Predictive Models: Making inner alignment as easy as possible

7 Feb 2023 20:04 UTC
27 points
2 comments, 19 min read

On The Current Status Of AI Dating

Nikita Brancatisano 7 Feb 2023 20:00 UTC
52 points
8 comments, 6 min read

Framing AI strategy

Zach Stein-Perlman 7 Feb 2023 19:20 UTC
33 points
1 comment, 18 min read
(aiimpacts.org)

Review of AI Alignment Progress

PeterMcCluskey 7 Feb 2023 18:57 UTC
72 points
32 comments, 7 min read
(bayesianinvestor.com)

The Economics of Contracts

Edward P. Könings 7 Feb 2023 13:52 UTC
21 points
3 comments, 8 min read
(edwardknings.substack.com)

Two very different experiences with ChatGPT

Sherrinford 7 Feb 2023 13:09 UTC
38 points
15 comments, 5 min read

[About Me] Cinera’s Home Page

DragonGod 7 Feb 2023 12:56 UTC
30 points
2 comments, 9 min read

Stuff I Recommend You Use

Arjun Panickssery 7 Feb 2023 12:18 UTC
16 points
2 comments, 2 min read
(arjunpanickssery.substack.com)

AXRP: Store, Patreon, Video

DanielFilan 7 Feb 2023 4:50 UTC
12 points
0 comments, 1 min read

Duckbill Masks Are Great

jefftk 7 Feb 2023 3:00 UTC
22 points
14 comments, 1 min read
(www.jefftk.com)

EA & LW Forum Weekly Summary (30th Jan - 5th Feb 2023)

Zoe Williams 7 Feb 2023 2:13 UTC
3 points
3 comments, 1 min read

so you think you’re not qualified to do technical alignment research?

Tamsin Leake 7 Feb 2023 1:54 UTC
55 points
7 comments, 1 min read
(carado.moe)

[ASoT] Policy Trajectory Visualization

Ulisse Mini 7 Feb 2023 0:13 UTC
9 points
2 comments, 1 min read

English is a Terrible Programming Language—And other reasons AI won’t displace programmers

dawsoneliasen 6 Feb 2023 22:12 UTC
26 points
8 comments, 8 min read
(orbistertius.substack.com)

African Wild Dogs Vote By Sneezing—Can AI Help Us Do Better?

Augmented Assembly 6 Feb 2023 21:09 UTC
10 points
6 comments, 4 min read