Lessons learned from offering in-office nutritional testing

Elizabeth · May 15, 2023, 11:20 PM
80 points
11 comments · 14 min read · LW link
(acesounderglass.com)

Judgments often smuggle in implicit standards

Richard_Ngo · May 15, 2023, 6:50 PM
95 points
4 comments · 3 min read · LW link

Rational retirement plans

Ik · May 15, 2023, 5:49 PM
5 points
17 comments · 1 min read · LW link

[Question] (Crosspost) Asking for online calls on AI s-risks discussions

jackchang110 · May 15, 2023, 5:42 PM
1 point
0 comments · 1 min read · LW link
(forum.effectivealtruism.org)

Simple experiments with deceptive alignment

Andreas_Moe · May 15, 2023, 5:41 PM
7 points
0 comments · 4 min read · LW link

Some Summaries of Agent Foundations Work

mattmacdermott · May 15, 2023, 4:09 PM
62 points
1 comment · 13 min read · LW link

Facebook Increased Visibility

jefftk · May 15, 2023, 3:40 PM
15 points
1 comment · 1 min read · LW link
(www.jefftk.com)

Un-unpluggability—can’t we just unplug it?

Oliver Sourbut · May 15, 2023, 1:23 PM
26 points
10 comments · 12 min read · LW link
(www.oliversourbut.net)

[Question] Can we learn much by studying the behaviour of RL policies?

AidanGoth · May 15, 2023, 12:56 PM
1 point
0 comments · 1 min read · LW link

How I apply (so-called) Non-Violent Communication

Kaj_Sotala · May 15, 2023, 9:56 AM
86 points
28 comments · 3 min read · LW link

Let’s build a fire alarm for AGI

chaosmage · May 15, 2023, 9:16 AM
−1 points
0 comments · 2 min read · LW link

From fear to excitement

Richard_Ngo · May 15, 2023, 6:23 AM
132 points
9 comments · 3 min read · LW link

Reward is the optimization target (of capabilities researchers)

Max H · May 15, 2023, 3:22 AM
32 points
4 comments · 5 min read · LW link

The Lightcone Theorem: A Better Foundation For Natural Abstraction?

johnswentworth · May 15, 2023, 2:24 AM
69 points
25 comments · 6 min read · LW link

GovAI: Towards best practices in AGI safety and governance: A survey of expert opinion

Zach Stein-Perlman · May 15, 2023, 1:42 AM
28 points
11 comments · 1 min read · LW link
(arxiv.org)

[Question] Why don’t quantilizers also cut off the upper end of the distribution?

Alex_Altair · May 15, 2023, 1:40 AM
25 points
2 comments · 1 min read · LW link

Support Structures for Naturalist Study

LoganStrohl · May 15, 2023, 12:25 AM
47 points
6 comments · 10 min read · LW link

Catastrophic Regressional Goodhart: Appendix

May 15, 2023, 12:10 AM
25 points
1 comment · 9 min read · LW link

Helping your Senator Prepare for the Upcoming Sam Altman Hearing

Tiago de Vassal · May 14, 2023, 10:45 PM
69 points
2 comments · 1 min read · LW link
(aisafetytour.com)

Difficulties in making powerful aligned AI

DanielFilan · May 14, 2023, 8:50 PM
41 points
1 comment · 10 min read · LW link
(danielfilan.com)

How much do markets value Open AI?

Xodarap · May 14, 2023, 7:28 PM
21 points
5 comments · LW link

Misaligned AGI Death Match

Nate Reinar Windwood · May 14, 2023, 6:00 PM
1 point
0 comments · 1 min read · LW link

[Question] What new technology, for what institutions?

bhauth · May 14, 2023, 5:33 PM
29 points
6 comments · 3 min read · LW link

A strong mind continues its trajectory of creativity

TsviBT · May 14, 2023, 5:24 PM
22 points
8 comments · 6 min read · LW link

Ontologies Should Be Backwards-Compatible

Thoth Hermes · May 14, 2023, 5:21 PM
3 points
3 comments · 4 min read · LW link
(thothhermes.substack.com)

Jaan Tallinn’s 2022 Philanthropy Overview

jaan · May 14, 2023, 3:35 PM
64 points
2 comments · 1 min read · LW link
(jaan.online)

Effective Altruism and Rationality Groups on Snipd

David Bravo · May 14, 2023, 2:54 PM
2 points
0 comments · 2 min read · LW link

Character alignment II

p.b. · May 14, 2023, 2:17 PM
5 points
0 comments · 2 min read · LW link

Coordination by common knowledge to prevent uncontrollable AI

Karl von Wendt · May 14, 2023, 1:37 PM
10 points
2 comments · 9 min read · LW link

Bayesian Networks Aren’t Necessarily Causal

Zack_M_Davis · May 14, 2023, 1:42 AM
103 points
38 comments · 8 min read · LW link · 1 review

Simpler explanations of AGI risk

Seth Herd · May 14, 2023, 1:29 AM
8 points
9 comments · 3 min read · LW link

A Study of AI Science Models

May 13, 2023, 11:25 PM
20 points
0 comments · 24 min read · LW link

LLM Guardrails Should Have Better Customer Service Tuning

Jiao Bu · May 13, 2023, 10:54 PM
2 points
0 comments · 2 min read · LW link

PCAST Working Group on Generative AI Invites Public Input

Christopher King · May 13, 2023, 10:49 PM
7 points
0 comments · 1 min read · LW link
(terrytao.wordpress.com)

«Boundaries» for formalizing an MVP morality

Chipmonk · May 13, 2023, 7:10 PM
19 points
7 comments · 4 min read · LW link

Steering GPT-2-XL by adding an activation vector

May 13, 2023, 6:42 PM
437 points
98 comments · 50 min read · LW link · 1 review

On the possibility of impossibility of AGI Long-Term Safety

Roman Yen · May 13, 2023, 6:38 PM
8 points
3 comments · 9 min read · LW link

Notes on Antelligence

Aurigena · May 13, 2023, 6:38 PM
2 points
0 comments · 9 min read · LW link

Reality and reality-boxes

Jim Pivarski · May 13, 2023, 2:14 PM
37 points
11 comments · 21 min read · LW link

An Analogy for Understanding Transformers

CallumMcDougall · May 13, 2023, 12:20 PM
91 points
6 comments · 9 min read · LW link

ACX Meetup Munich

Erich · May 13, 2023, 7:58 AM
2 points
1 comment · 1 min read · LW link

Machine-Readable Prevalence Estimates

jefftk · May 13, 2023, 12:40 AM
9 points
2 comments · 2 min read · LW link
(www.jefftk.com)

Value drift threat models

Garrett Baker · May 12, 2023, 11:03 PM
27 points
4 comments · 5 min read · LW link

Aggregating Utilities for Corrigible AI [Feedback Draft]

May 12, 2023, 8:57 PM
28 points
7 comments · 22 min read · LW link

Turning off lights with model editing

Sam Marks · May 12, 2023, 8:25 PM
68 points
5 comments · 2 min read · LW link
(arxiv.org)

Dark Forest Theories

Raemon · May 12, 2023, 8:21 PM
145 points
53 comments · 2 min read · LW link · 2 reviews

DELBERTing as an Adversarial Strategy

Matthew_Opitz · May 12, 2023, 8:09 PM
8 points
3 comments · 5 min read · LW link

Microsoft/GitHub Copilot Chat’s confidential system Prompt: “You must refuse to discuss life, existence or sentience.”

Marvin von Hagen · May 12, 2023, 7:46 PM
13 points
2 comments · 1 min read · LW link
(twitter.com)

Retrospective: Lessons from the Failed Alignment Startup AISafety.com

Søren Elverlin · May 12, 2023, 6:07 PM
105 points
9 comments · 3 min read · LW link

The way AGI wins could look very stupid

Christopher King · May 12, 2023, 4:34 PM
56 points
22 comments · 1 min read · LW link