EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem

Elizabeth 28 Sep 2023 23:30 UTC
319 points
247 comments, 22 min read, LW link
(acesounderglass.com)

Competitive, Cooperative, and Cohabitive

Screwtape 28 Sep 2023 23:25 UTC
46 points
12 comments, 4 min read, LW link

The Coming Wave

PeterMcCluskey 28 Sep 2023 22:59 UTC
25 points
1 comment, 6 min read, LW link
(bayesianinvestor.com)

High-level interpretability: detecting an AI’s objectives

28 Sep 2023 19:30 UTC
69 points
4 comments, 21 min read, LW link

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

28 Sep 2023 18:53 UTC
183 points
37 comments, 3 min read, LW link

Responsible scaling policy TLDR

lukehmiles 28 Sep 2023 18:51 UTC
9 points
0 comments, 1 min read, LW link

Alignment Workshop talks

Richard_Ngo 28 Sep 2023 18:26 UTC
37 points
1 comment, 1 min read, LW link
(www.alignment-workshop.com)

My Current Thoughts on the AI Strategic Landscape

Jeffrey Heninger 28 Sep 2023 17:59 UTC
11 points
28 comments, 14 min read, LW link

My Arrogant Plan for Alignment

MrArrogant 28 Sep 2023 17:51 UTC
2 points
6 comments, 6 min read, LW link

Discursive Competence in ChatGPT, Part 2: Memory for Texts

Bill Benzon 28 Sep 2023 16:34 UTC
1 point
0 comments, 3 min read, LW link

Different views of alignment have different consequences for imperfect methods

Stuart_Armstrong 28 Sep 2023 16:31 UTC
31 points
0 comments, 1 min read, LW link

AI #31: It Can Do What Now?

Zvi 28 Sep 2023 16:00 UTC
90 points
6 comments, 40 min read, LW link
(thezvi.wordpress.com)

The point of a game is not to win, and you shouldn’t even pretend that it is

mako yass 28 Sep 2023 15:54 UTC
43 points
27 comments, 4 min read, LW link
(makopool.com)

Cohabitive Games so Far

mako yass 28 Sep 2023 15:41 UTC
102 points
118 comments, 19 min read, LW link
(makopool.com)

Wobbly Table Theorem in Practice

Morpheus 28 Sep 2023 14:33 UTC
23 points
0 comments, 2 min read, LW link

Weighing Animal Worth

jefftk 28 Sep 2023 13:50 UTC
25 points
11 comments, 2 min read, LW link
(www.jefftk.com)

ARC Evals: Responsible Scaling Policies

Zach Stein-Perlman 28 Sep 2023 4:30 UTC
40 points
9 comments, 2 min read, LW link
(evals.alignment.org)

Petrov Day Retrospective, 2023 (re: the most important virtue of Petrov Day & unilaterally promoting it)

Ruby 28 Sep 2023 2:48 UTC
64 points
73 comments, 6 min read, LW link

Jimmy Apples, source of the rumor that OpenAI has achieved AGI internally, is a credible insider.

Jorterder 28 Sep 2023 1:20 UTC
−6 points
2 comments, 1 min read, LW link
(twitter.com)

Investigating the rumors of OpenAI achieving AGI

Jorterder 28 Sep 2023 1:17 UTC
−4 points
1 comment, 1 min read, LW link

Alibaba Group releases Qwen, 14B parameter LLM

nikola 28 Sep 2023 0:12 UTC
5 points
1 comment, 1 min read, LW link
(qianwen-res.oss-cn-beijing.aliyuncs.com)

Metaculus Launches 2023/2024 FluSight Challenge Supporting CDC, $5K in Prizes

ChristianWilliams 27 Sep 2023 21:35 UTC
5 points
0 comments, 1 min read, LW link
(www.metaculus.com)

Projects I would like to see (possibly at AI Safety Camp)

Linda Linsefors 27 Sep 2023 21:27 UTC
22 points
12 comments, 4 min read, LW link

Towards Better Milestones for Monitoring AI Capabilities

snewman 27 Sep 2023 21:18 UTC
11 points
0 comments, 14 min read, LW link

[Question] Is Bjorn Lomborg roughly right about climate change policy?

yhoiseth 27 Sep 2023 20:06 UTC
29 points
13 comments, 2 min read, LW link
(www.sciencedirect.com)

Commonsense Good, Creative Good

jefftk 27 Sep 2023 19:50 UTC
44 points
11 comments, 3 min read, LW link
(www.jefftk.com)

Petrov Day [Spoiler Warning]

lsusr 27 Sep 2023 19:20 UTC
6 points
6 comments, 1 min read, LW link

The Hidden Complexity of Wishes—The Animation

Writer 27 Sep 2023 17:59 UTC
32 points
0 comments, 1 min read, LW link
(youtu.be)

MMLU’s Moral Scenarios Benchmark Doesn’t Measure What You Think it Measures

corey morris 27 Sep 2023 17:54 UTC
14 points
2 comments, 4 min read, LW link
(medium.com)

[Question] What’s your standard for good work performance?

Chi Nguyen 27 Sep 2023 16:58 UTC
30 points
3 comments, 1 min read, LW link

The Role of Groups in the Progression of Human Understanding

Chris_Leong 27 Sep 2023 15:09 UTC
11 points
0 comments, 2 min read, LW link

The Great Disembedding

rogersbacon 27 Sep 2023 14:53 UTC
16 points
4 comments, 16 min read, LW link
(www.secretorum.life)

[Question] how do short-timeliners reason about the differences between brain and AI?

JavierCC 27 Sep 2023 8:13 UTC
2 points
11 comments, 1 min read, LW link

[Question] Is there a widely accepted metric for ‘genuineness’ in interpersonal communication?

M. Y. Zuo 27 Sep 2023 5:30 UTC
6 points
3 comments, 1 min read, LW link

Bariatric surgery seems like a no-brainer for most morbidly obese people

lc 27 Sep 2023 1:05 UTC
11 points
12 comments, 3 min read, LW link

Jacob on the Precipice

Richard_Ngo 26 Sep 2023 21:16 UTC
42 points
7 comments, 11 min read, LW link
(narrativeark.substack.com)

Text Posts from the Kids Group: 2022

jefftk 26 Sep 2023 20:40 UTC
33 points
2 comments, 7 min read, LW link
(www.jefftk.com)

GPT-4 for personal productivity: online distraction blocker

Sergii 26 Sep 2023 17:41 UTC
62 points
11 comments, 2 min read, LW link
(grgv.xyz)

ARENA 2.0 - Impact Report

CallumMcDougall 26 Sep 2023 17:13 UTC
33 points
5 comments, 13 min read, LW link

Mechanistic Interpretability Reading group

26 Sep 2023 16:26 UTC
8 points
0 comments, 1 min read, LW link

Announcing the CNN Interpretability Competition

scasper 26 Sep 2023 16:21 UTC
22 points
0 comments, 4 min read, LW link

Making AIs less likely to be spiteful

26 Sep 2023 14:12 UTC
89 points
2 comments, 10 min read, LW link

[Linkpost] Mark Zuckerberg confronted about Meta’s Llama 2 AI’s ability to give users detailed guidance on making anthrax—Business Insider

mic 26 Sep 2023 12:05 UTC
18 points
11 comments, 2 min read, LW link
(www.businessinsider.com)

Enforcing Far-Future Contracts for Governments

FCCC 26 Sep 2023 4:26 UTC
−7 points
49 comments, 3 min read, LW link

Carioca Petrov Day

Giskard 26 Sep 2023 0:30 UTC
1 point
0 comments, 1 min read, LW link

[Question] A few Alignment questions: utility optimizers, SLT, sharp left turn and identifiability

Igor Timofeev 26 Sep 2023 0:27 UTC
5 points
1 comment, 2 min read, LW link

Impact stories for model internals: an exercise for interpretability researchers

jenny 25 Sep 2023 23:15 UTC
29 points
3 comments, 7 min read, LW link

Autonomic Sanity

Sable 25 Sep 2023 22:37 UTC
20 points
9 comments, 4 min read, LW link
(affablyevil.substack.com)

[Question] What is wrong with this “utility switch button problem” approach?

Donald Hobson 25 Sep 2023 21:36 UTC
14 points
3 comments, 1 min read, LW link

You should just smile at strangers a lot

chaosmage 25 Sep 2023 20:12 UTC
13 points
10 comments, 1 min read, LW link