[Question] Option Space Nomenclature

SilverFlame · Dec 8, 2023, 11:14 PM
1 point
0 comments · 1 min read

“Model UN Solutions”

Arjun Panickssery · Dec 8, 2023, 11:06 PM
36 points
5 comments · 1 min read
(open.substack.com)

Speed arguments against scheming (Section 4.4-4.7 of “Scheming AIs”)

Joe Carlsmith · Dec 8, 2023, 9:09 PM
9 points
0 comments · 15 min read

Modeling incentives at scale using LLMs

Dec 8, 2023, 6:46 PM
7 points
3 comments · 13 min read

Refusal mechanisms: initial experiments with Llama-2-7b-chat

Dec 8, 2023, 5:08 PM
82 points
7 comments · 7 min read

Colour versus Shape Goal Misgeneralization in Reinforcement Learning: A Case Study

Karolis Jucys · Dec 8, 2023, 1:18 PM
16 points
1 comment · 4 min read
(arxiv.org)

What I Would Do If I Were Working On AI Governance

johnswentworth · Dec 8, 2023, 6:43 AM
110 points
32 comments · 10 min read

Whither Prison Abolition?

MadHatter · Dec 8, 2023, 5:27 AM
−7 points
0 comments · 16 min read
(bittertruths.substack.com)

Class consciousness for those against the class system

TekhneMakre · Dec 8, 2023, 1:02 AM
11 points
9 comments · 1 min read

Building selfless agents to avoid instrumental self-preservation.

blallo · Dec 7, 2023, 6:59 PM
14 points
2 comments · 6 min read

Does Chat-GPT display ‘Scope Insensitivity’?

callum · Dec 7, 2023, 6:58 PM
11 points
0 comments · 3 min read

LLM keys—A Proposal of a Solution to Prompt Injection Attacks

Peter Hroššo · Dec 7, 2023, 5:36 PM
1 point
2 comments · 1 min read

Meetup Tip: Heartbeat Messages

Screwtape · Dec 7, 2023, 5:18 PM
69 points
4 comments · 3 min read

[Valence series] 2. Valence & Normativity

Steven Byrnes · Dec 7, 2023, 4:43 PM
88 points
7 comments · 28 min read · 1 review

AISN #27: Defensive Accelerationism, A Retrospective On The OpenAI Board Saga, And A New AI Bill From Senators Thune And Klobuchar

Dec 7, 2023, 3:59 PM
13 points
0 comments · 6 min read
(newsletter.safe.ai)

AI #41: Bring in the Other Gemini

Zvi · Dec 7, 2023, 3:10 PM
46 points
16 comments · 52 min read
(thezvi.wordpress.com)

Simplicity arguments for scheming (Section 4.3 of “Scheming AIs”)

Joe Carlsmith · Dec 7, 2023, 3:05 PM
10 points
1 comment · 19 min read

Gemini 1.0

Zvi · Dec 7, 2023, 2:40 PM
50 points
7 comments · 9 min read
(thezvi.wordpress.com)

Random Musings on Theory of Impact for Activation Vectors

Chris_Leong · Dec 7, 2023, 1:07 PM
8 points
0 comments · 1 min read

[Question] Is AlphaGo actually a consequentialist utility maximizer?

faul_sname · Dec 7, 2023, 12:41 PM
36 points
8 comments · 3 min read

(Report) Evaluating Taiwan’s Tactics to Safeguard its Semiconductor Assets Against a Chinese Invasion

Gauraventh · Dec 7, 2023, 11:50 AM
14 points
5 comments · 22 min read
(bristolaisafety.org)

Would AIs trapped in the Metaverse pine to enter the real world and would the ramifications cause trouble?

ProfessorFalken · Dec 7, 2023, 10:17 AM
−2 points
1 comment · 1 min read

The GiveWiki’s Top Picks in AI Safety for the Giving Season of 2023

Dawn Drescher · Dec 7, 2023, 9:23 AM
4 points
10 comments
(impactmarkets.substack.com)

Language Model Memorization, Copyright Law, and Conditional Pretraining Alignment

RogerDearnaley · Dec 7, 2023, 6:14 AM
9 points
0 comments · 11 min read

Reflective consistency, randomized decisions, and the dangers of unrealistic thought experiments

Radford Neal · Dec 7, 2023, 3:33 AM
34 points
25 comments · 6 min read

[Question] For fun: How long can you hold your breath?

exanova · Dec 6, 2023, 11:36 PM
1 point
7 comments · 1 min read

Mathematics As Physics

Nox ML · Dec 6, 2023, 10:27 PM
−2 points
10 comments · 5 min read

The counting argument for scheming (Sections 4.1 and 4.2 of “Scheming AIs”)

Joe Carlsmith · Dec 6, 2023, 7:28 PM
10 points
0 comments · 10 min read

On Trust

johnswentworth · Dec 6, 2023, 7:19 PM
42 points
26 comments · 4 min read

Originality vs. Correctness

Dec 6, 2023, 6:51 PM
60 points
17 comments · 25 min read

Proposal for improving the global online discourse through personalised comment ordering on all websites

Roman Leventov · Dec 6, 2023, 6:51 PM
35 points
21 comments · 6 min read

Google Gemini Announced

Jacob G-W · Dec 6, 2023, 4:14 PM
54 points
22 comments · 1 min read
(blog.google)

Based Beff Jezos and the Accelerationists

Zvi · Dec 6, 2023, 4:00 PM
90 points
29 comments · 12 min read
(thezvi.wordpress.com)

Bucket Brigade: Likely End-of-Life

jefftk · Dec 6, 2023, 3:30 PM
16 points
1 comment · 1 min read
(www.jefftk.com)

Why Yudkowsky is wrong about “covalently bonded equivalents of biology”

titotal · Dec 6, 2023, 2:09 PM
44 points
41 comments
(open.substack.com)

Metaculus Launches Chinese AI Chips Tournament, Supporting Institute for AI Policy and Strategy Research

ChristianWilliams · Dec 6, 2023, 11:26 AM
10 points
1 comment
(www.metaculus.com)

Minimal Viable Paradise: How do we get The Good Future(TM)?

Nathan Young · Dec 6, 2023, 9:24 AM
9 points
0 comments · 7 min read

Anthropical Paradoxes are Paradoxes of Probability Theory

Ape in the coat · Dec 6, 2023, 8:16 AM
55 points
18 comments · 5 min read

Digital humans vs merge with AI? Same or different?

Dec 6, 2023, 4:56 AM
21 points
11 comments · 7 min read

EA Infrastructure Fund’s Plan to Focus on Principles-First EA

Linch · Dec 6, 2023, 3:24 AM
27 points
0 comments

In defence of Helen Toner, Adam D’Angelo, and Tasha McCauley

mrtreasure · Dec 6, 2023, 2:02 AM
25 points
3 comments · 9 min read
(pastebin.com)

Some quick thoughts on “AI is easy to control”

Mikhail Samin · Dec 6, 2023, 12:58 AM
15 points
10 comments · 7 min read

ACX Corvallis, OR

kenakofer · Dec 6, 2023, 12:23 AM
1 point
0 comments · 1 min read

Multinational corporations as optimizers: a case for reaching across the aisle

sudo-nym · Dec 6, 2023, 12:14 AM
9 points
10 comments · 1 min read

[Question] How do you feel about LessWrong these days? [Open feedback thread]

Bird Concept · Dec 5, 2023, 8:54 PM
108 points
285 comments · 1 min read

Critique-a-Thon of AI Alignment Plans

Iknownothing · Dec 5, 2023, 8:50 PM
12 points
3 comments · 1 min read

Arguments for/against scheming that focus on the path SGD takes (Section 3 of “Scheming AIs”)

Joe Carlsmith · Dec 5, 2023, 6:48 PM
10 points
0 comments · 23 min read

In defence of Helen Toner, Adam D’Angelo, and Tasha McCauley (OpenAI post)

mrtreasure · Dec 5, 2023, 6:40 PM
6 points
2 comments · 1 min read
(pastebin.com)

Studying The Alien Mind

Dec 5, 2023, 5:27 PM
80 points
10 comments · 15 min read

Deep Forgetting & Unlearning for Safely-Scoped LLMs

scasper · Dec 5, 2023, 4:48 PM
126 points
30 comments · 13 min read