Secondary Risk Markets

Vaniver · 11 Dec 2023 21:52 UTC
35 points
4 comments · 4 min read · LW link

Has anyone experimented with Dodrio, a tool for exploring transformer models through interactive visualization?

Bill Benzon · 11 Dec 2023 20:34 UTC
4 points
0 comments · 1 min read · LW link

[Valence series] 3. Valence & Beliefs

Steven Byrnes · 11 Dec 2023 20:21 UTC
63 points
6 comments · 21 min read · LW link

[Question] Am I ethically obligated to extend the life of my dog with life-extension treatments about to hit the market?

TrudosKudos · 11 Dec 2023 19:41 UTC
−3 points
1 comment · 1 min read · LW link

Adversarial Robustness Could Help Prevent Catastrophic Misuse

aogara · 11 Dec 2023 19:12 UTC
30 points
18 comments · 9 min read · LW link

The Consciousness Box

GradualImprovement · 11 Dec 2023 16:45 UTC
33 points
22 comments · 4 min read · LW link

Empirical work that might shed light on scheming (Section 6 of “Scheming AIs”)

Joe Carlsmith · 11 Dec 2023 16:30 UTC
8 points
0 comments · 21 min read · LW link

Into AI Safety: Episode 3

jacobhaimes · 11 Dec 2023 16:30 UTC
6 points
0 comments · 1 min read · LW link
(into-ai-safety.github.io)

Implicitly Typed C

jefftk · 11 Dec 2023 16:10 UTC
16 points
0 comments · 1 min read · LW link
(www.jefftk.com)

37C3 Hacker x Rationalist Meetup

11 Dec 2023 16:02 UTC
5 points
5 comments · 1 min read · LW link

re: Yudkowsky on biological materials

bhauth · 11 Dec 2023 13:28 UTC
179 points
30 comments · 5 min read · LW link

Ideoculture

elv · 11 Dec 2023 10:29 UTC
8 points
2 comments · 6 min read · LW link

Quick thoughts on the implications of multi-agent views of mind on AI takeover

Kaj_Sotala · 11 Dec 2023 6:34 UTC
40 points
14 comments · 4 min read · LW link

Auditing failures vs concentrated failures

11 Dec 2023 2:47 UTC
44 points
0 comments · 7 min read · LW link

Facing Up to the Problem of Consciousness

Bruce W. Lee · 10 Dec 2023 23:31 UTC
8 points
0 comments · 3 min read · LW link

Deeply Cover Car Crashes?

jefftk · 10 Dec 2023 22:20 UTC
36 points
31 comments · 1 min read · LW link
(www.jefftk.com)

Principles For Product Liability (With Application To AI)

johnswentworth · 10 Dec 2023 21:27 UTC
37 points
55 comments · 10 min read · LW link

[Question] What do you do to remember and reference the LessWrong posts that were most personally significant to you, in terms of intellectual development or general usefulness?

lillybaeum · 10 Dec 2023 17:52 UTC
5 points
7 comments · 1 min read · LW link

[Question] Do websites and apps actually generally get worse after updates, or is it just an effect of the fear of change?

lillybaeum · 10 Dec 2023 17:26 UTC
33 points
34 comments · 2 min read · LW link

How LDT helps reduce the AI arms race

Tamsin Leake · 10 Dec 2023 16:21 UTC
70 points
13 comments · 4 min read · LW link
(carado.moe)

Understanding Subjective Probabilities

Isaac King · 10 Dec 2023 6:03 UTC
30 points
16 comments · 10 min read · LW link

Send us example gnarly bugs

10 Dec 2023 5:23 UTC
77 points
10 comments · 2 min read · LW link

Conceptual coherence for concrete categories in humans and LLMs

Bill Benzon · 9 Dec 2023 23:49 UTC
13 points
1 comment · 2 min read · LW link

2d ai-partners as a comprehensive motivation tool

AiresJL · 9 Dec 2023 21:59 UTC
3 points
0 comments · 1 min read · LW link

Without—MicroFiction 250 words

Carissa Cassiel · 9 Dec 2023 21:49 UTC
19 points
1 comment · 1 min read · LW link

Some negative steganography results

Fabien Roger · 9 Dec 2023 20:22 UTC
55 points
5 comments · 2 min read · LW link

Summing up “Scheming AIs” (Section 5)

Joe Carlsmith · 9 Dec 2023 15:48 UTC
2 points
0 comments · 11 min read · LW link

The Offense-Defense Balance Rarely Changes

Maxwell Tabarrok · 9 Dec 2023 15:21 UTC
75 points
23 comments · 3 min read · LW link
(maximumprogress.substack.com)

A Philosophical Tautology

Nox ML · 9 Dec 2023 14:06 UTC
−2 points
45 comments · 2 min read · LW link

Unpicking Extinction

ukc10014 · 9 Dec 2023 9:15 UTC
34 points
10 comments · 10 min read · LW link

Finding Sparse Linear Connections between Features in LLMs

9 Dec 2023 2:27 UTC
68 points
5 comments · 10 min read · LW link

[Question] Option Space Nomenclature

SilverFlame · 8 Dec 2023 23:14 UTC
1 point
0 comments · 1 min read · LW link

“Model UN Solutions”

Arjun Panickssery · 8 Dec 2023 23:06 UTC
36 points
5 comments · 1 min read · LW link
(open.substack.com)

Speed arguments against scheming (Section 4.4-4.7 of “Scheming AIs”)

Joe Carlsmith · 8 Dec 2023 21:09 UTC
9 points
0 comments · 15 min read · LW link

Foreacting agents

B Jacobs · 8 Dec 2023 19:57 UTC
4 points
0 comments · 13 min read · LW link

Modeling incentives at scale using LLMs

8 Dec 2023 18:46 UTC
7 points
3 comments · 13 min read · LW link

Refusal mechanisms: initial experiments with Llama-2-7b-chat

8 Dec 2023 17:08 UTC
79 points
7 comments · 7 min read · LW link

Colour versus Shape Goal Misgeneralization in Reinforcement Learning: A Case Study

Karolis Ramanauskas · 8 Dec 2023 13:18 UTC
13 points
1 comment · 4 min read · LW link
(arxiv.org)

What I Would Do If I Were Working On AI Governance

johnswentworth · 8 Dec 2023 6:43 UTC
109 points
32 comments · 10 min read · LW link

Whither Prison Abolition?

MadHatter · 8 Dec 2023 5:27 UTC
−7 points
0 comments · 16 min read · LW link
(bittertruths.substack.com)

Class consciousness for those against the class system

TekhneMakre · 8 Dec 2023 1:02 UTC
10 points
7 comments · 1 min read · LW link

Building selfless agents to avoid instrumental self-preservation.

blallo · 7 Dec 2023 18:59 UTC
14 points
2 comments · 6 min read · LW link

Does Chat-GPT display ‘Scope Insensitivity’?

callum · 7 Dec 2023 18:58 UTC
11 points
0 comments · 3 min read · LW link

LLM keys—A Proposal of a Solution to Prompt Injection Attacks

Peter Hroššo · 7 Dec 2023 17:36 UTC
1 point
2 comments · 1 min read · LW link

Meetup Tip: Heartbeat Messages

Screwtape · 7 Dec 2023 17:18 UTC
68 points
4 comments · 3 min read · LW link

[Valence series] 2. Valence & Normativity

Steven Byrnes · 7 Dec 2023 16:43 UTC
70 points
4 comments · 28 min read · LW link

AISN #27: Defensive Accelerationism, A Retrospective On The OpenAI Board Saga, And A New AI Bill From Senators Thune And Klobuchar

7 Dec 2023 15:59 UTC
13 points
0 comments · 6 min read · LW link
(newsletter.safe.ai)

AI #41: Bring in the Other Gemini

Zvi · 7 Dec 2023 15:10 UTC
46 points
16 comments · 52 min read · LW link
(thezvi.wordpress.com)

Simplicity arguments for scheming (Section 4.3 of “Scheming AIs”)

Joe Carlsmith · 7 Dec 2023 15:05 UTC
10 points
1 comment · 19 min read · LW link

Results from the Turing Seminar hackathon

7 Dec 2023 14:50 UTC
29 points
1 comment · 6 min read · LW link