Facing Up to the Problem of Consciousness

Bruce W. Lee · 10 Dec 2023 23:31 UTC
8 points
0 comments · 3 min read · LW link

Deeply Cover Car Crashes?

jefftk · 10 Dec 2023 22:20 UTC
36 points
31 comments · 1 min read · LW link
(www.jefftk.com)

Principles For Product Liability (With Application To AI)

johnswentworth · 10 Dec 2023 21:27 UTC
37 points
55 comments · 10 min read · LW link

[Question] What do you do to remember and reference the LessWrong posts that were most personally significant to you, in terms of intellectual development or general usefulness?

lillybaeum · 10 Dec 2023 17:52 UTC
5 points
7 comments · 1 min read · LW link

[Question] Do websites and apps actually generally get worse after updates, or is it just an effect of the fear of change?

lillybaeum · 10 Dec 2023 17:26 UTC
33 points
34 comments · 2 min read · LW link

How LDT helps reduce the AI arms race

Tamsin Leake · 10 Dec 2023 16:21 UTC
70 points
13 comments · 4 min read · LW link
(carado.moe)

Understanding Subjective Probabilities

Isaac King · 10 Dec 2023 6:03 UTC
30 points
16 comments · 10 min read · LW link

Send us example gnarly bugs

10 Dec 2023 5:23 UTC
77 points
10 comments · 2 min read · LW link

Conceptual coherence for concrete categories in humans and LLMs

Bill Benzon · 9 Dec 2023 23:49 UTC
13 points
1 comment · 2 min read · LW link

2d ai-partners as a comprehensive motivation tool

AiresJL · 9 Dec 2023 21:59 UTC
3 points
0 comments · 1 min read · LW link

Without - MicroFiction 250 words

Carissa Cassiel · 9 Dec 2023 21:49 UTC
19 points
1 comment · 1 min read · LW link

Some negative steganography results

Fabien Roger · 9 Dec 2023 20:22 UTC
55 points
5 comments · 2 min read · LW link

Summing up “Scheming AIs” (Section 5)

Joe Carlsmith · 9 Dec 2023 15:48 UTC
2 points
1 comment · 11 min read · LW link

The Offense-Defense Balance Rarely Changes

Maxwell Tabarrok · 9 Dec 2023 15:21 UTC
75 points
23 comments · 3 min read · LW link
(maximumprogress.substack.com)

A Philosophical Tautology

Nox ML · 9 Dec 2023 14:06 UTC
−2 points
45 comments · 2 min read · LW link

Unpicking Extinction

ukc10014 · 9 Dec 2023 9:15 UTC
34 points
10 comments · 10 min read · LW link

Finding Sparse Linear Connections between Features in LLMs

9 Dec 2023 2:27 UTC
68 points
5 comments · 10 min read · LW link

[Question] Option Space Nomenclature

SilverFlame · 8 Dec 2023 23:14 UTC
1 point
0 comments · 1 min read · LW link

“Model UN Solutions”

Arjun Panickssery · 8 Dec 2023 23:06 UTC
36 points
5 comments · 1 min read · LW link
(open.substack.com)

Speed arguments against scheming (Section 4.4-4.7 of “Scheming AIs”)

Joe Carlsmith · 8 Dec 2023 21:09 UTC
9 points
0 comments · 15 min read · LW link

Foreacting agents

B Jacobs · 8 Dec 2023 19:57 UTC
4 points
0 comments · 13 min read · LW link

Modeling incentives at scale using LLMs

8 Dec 2023 18:46 UTC
7 points
3 comments · 13 min read · LW link

Refusal mechanisms: initial experiments with Llama-2-7b-chat

8 Dec 2023 17:08 UTC
79 points
7 comments · 7 min read · LW link

Colour versus Shape Goal Misgeneralization in Reinforcement Learning: A Case Study

Karolis Ramanauskas · 8 Dec 2023 13:18 UTC
13 points
1 comment · 4 min read · LW link
(arxiv.org)

What I Would Do If I Were Working On AI Governance

johnswentworth · 8 Dec 2023 6:43 UTC
109 points
32 comments · 10 min read · LW link

Whither Prison Abolition?

MadHatter · 8 Dec 2023 5:27 UTC
−7 points
0 comments · 16 min read · LW link
(bittertruths.substack.com)

Class consciousness for those against the class system

TekhneMakre · 8 Dec 2023 1:02 UTC
10 points
7 comments · 1 min read · LW link

Building selfless agents to avoid instrumental self-preservation.

blallo · 7 Dec 2023 18:59 UTC
14 points
2 comments · 6 min read · LW link

Does Chat-GPT display ‘Scope Insensitivity’?

callum · 7 Dec 2023 18:58 UTC
11 points
0 comments · 3 min read · LW link

LLM keys - A Proposal of a Solution to Prompt Injection Attacks

Peter Hroššo · 7 Dec 2023 17:36 UTC
1 point
2 comments · 1 min read · LW link

Meetup Tip: Heartbeat Messages

Screwtape · 7 Dec 2023 17:18 UTC
68 points
4 comments · 3 min read · LW link

[Valence series] 2. Valence & Normativity

Steven Byrnes · 7 Dec 2023 16:43 UTC
70 points
4 comments · 28 min read · LW link

AISN #27: Defensive Accelerationism, A Retrospective On The OpenAI Board Saga, And A New AI Bill From Senators Thune And Klobuchar

7 Dec 2023 15:59 UTC
13 points
0 comments · 6 min read · LW link
(newsletter.safe.ai)

AI #41: Bring in the Other Gemini

Zvi · 7 Dec 2023 15:10 UTC
46 points
16 comments · 52 min read · LW link
(thezvi.wordpress.com)

Simplicity arguments for scheming (Section 4.3 of “Scheming AIs”)

Joe Carlsmith · 7 Dec 2023 15:05 UTC
10 points
1 comment · 19 min read · LW link

Results from the Turing Seminar hackathon

7 Dec 2023 14:50 UTC
29 points
1 comment · 6 min read · LW link

Gemini 1.0

Zvi · 7 Dec 2023 14:40 UTC
50 points
7 comments · 9 min read · LW link
(thezvi.wordpress.com)

Random Musings on Theory of Impact for Activation Vectors

Chris_Leong · 7 Dec 2023 13:07 UTC
8 points
0 comments · 1 min read · LW link

[Question] Is AlphaGo actually a consequentialist utility maximizer?

faul_sname · 7 Dec 2023 12:41 UTC
33 points
8 comments · 3 min read · LW link

(Report) Evaluating Taiwan’s Tactics to Safeguard its Semiconductor Assets Against a Chinese Invasion

Gauraventh · 7 Dec 2023 11:50 UTC
16 points
5 comments · 22 min read · LW link
(bristolaisafety.org)

Would AIs trapped in the Metaverse pine to enter the real world and would the ramifications cause trouble?

ProfessorFalken · 7 Dec 2023 10:17 UTC
−2 points
1 comment · 1 min read · LW link

The GiveWiki’s Top Picks in AI Safety for the Giving Season of 2023

Dawn Drescher · 7 Dec 2023 9:23 UTC
4 points
10 comments · 1 min read · LW link
(impactmarkets.substack.com)

Language Model Memorization, Copyright Law, and Conditional Pretraining Alignment

RogerDearnaley · 7 Dec 2023 6:14 UTC
3 points
0 comments · 11 min read · LW link

Reflective consistency, randomized decisions, and the dangers of unrealistic thought experiments

Radford Neal · 7 Dec 2023 3:33 UTC
34 points
21 comments · 6 min read · LW link

[Question] For fun: How long can you hold your breath?

exanova · 6 Dec 2023 23:36 UTC
1 point
7 comments · 1 min read · LW link

Mathematics As Physics

Nox ML · 6 Dec 2023 22:27 UTC
−2 points
10 comments · 5 min read · LW link

The counting argument for scheming (Sections 4.1 and 4.2 of “Scheming AIs”)

Joe Carlsmith · 6 Dec 2023 19:28 UTC
10 points
0 comments · 10 min read · LW link

On Trust

johnswentworth · 6 Dec 2023 19:19 UTC
44 points
24 comments · 4 min read · LW link

Originality vs. Correctness

6 Dec 2023 18:51 UTC
60 points
16 comments · 25 min read · LW link

Proposal for improving the global online discourse through personalised comment ordering on all websites

Roman Leventov · 6 Dec 2023 18:51 UTC
35 points
21 comments · 6 min read · LW link