D/acc AI Security Salon

Allison Duettmann · Oct 19, 2024, 10:17 PM
19 points
0 comments · 1 min read · LW link

Who Should Have Been Killed, and Contains Neato? Who Else Could It Be, but that Villain Magneto!

Ace Delgado · Oct 19, 2024, 8:39 PM
−16 points
0 comments · 1 min read · LW link

If far-UV is so great, why isn’t it everywhere?

Austin Chen · Oct 19, 2024, 6:56 PM
70 points
23 comments · LW link
(strainhardening.substack.com)

What if AGI was already accidentally created in 2019? [Fictional story]

Alice Wanderland · Oct 19, 2024, 9:17 AM
−3 points
2 comments · 15 min read · LW link
(aliceandbobinwanderland.substack.com)

[Question] What actual bad outcome has “ethics-based” RLHF AI Alignment already prevented?

Roko · Oct 19, 2024, 6:11 AM
7 points
16 comments · 1 min read · LW link

[Question] What’s a good book for a technically-minded 11-year old?

Martin Sustrik · Oct 19, 2024, 6:05 AM
10 points
32 comments · 1 min read · LW link

Methodology: Contagious Beliefs

James Stephen Brown · Oct 19, 2024, 3:58 AM
3 points
0 comments · 7 min read · LW link

AI Prejudices: Practical Implications

PeterMcCluskey · Oct 19, 2024, 2:19 AM
12 points
0 comments · 5 min read · LW link
(bayesianinvestor.com)

Start an Upper-Room UV Installation Company?

jefftk · Oct 19, 2024, 2:00 AM
44 points
9 comments · 1 min read · LW link
(www.jefftk.com)

How I’d like alignment to get done (as of 2024-10-18)

TristanTrim · Oct 18, 2024, 11:39 PM
11 points
4 comments · 4 min read · LW link

Sabotage Evaluations for Frontier Models

Oct 18, 2024, 10:33 PM
95 points
56 comments · 6 min read · LW link
(assets.anthropic.com)

D&D Sci Coliseum: Arena of Data

aphyer · Oct 18, 2024, 10:02 PM
41 points
23 comments · 4 min read · LW link

the Daydication technique

chaosmage · Oct 18, 2024, 9:47 PM
29 points
0 comments · 2 min read · LW link

[Linkpost] Hawkish nationalism vs international AI power and benefit sharing

Oct 18, 2024, 6:13 PM
7 points
5 comments · 1 min read · LW link
(nacicankaya.substack.com)

LLM Psychometrics and Prompt-Induced Psychopathy

Korbinian K. · Oct 18, 2024, 6:11 PM
12 points
2 comments · 10 min read · LW link

A short project on Mamba: grokking & interpretability

Alejandro Tlaie · Oct 18, 2024, 4:59 PM
21 points
0 comments · 6 min read · LW link

LLMs can learn about themselves by introspection

Oct 18, 2024, 4:12 PM
102 points
38 comments · 9 min read · LW link

[Question] Are there more than 12 paths to Superintelligence?

p4rziv4l · Oct 18, 2024, 4:05 PM
−3 points
0 comments · 1 min read · LW link

Low Probability Estimation in Language Models

Gabriel Wu · Oct 18, 2024, 3:50 PM
50 points
0 comments · 10 min read · LW link
(www.alignment.org)

The Mysterious Trump Buyers on Polymarket

Annapurna · Oct 18, 2024, 1:26 PM
52 points
10 comments · 2 min read · LW link
(jorgevelez.substack.com)

On Intentionality, or: Towards a More Inclusive Concept of Lying

Cornelius Dybdahl · Oct 18, 2024, 10:37 AM
8 points
0 comments · 4 min read · LW link

Species as Canonical Referents of Super-Organisms

Yudhister Kumar · Oct 18, 2024, 7:49 AM
15 points
8 comments · 2 min read · LW link
(www.yudhister.me)

NAO Updates, Fall 2024

jefftk · Oct 18, 2024, 12:00 AM
32 points
2 comments · LW link
(naobservatory.org)

You’re Playing a Rough Game

jefftk · Oct 17, 2024, 7:20 PM
25 points
2 comments · 2 min read · LW link
(www.jefftk.com)

P=NP

OnePolynomial · Oct 17, 2024, 5:56 PM
−25 points
0 comments · 8 min read · LW link

Factoring P(doom) into a bayesian network

Joseph Gardi · Oct 17, 2024, 5:55 PM
1 point
0 comments · 1 min read · LW link

understanding bureaucracy

dhruvmethi · Oct 17, 2024, 5:55 PM
1 point
2 comments · 8 min read · LW link

AI #86: Just Think of the Potential

Zvi · Oct 17, 2024, 3:10 PM
58 points
8 comments · 57 min read · LW link
(thezvi.wordpress.com)

Concrete benefits of making predictions

Oct 17, 2024, 2:23 PM
35 points
5 comments · 6 min read · LW link
(fatebook.io)

Arithmetic is an underrated world-modeling technology

dynomight · Oct 17, 2024, 2:00 PM
152 points
33 comments · 6 min read · LW link
(dynomight.net)

The Computational Complexity of Circuit Discovery for Inner Interpretability

Bogdan Ionut Cirstea · Oct 17, 2024, 1:18 PM
11 points
2 comments · 1 min read · LW link
(arxiv.org)

[Question] is there a big dictionary somewhere with all your jargon and acronyms and whatnot?

KvmanThinking · Oct 17, 2024, 11:30 AM
4 points
7 comments · 1 min read · LW link

[Question] Is there a known method to find others who came across the same potential infohazard without spoiling it to the public?

hive · Oct 17, 2024, 10:47 AM
4 points
6 comments · 1 min read · LW link

It is time to start war gaming for AGI

yanni kyriacos · Oct 17, 2024, 5:14 AM
4 points
1 comment · 1 min read · LW link

[Question] Reinforcement Learning: Essential Step Towards AGI or Irrelevant?

Double · Oct 17, 2024, 3:37 AM
1 point
0 comments · 1 min read · LW link

[Question] EndeavorOTC legit?

FinalFormal2 · Oct 17, 2024, 1:33 AM
3 points
0 comments · 1 min read · LW link

The Cognitive Bootcamp Agreement

Raemon · Oct 16, 2024, 11:24 PM
36 points
0 comments · 8 min read · LW link

Bitter lessons about lucid dreaming

avturchin · Oct 16, 2024, 9:27 PM
77 points
62 comments · 2 min read · LW link

Towards Quantitative AI Risk Management

Oct 16, 2024, 7:26 PM
28 points
1 comment · 6 min read · LW link

Why Academia is Mostly Not Truth-Seeking

Zero Contradictions · Oct 16, 2024, 7:14 PM
−7 points
6 comments · 1 min read · LW link
(thewaywardaxolotl.blogspot.com)

Launching Adjacent News

Lucas Kohorst · Oct 16, 2024, 5:58 PM
24 points
0 comments · 4 min read · LW link

[Question] Interest in Leetcode, but for Rationality?

Gregory · Oct 16, 2024, 5:54 PM
74 points
20 comments · 2 min read · LW link

Request for advice: Research for Conversational Game Theory for LLMs

Rome Viharo · Oct 16, 2024, 5:53 PM
10 points
0 comments · 1 min read · LW link

Why humans won’t control superhuman AIs.

Spiritus Dei · Oct 16, 2024, 4:48 PM
−11 points
1 comment · 6 min read · LW link

Against empathy-by-default

Steven Byrnes · Oct 16, 2024, 4:38 PM
60 points
24 comments · 7 min read · LW link

cancer rates after gene therapy

bhauth · Oct 16, 2024, 3:32 PM
53 points
2 comments · 3 min read · LW link
(bhauth.com)

Monthly Roundup #23: October 2024

Zvi · Oct 16, 2024, 1:50 PM
39 points
13 comments · 50 min read · LW link
(thezvi.wordpress.com)

[Question] Change My Mind: Thirders in “Sleeping Beauty” are Just Doing Epistemology Wrong

DragonGod · Oct 16, 2024, 10:20 AM
8 points
67 comments · 6 min read · LW link

[Question] After uploading your consciousness...

Jinge Wang · Oct 16, 2024, 3:52 AM
−2 points
0 comments · 1 min read · LW link

The ELYSIUM Proposal - Extrapolated voLitions Yielding Separate Individualized Utopias for Mankind

Roko · Oct 16, 2024, 1:24 AM
9 points
18 comments · 1 min read · LW link
(transhumanaxiology.substack.com)