6 (Potential) Misconceptions about AI Intellectuals

ozziegooen · Feb 14, 2025, 11:51 PM
18 points
11 comments · LW link

[Question] Should Open Philanthropy Make an Offer to Buy OpenAI?

mrtreasure · Feb 14, 2025, 11:18 PM
25 points
1 comment · 1 min read · LW link

A computational no-coincidence principle

Eric Neyman · Feb 14, 2025, 9:39 PM
148 points
38 comments · 6 min read · LW link
(www.alignment.org)

Hopeful hypothesis, the Persona Jukebox.

Donald Hobson · Feb 14, 2025, 7:24 PM
11 points
4 comments · 3 min read · LW link

Introduction to Expected Value Fanaticism

Petra Kosonen · Feb 14, 2025, 7:05 PM
9 points
8 comments · 1 min read · LW link
(utilitarianism.net)

Intrinsic Dimension of Prompts in LLMs

Karthik Viswanathan · Feb 14, 2025, 7:02 PM
3 points
0 comments · 4 min read · LW link

Objective Realism: A Perspective Beyond Human Constructs

Apatheos · Feb 14, 2025, 7:02 PM
−12 points
1 comment · 2 min read · LW link

A short course on AGI safety from the GDM Alignment team

Feb 14, 2025, 3:43 PM
103 points
2 comments · 1 min read · LW link
(deepmindsafetyresearch.medium.com)

The Mask Comes Off: A Trio of Tales

Zvi · Feb 14, 2025, 3:30 PM
81 points
1 comment · 13 min read · LW link
(thezvi.wordpress.com)

Celtic Knots on a hex lattice

Ben · Feb 14, 2025, 2:29 PM
27 points
10 comments · 2 min read · LW link

Bimodal AI Beliefs

Adam Train · Feb 14, 2025, 6:45 AM
6 points
1 comment · 4 min read · LW link

Response to the US Govt’s Request for Information Concerning Its AI Action Plan

Davey Morse · Feb 14, 2025, 6:14 AM
4 points
0 comments · 3 min read · LW link

What is a circuit? [in interpretability]

Yudhister Kumar · Feb 14, 2025, 4:40 AM
23 points
1 comment · 1 min read · LW link

Systematic Sandbagging Evaluations on Claude 3.5 Sonnet

farrelmahaztra · Feb 14, 2025, 1:22 AM
13 points
0 comments · 1 min read · LW link
(farrelmahaztra.com)

Paranoia, Cognitive Biases, and Catastrophic Thought Patterns.

Spiritus Dei · Feb 14, 2025, 12:13 AM
−4 points
1 comment · 6 min read · LW link

Notes on the Presidential Election of 1836

Arjun Panickssery · Feb 13, 2025, 11:40 PM
23 points
0 comments · 7 min read · LW link
(arjunpanickssery.substack.com)

Static Place AI Makes Agentic AI Redundant: Multiversal AI Alignment & Rational Utopia

ank · Feb 13, 2025, 10:35 PM
1 point
2 comments · 11 min read · LW link

I’m making a ttrpg about life in an intentional community during the last year before the Singularity

bgaesop · Feb 13, 2025, 9:54 PM
11 points
2 comments · 2 min read · LW link

SWE Automation Is Coming: Consider Selling Your Crypto

A_donor · Feb 13, 2025, 8:17 PM
12 points
8 comments · 1 min read · LW link

≤10-year Timelines Remain Unlikely Despite DeepSeek and o3

Rafael Harth · Feb 13, 2025, 7:21 PM
52 points
67 comments · 15 min read · LW link

System 2 Alignment

Seth Herd · Feb 13, 2025, 7:17 PM
35 points
0 comments · 22 min read · LW link

Murder plots are infohazards

Chris Monteiro · Feb 13, 2025, 7:15 PM
301 points
44 comments · 2 min read · LW link

Sparse Autoencoder Feature Ablation for Unlearning

aludert · Feb 13, 2025, 7:13 PM
3 points
0 comments · 11 min read · LW link

What is it to solve the alignment problem?

Joe Carlsmith · Feb 13, 2025, 6:42 PM
31 points
6 comments · 19 min read · LW link
(joecarlsmith.substack.com)

Self-dialogue: Do behaviorist rewards make scheming AGIs?

Steven Byrnes · Feb 13, 2025, 6:39 PM
43 points
0 comments · 46 min read · LW link

How do we solve the alignment problem?

Joe Carlsmith · Feb 13, 2025, 6:27 PM
63 points
9 comments · 6 min read · LW link
(joecarlsmith.substack.com)

Ambiguous out-of-distribution generalization on an algorithmic task

Feb 13, 2025, 6:24 PM
83 points
6 comments · 11 min read · LW link

Teaching AI to reason: this year’s most important story

Benjamin_Todd · Feb 13, 2025, 5:40 PM
10 points
0 comments · 10 min read · LW link
(benjamintodd.substack.com)

AI #103: Show Me the Money

Zvi · Feb 13, 2025, 3:20 PM
30 points
9 comments · 58 min read · LW link
(thezvi.wordpress.com)

OpenAI’s NSFW policy: user safety, harm reduction, and AI consent

8e9 · Feb 13, 2025, 1:59 PM
4 points
3 comments · 2 min read · LW link

Studies of Human Error Rate

tin482 · Feb 13, 2025, 1:43 PM
15 points
3 comments · 1 min read · LW link

the dumbest theory of everything

lostinwilliamsburg · Feb 13, 2025, 7:57 AM
−1 points
0 comments · 7 min read · LW link

Skepticism towards claims about the views of powerful institutions

tlevin · Feb 13, 2025, 7:40 AM
46 points
2 comments · 4 min read · LW link

Virtue signaling, and the “humans-are-wonderful” bias, as a trust exercise

lc · Feb 13, 2025, 6:59 AM
44 points
16 comments · 4 min read · LW link

My model of what is going on with LLMs

Cole Wyeth · Feb 13, 2025, 3:43 AM
104 points
49 comments · 7 min read · LW link

Not all capabilities will be created equal: focus on strategically superhuman agents

benwr · Feb 13, 2025, 1:24 AM
62 points
8 comments · 3 min read · LW link

LLMs can teach themselves to better predict the future

Ben Turtel · Feb 13, 2025, 1:01 AM
0 points
1 comment · 1 min read · LW link
(arxiv.org)

Dovetail’s agent foundations fellowship talks & discussion

Alex_Altair · Feb 13, 2025, 12:49 AM
10 points
0 comments · 1 min read · LW link

Extended analogy between humans, corporations, and AIs.

Daniel Kokotajlo · Feb 13, 2025, 12:03 AM
36 points
2 comments · 6 min read · LW link

Moral Hazard in Democratic Voting

lsusr · Feb 12, 2025, 11:17 PM
20 points
8 comments · 1 min read · LW link

MATS Spring 2024 Extension Retrospective

Feb 12, 2025, 10:43 PM
26 points
1 comment · 15 min read · LW link

Hunting for AI Hackers: LLM Agent Honeypot

Feb 12, 2025, 8:29 PM
34 points
0 comments · 5 min read · LW link
(www.apartresearch.com)

Probability of AI-Caused Disaster

Alvin Ånestrand · Feb 12, 2025, 7:40 PM
2 points
2 comments · 10 min read · LW link
(forecastingaifutures.substack.com)

Two flaws in the Machiavelli Benchmark

TheManxLoiner · Feb 12, 2025, 7:34 PM
23 points
0 comments · 3 min read · LW link

Gradient Anatomy’s—Hallucination Robustness in Medical Q&A

DieSab · Feb 12, 2025, 7:16 PM
2 points
0 comments · 10 min read · LW link

Are current LLMs safe for psychotherapy?

PaperBike · Feb 12, 2025, 7:16 PM
5 points
4 comments · 1 min read · LW link

Comparing the effectiveness of top-down and bottom-up activation steering for bypassing refusal on harmful prompts

Ana Kapros · Feb 12, 2025, 7:12 PM
7 points
0 comments · 5 min read · LW link

The Paris AI Anti-Safety Summit

Zvi · Feb 12, 2025, 2:00 PM
129 points
21 comments · 21 min read · LW link
(thezvi.wordpress.com)

Inside the dark forests of the internet

Itay Dreyfus · Feb 12, 2025, 10:20 AM
10 points
0 comments · 6 min read · LW link
(productidentity.co)

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Matrice Jacobine · Feb 12, 2025, 9:15 AM
53 points
49 comments · LW link
(www.emergent-values.ai)