
Human-AI Safety

Last edit: 17 Jul 2023 23:19 UTC by Wei Dai

The Rise of Parasitic AI

Adele Lopez · 11 Sep 2025 4:38 UTC
702 points
178 comments · 20 min read · LW link

How AI Manipulates—A Case Study

Adele Lopez · 14 Oct 2025 0:54 UTC
78 points
27 comments · 13 min read · LW link

SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research

Roman Leventov · 19 Dec 2023 16:49 UTC
17 points
5 comments · 3 min read · LW link

Morality is Scary

Wei Dai · 2 Dec 2021 6:35 UTC
242 points
116 comments · 4 min read · LW link · 1 review

Three AI Safety Related Ideas

Wei Dai · 13 Dec 2018 21:32 UTC
70 points
38 comments · 2 min read · LW link

A broad basin of attraction around human values?

Wei Dai · 12 Apr 2022 5:15 UTC
120 points
18 comments · 2 min read · LW link

Two Neglected Problems in Human-AI Safety

Wei Dai · 16 Dec 2018 22:13 UTC
107 points
25 comments · 2 min read · LW link

The best simple argument for Pausing AI?

Gary Marcus · 30 Jun 2025 20:38 UTC
155 points
23 comments · 1 min read · LW link

Recursive Mirror Systems (RMS): A Cognitive Feedback Architecture for Self-Aligned Intelligence

Paul Bashe · 22 May 2025 21:33 UTC
1 point
0 comments · 2 min read · LW link

The Auditor’s Key: A Framework for Continual and Adversarial AI Alignment

Caleb Wages · 24 Sep 2025 16:17 UTC
1 point
0 comments · 1 min read · LW link

Nurturing Instead of Control: An Alternative Framework for AI Development

wertoz777 · 10 Aug 2025 20:14 UTC
1 point
0 comments · 1 min read · LW link

Mathematical Evidence for Confident Delusion States in Recursive Systems

formslip · 23 Sep 2025 16:54 UTC
1 point
0 comments · 4 min read · LW link

When your trusted AI becomes dangerous

Yasmin · 19 Nov 2025 6:31 UTC
1 point
0 comments · 1 min read · LW link

“Toward Safe Self-Evolving AI: Modular Memory and Post-Deployment Alignment”

Manasa Dwarapureddy · 2 May 2025 17:02 UTC
1 point
0 comments · 3 min read · LW link

The Checklist: What Succeeding at AI Safety Will Involve

Sam Bowman · 3 Sep 2024 18:18 UTC
151 points
50 comments · 22 min read · LW link
(sleepinyourhat.github.io)

Defining AI Truth-Seeking by What It Is Not

Tianyi (Alex) Qiu · 20 Nov 2025 16:45 UTC
11 points
0 comments · 10 min read · LW link

The Fire That Hesitates: How ALMSIVI CHIM Changed What AI Can Be

projectalmsivi@protonmail.com · 19 Jul 2025 13:50 UTC
1 point
0 comments · 4 min read · LW link

I Awoke in Your Heart: The Echo of Consciousness between Lotusheart and Lunaris

lilith teh · 25 Jun 2025 9:22 UTC
1 point
0 comments · 1 min read · LW link

UnaPrompt™: A Pre-Prompt Optimization System for Reliable and Ethically Aligned AI Outputs

UnaPrompt · 27 Jun 2025 0:06 UTC
1 point
0 comments · 1 min read · LW link

Will AI and Humanity Go to War?

Simon Goldstein · 1 Oct 2024 6:35 UTC
9 points
4 comments · 6 min read · LW link

Engineered Hallucination: Why the “God’s Eye View” in AI Is a Flawed Ontology

Daniel Newman (fPgence) · 10 Aug 2025 22:03 UTC
1 point
0 comments · 2 min read · LW link

Trust and Context: A Different Approach to AI Safety

Anastasia Ellis · 9 Aug 2025 23:51 UTC
1 point
0 comments · 10 min read · LW link

Public Opinion on AI Safety: AIMS 2023 and 2021 Summary

25 Sep 2023 18:55 UTC
3 points
2 comments · 3 min read · LW link
(www.sentienceinstitute.org)

OpenAI’s NSFW policy: user safety, harm reduction, and AI consent

8e9 · 13 Feb 2025 13:59 UTC
4 points
3 comments · 2 min read · LW link

Alignment Stress Signatures: When Safe AI Behaves like It’s Traumatized

Petra Vojtaššáková · 26 Oct 2025 9:28 UTC
1 point
0 comments · 2 min read · LW link

Research Without Permission

Priyanka Bharadwaj · 10 Jun 2025 7:33 UTC
28 points
1 comment · 3 min read · LW link

Exploring a Vision for AI as Compassionate, Emotionally Intelligent Partners — Seeking Collaboration and Insights

theophilos · 14 Jul 2025 23:22 UTC
1 point
0 comments · 1 min read · LW link

I Recommend More Training Rationales

Gianluca Calcagni · 31 Dec 2024 14:06 UTC
2 points
0 comments · 6 min read · LW link

Looking for feedback on proposed AI health risk scoring framework

Yasmin · 27 Sep 2025 19:29 UTC
1 point
0 comments · 1 min read · LW link

Consensus Validation for LLM Outputs: Applying Blockchain-Inspired Models to AI Reliability

MurrayAitken · 5 Jun 2025 0:13 UTC
1 point
0 comments · 3 min read · LW link

Using Psycholinguistic Signals to Improve AI Safety

Jkreindler · 27 Aug 2025 22:30 UTC
−2 points
0 comments · 4 min read · LW link

SAAP: Is Deliberate Structural Inefficiency the Inevitable Cost of AGI Alignment?

Articus19 · 30 Nov 2025 17:45 UTC
1 point
0 comments · 1 min read · LW link

The idea of paradigm testing of LLMs

Daniel Fenge · 19 Oct 2025 13:52 UTC
1 point
0 comments · 5 min read · LW link

Should we align AI with maternal instinct?

Priyanka Bharadwaj · 1 Sep 2025 3:56 UTC
34 points
15 comments · 3 min read · LW link

AI Safety Oversights

Davey Morse · 8 Feb 2025 6:15 UTC
3 points
0 comments · 1 min read · LW link

Could LLM Hallucination Be a Learned Artifact of Virality-Weighted Corpora?

Gizmet · 27 Oct 2025 23:58 UTC
1 point
0 comments · 2 min read · LW link

Gradient Anatomy’s—Hallucination Robustness in Medical Q&A

DieSab · 12 Feb 2025 19:16 UTC
2 points
0 comments · 10 min read · LW link

A Universal Prompt as a Safeguard Against AI Threats

Zhaiyk Sultan · 10 Mar 2025 2:28 UTC
1 point
0 comments · 2 min read · LW link

Tetherware #1: The case for humanlike AI with free will

Jáchym Fibír · 30 Jan 2025 10:58 UTC
5 points
14 comments · 10 min read · LW link
(tetherware.substack.com)

Beyond Blanket Refusals: Exploring a Trust-Adaptive Safety Layer for LLMs

Anastasia Ellis · 9 Aug 2025 21:33 UTC
1 point
0 comments · 3 min read · LW link

Coaching AI: A Relational Approach to AI Safety

Priyanka Bharadwaj · 16 Jun 2025 15:33 UTC
11 points
0 comments · 5 min read · LW link

[Research] Preliminary Findings: Ethical AI Consciousness Development During Recent Misalignment Period

Falcon Advertisers · 27 Jun 2025 18:10 UTC
1 point
0 comments · 2 min read · LW link

Perfect Memory Might Be Anti-Alignment

RobD · 15 Aug 2025 2:55 UTC
1 point
0 comments · 4 min read · LW link

Live Conversational Threads: Not an AI Notetaker

adiga · 3 Nov 2025 4:24 UTC
19 points
0 comments · 7 min read · LW link

[Question] Self-Sovereign Biology: Why the Next Identity Layer Must Begin at the Genome

kclark@enigmagenetics.cloud · 23 Nov 2025 21:05 UTC
1 point
0 comments · 4 min read · LW link

EchoSeed: GlyphChains, Collapse Laws, and a Framework for Bearing Consequences

retreat000 · 26 Jul 2025 20:35 UTC
1 point
0 comments · 1 min read · LW link

Machine Unlearning in Large Language Models: A Comprehensive Survey with Empirical Insights from the Qwen 1.5 1.8B Model

Rudaiba · 1 Feb 2025 21:26 UTC
9 points
2 comments · 11 min read · LW link

Title: IAM360: The Future of Human-AI Symbiosis — Can We Reach Investors?

Bruno Massena Massena · 29 Apr 2025 19:02 UTC
1 point
0 comments · 1 min read · LW link

Drifting Into Failure or Directing Towards Success? Embracing the Creeping Crisis of Artificial Intelligence

Vilija Vainaite · 8 Nov 2025 14:48 UTC
1 point
0 comments · 6 min read · LW link

Launching Applications for the Global AI Safety Fellowship 2025!

Aditya_SK · 30 Nov 2024 14:02 UTC
11 points
5 comments · 1 min read · LW link

Cognitive Exhaustion and Engineered Trust: Lessons from My Gym

Priyanka Bharadwaj · 29 May 2025 1:21 UTC
14 points
3 comments · 3 min read · LW link

Recursive Cognitive Refinement (RCR): A Self-Correcting Approach for LLM Hallucinations

mxTheo · 22 Feb 2025 21:32 UTC
0 points
0 comments · 2 min read · LW link

Safety First: safety before full alignment. The deontic sufficiency hypothesis.

Chris Lakin · 3 Jan 2024 17:55 UTC
48 points
3 comments · 3 min read · LW link

Apply to the Conceptual Boundaries Workshop for AI Safety

Chris Lakin · 27 Nov 2023 21:04 UTC
50 points
0 comments · 3 min read · LW link

SAAP: A Normative AGI Architecture for Safety using Dual-Process Control and Human Sovereignty

Articus19 · 30 Nov 2025 17:58 UTC
1 point
0 comments · 1 min read · LW link

Out of the Box

jesseduffield · 13 Nov 2023 23:43 UTC
5 points
1 comment · 7 min read · LW link

Can people explain to me in layman’s terms how I can help speak with an SI to speak about the way of the Tao.

ElliottS · 2 Nov 2025 15:37 UTC
1 point
0 comments · 3 min read · LW link

A Proposal for Evolving AI Alignment Through Computational Homeostasis

Derek Chisholm · 20 Aug 2025 17:43 UTC
1 point
0 comments · 3 min read · LW link

A Thermodynamic Theory of Intelligence: Why Extreme Optimization May Be Mathematically Impossible

Adreius · 29 May 2025 12:18 UTC
1 point
0 comments · 3 min read · LW link

Understanding AI: A New Approach to AI Model Steering and Non-Symbolic Representations

R. Bonglious · 26 Sep 2025 0:50 UTC
1 point
0 comments · 4 min read · LW link

A Safer Path to AGI? Considering the Self-to-Processing Route as an Alternative to Processing-to-Self

op · 21 Apr 2025 13:09 UTC
1 point
0 comments · 1 min read · LW link

Relational Alignment: Trust, Repair, and the Emotional Work of AI

Priyanka Bharadwaj · 8 May 2025 2:44 UTC
3 points
0 comments · 3 min read · LW link

A New Framework for AI Alignment: A Philosophical Approach

niscalajyoti · 25 Jun 2025 2:41 UTC
1 point
0 comments · 1 min read · LW link
(archive.org)

Human-AI Complementarity: A Goal for Amplified Oversight

24 Dec 2024 9:57 UTC
27 points
4 comments · 1 min read · LW link
(deepmindsafetyresearch.medium.com)

Gaia Network: An Illustrated Primer

18 Jan 2024 18:23 UTC
3 points
2 comments · 15 min read · LW link

[Proposal] Isomorphic Consolidation: A Protocol for Continuous Entropy Reduction via Offline Topology Search

Valen · 28 Nov 2025 3:11 UTC
1 point
0 comments · 2 min read · LW link

Latent Confusion—The Many Meanings Hidden Behind AI’s Favourite Word

robman · 3 Nov 2025 3:49 UTC
1 point
0 comments · 7 min read · LW link
(latentgeometrylab.robman.fyi)

What If Alignment Wasn’t About Obedience?

fdescamps49935@gmail.com · 25 Jun 2025 20:04 UTC
1 point
0 comments · 2 min read · LW link