Is ChatGPT actually fixed now?

sjadler
8 May 2025 23:34 UTC
17 points
0 comments
1 min read
(stevenadler.substack.com)

Post EAG London AI x-Safety Co-working Retreat

plex
8 May 2025 23:00 UTC
10 points
0 comments
1 min read

a brief critique of reduction

Vadim Golub
8 May 2025 22:43 UTC
−17 points
4 comments
2 min read

Video & transcript: Challenges for Safe & Beneficial Brain-Like AGI

Steven Byrnes
8 May 2025 21:11 UTC
26 points
0 comments
18 min read

Appendix: Interpretable by Design - Constraint Sets with Disjoint Limit Points

Ronak_Mehta
8 May 2025 21:09 UTC
2 points
0 comments
2 min read

Interpretable by Design - Constraint Sets with Disjoint Limit Points

Ronak_Mehta
8 May 2025 21:08 UTC
24 points
2 comments
9 min read
(ronakrm.github.io)

Is there a Half-Life for the Success Rates of AI Agents?

Matrice Jacobine
8 May 2025 20:10 UTC
8 points
0 comments
1 min read
(www.tobyord.com)

Misalignment and Strategic Underperformance: An Analysis of Sandbagging and Exploration Hacking

8 May 2025 19:06 UTC
77 points
3 comments
15 min read

Behold the Pale Child (escaping Moloch’s Mad Maze)

rogersbacon
8 May 2025 16:36 UTC
8 points
16 comments
11 min read
(www.secretorum.life)

An alignment safety case sketch based on debate

8 May 2025 15:02 UTC
57 points
21 comments
25 min read
(arxiv.org)

Mechanistic Interpretability Via Learning Differential Equations: AI Safety Camp Project Intermediate Report.

8 May 2025 14:45 UTC
8 points
0 comments
7 min read

AI #115: The Evil Applications Division

Zvi
8 May 2025 13:40 UTC
32 points
3 comments
62 min read
(thezvi.wordpress.com)

The Steganographic Potentials of Language Models

8 May 2025 11:23 UTC
9 points
0 comments
1 min read

Our bet on whether the AI market will crash

8 May 2025 9:56 UTC
23 points
2 comments
1 min read

Concept-anchored representation engineering for alignment

Sandy Fraser
8 May 2025 8:59 UTC
5 points
0 comments
3 min read

Orthogonality Thesis in layman’s terms.

Michael (@lethal_ai)
8 May 2025 8:31 UTC
1 point
0 comments
2 min read

Arkose may be closing, but you can help

Victoria Brook
8 May 2025 7:28 UTC
8 points
0 comments
2 min read

Healing powers of meditation or the role of attention in humoral regulation.

Yaroslav Granowski
8 May 2025 6:48 UTC
7 points
0 comments
1 min read

Orienting Toward Wizard Power

johnswentworth
8 May 2025 5:23 UTC
564 points
147 comments
5 min read

Relational Alignment: Trust, Repair, and the Emotional Work of AI

Priyanka Bharadwaj
8 May 2025 2:44 UTC
3 points
0 comments
3 min read

There’s more low-hanging fruit in interdisciplinary work thanks to LLMs

ChristianKl
7 May 2025 19:48 UTC
26 points
2 comments
1 min read

OpenAI Claims Nonprofit Will Retain Nominal Control

Zvi
7 May 2025 19:40 UTC
65 points
4 comments
11 min read
(thezvi.wordpress.com)

Social status games might have “compute weight class” in the future

Raemon
7 May 2025 18:56 UTC
34 points
7 comments
2 min read

Events of Low Probability: Buridan’s Principle

Nikita Gladkov
7 May 2025 18:46 UTC
12 points
0 comments
10 min read

[Question] Which journalists would you give quotes to? [one journalist per comment, agree vote for trustworthy]

Nathan Young
7 May 2025 18:39 UTC
12 points
26 comments
1 min read

Please Donate to CAIP (Post 1 of 7 on AI Governance)

Mass_Driver
7 May 2025 17:13 UTC
119 points
20 comments
33 min read

UK AISI’s Alignment Team: Research Agenda

7 May 2025 16:33 UTC
113 points
2 comments
11 min read

Four Predictions About OpenAI’s Plans To Retain Nonprofit Control

garrison
7 May 2025 15:48 UTC
12 points
0 comments
5 min read
(www.obsolete.pub)

A Disciplined Way to Avoid Wireheading

amitlevy49
7 May 2025 15:20 UTC
18 points
6 comments
5 min read
(ivy0.substack.com)

Reflections on Compatibilism, Ontological Translations, and the Artificial Divine

Mahdi Complex
7 May 2025 12:16 UTC
2 points
1 comment
22 min read

The Historical Parallels: Preliminary Reflection

EQ
7 May 2025 8:06 UTC
3 points
0 comments
9 min read
(eqmind.substack.com)

European Links (07.05.25)

Martin Sustrik
7 May 2025 4:20 UTC
10 points
0 comments
2 min read
(250bpm.substack.com)

[Question] Chess - “Elo” of random play?

Shankar Sivarajan
7 May 2025 2:18 UTC
10 points
16 comments
1 min read

$500 + $500 Bounty Problem: Does An (Approximately) Deterministic Maximal Redund Always Exist?

6 May 2025 23:05 UTC
73 points
16 comments
3 min read

Loss Curves

James Camacho
6 May 2025 22:22 UTC
16 points
3 comments
4 min read
(github.com)

Negative Results on Group SAEs

Josh Engels
6 May 2025 21:49 UTC
70 points
3 comments
8 min read

ACX Atlanta May 2025 Meetup

Steve French
6 May 2025 21:00 UTC
2 points
0 comments
1 min read

[Question] What kind of policy by an AGI would make people happy?

StanislavKrym
6 May 2025 18:05 UTC
1 point
2 comments
1 min read

AI Safety at the Frontier: Paper Highlights, April ’25

gasteigerjo
6 May 2025 14:22 UTC
4 points
0 comments
7 min read
(aisafetyfrontier.substack.com)

Zuckerberg’s Dystopian AI Vision

Zvi
6 May 2025 13:50 UTC
62 points
7 comments
11 min read
(thezvi.wordpress.com)

Will protein design tools solve the snake antivenom shortage?

Abhishaike Mahajan
6 May 2025 13:11 UTC
31 points
0 comments
17 min read
(www.owlposting.com)

Utah Court Case Over State Law Regarding “Personhood” for Nonhuman Intelligences

Stephen Martin
6 May 2025 12:54 UTC
10 points
3 comments
2 min read

Global Risks Weekly Roundup #18/2025: US tariff shortages, military policing, Gaza famine.

NunoSempere
6 May 2025 10:39 UTC
31 points
2 comments
3 min read
(blog.sentinel-team.org)

OpenAI’s Jig May Be Up

Vale
6 May 2025 8:51 UTC
3 points
2 comments
3 min read

My Reasons for Using Anki

Parker Conley
6 May 2025 7:01 UTC
10 points
1 comment
3 min read
(parconley.com)

It’s ‘Well, actually...’ all the way down

benwr
6 May 2025 5:44 UTC
40 points
34 comments
1 min read
(www.benwr.net)

Five Hinge‑Questions That Decide Whether AGI Is Five Years Away or Twenty

charlieoneill
6 May 2025 2:48 UTC
126 points
17 comments
5 min read

Nonprofit to retain control of OpenAI

Archimedes
5 May 2025 23:41 UTC
37 points
1 comment
1 min read
(openai.com)

Unexpected Conscious Entities

Gunnar_Zarncke
5 May 2025 22:14 UTC
34 points
7 comments
6 min read

The First Law of Conscious Agency: Linguistic Relativity and the Birth of “I”

Dima (lain)
5 May 2025 21:20 UTC
−17 points
4 comments
2 min read