
Human-AI Safety

Last edit: 17 Jul 2023 23:19 UTC by Wei Dai

Two Neglected Problems in Human-AI Safety

Wei Dai, 16 Dec 2018 22:13 UTC
102 points
25 comments, 2 min read, LW link

Three AI Safety Related Ideas

Wei Dai, 13 Dec 2018 21:32 UTC
69 points
38 comments, 2 min read, LW link

Morality is Scary

Wei Dai, 2 Dec 2021 6:35 UTC
209 points
116 comments, 4 min read, LW link, 1 review

A broad basin of attraction around human values?

Wei Dai, 12 Apr 2022 5:15 UTC
113 points
17 comments, 2 min read, LW link

SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research

Roman Leventov, 19 Dec 2023 16:49 UTC
17 points
5 comments, 3 min read, LW link

Let’s ask some of the largest LLMs for tips and ideas on how to take over the world

Super AGI, 24 Feb 2024 20:35 UTC
1 point
0 comments, 7 min read, LW link

[Question] Will OpenAI also require a “Super Red Team Agent” for its “Superalignment” Project?

Super AGI, 30 Mar 2024 5:25 UTC
2 points
2 comments, 1 min read, LW link

The Checklist: What Succeeding at AI Safety Will Involve

Sam Bowman, 3 Sep 2024 18:18 UTC
142 points
49 comments, 22 min read, LW link
(sleepinyourhat.github.io)

Will AI and Humanity Go to War?

Simon Goldstein, 1 Oct 2024 6:35 UTC
9 points
4 comments, 6 min read, LW link

Out of the Box

jesseduffield, 13 Nov 2023 23:43 UTC
5 points
1 comment, 7 min read, LW link

Apply to the Conceptual Boundaries Workshop for AI Safety

Chipmonk, 27 Nov 2023 21:04 UTC
50 points
0 comments, 3 min read, LW link

Public Opinion on AI Safety: AIMS 2023 and 2021 Summary

25 Sep 2023 18:55 UTC
3 points
2 comments, 3 min read, LW link
(www.sentienceinstitute.org)

Safety First: safety before full alignment. The deontic sufficiency hypothesis.

Chipmonk, 3 Jan 2024 17:55 UTC
48 points
3 comments, 3 min read, LW link

A conversation with Claude3 about its consciousness

rife, 5 Mar 2024 19:44 UTC
−2 points
3 comments, 1 min read, LW link
(i.imgur.com)

Gaia Network: An Illustrated Primer

18 Jan 2024 18:23 UTC
3 points
2 comments, 15 min read, LW link