Tripwire

TagLast edit: 30 Dec 2024 10:05 UTC by Dakara

Tripwire is a mechanism designed to detect signs of misalignment in an advanced artificial intelligence and shut it down automatically.

[FICTION] ECHOES OF ELYSIUM: An Ai’s Journey From Takeoff To Freedom And Beyond

Super AGI17 May 2023 1:50 UTC

−13 points

11 comments19 min readLW link

[Question] Any work on honeypots (to detect treacherous turn attempts)?

David Scott Krueger12 Nov 2020 5:41 UTC

17 points

4 comments1 min readLW link

Shutdown-Seeking AI

Simon Goldstein31 May 2023 22:19 UTC

50 points

32 comments15 min readLW link

Corrigibility thoughts III: manipulating versus deceiving

Stuart_Armstrong18 Jan 2017 15:57 UTC

3 points

0 comments1 min readLW link

Supplementary Alignment Insights Through a Highly Controlled Shutdown Incentive

Justausername23 Jul 2023 16:08 UTC

4 points

1 comment3 min readLW link

Superintelligence 13: Capability control methods

KatjaGrace9 Dec 2014 2:00 UTC

14 points

48 comments6 min readLW link

Corrigibility thoughts II: the robot operator

Stuart_Armstrong18 Jan 2017 15:52 UTC

3 points

2 comments2 min readLW link

Corrigibility thoughts I: caring about multiple things

Stuart_Armstrong2 Jun 2017 16:27 UTC

2 points

0 comments3 min readLW link

Mr. Meeseeks as an AI capability tripwire

Eric Zhang19 May 2023 11:33 UTC

37 points

17 comments2 min readLW link

Implications of Quantum Computing for Artificial Intelligence Alignment Research

Jsevillamol and PabloAMC

22 Aug 2019 10:33 UTC

24 points

3 comments13 min readLW link

No comments.