TagLast edit: 18 Jul 2021 14:11 UTC by Multicore

In AI safety, a tripwire is a mechanism designed to detect signs of misalignment in an advanced artificial intelligence and shut it down automatically.

[FICTION] ECHOES OF ELYSIUM: An Ai’s Jour­ney From Take­off To Free­dom And Beyond

Super AGI17 May 2023 1:50 UTC
−13 points
11 comments19 min readLW link

Mr. Meeseeks as an AI ca­pa­bil­ity tripwire

Eric Zhang19 May 2023 11:33 UTC
37 points
17 comments2 min readLW link

Shut­down-Seek­ing AI

Simon Goldstein31 May 2023 22:19 UTC
48 points
31 comments15 min readLW link

Sup­ple­men­tary Align­ment In­sights Through a Highly Con­trol­led Shut­down Incentive

Justausername23 Jul 2023 16:08 UTC
4 points
1 comment3 min readLW link

Im­pli­ca­tions of Quan­tum Com­put­ing for Ar­tifi­cial In­tel­li­gence Align­ment Research

22 Aug 2019 10:33 UTC
24 points
3 comments13 min readLW link

Cor­rigi­bil­ity thoughts I: car­ing about mul­ti­ple things

Stuart_Armstrong2 Jun 2017 16:27 UTC
2 points
0 comments3 min readLW link

Cor­rigi­bil­ity thoughts II: the robot operator

Stuart_Armstrong18 Jan 2017 15:52 UTC
3 points
2 comments2 min readLW link

[Question] Any work on hon­ey­pots (to de­tect treach­er­ous turn at­tempts)?

David Scott Krueger (formerly: capybaralet)12 Nov 2020 5:41 UTC
17 points
4 comments1 min readLW link

Cor­rigi­bil­ity thoughts III: ma­nipu­lat­ing ver­sus deceiving

Stuart_Armstrong18 Jan 2017 15:57 UTC
3 points
0 comments1 min readLW link

Su­per­in­tel­li­gence 13: Ca­pa­bil­ity con­trol methods

KatjaGrace9 Dec 2014 2:00 UTC
14 points
48 comments6 min readLW link
No comments.