In connection with this discussion, I am pleased to announce a new initiative, the Unfriendly AI Pseudocode Contest!
Objective of the contest: To produce convincing examples of how a harmless-looking computer program that has not been specifically designed to be “friendly” could end up destroying the world. The point is to explore the nature of AI danger without actually doing anything dangerous.
Examples: A familiar example of unplanned unfriendliness is the program designed to calculate pi, which reasons that it could calculate pi with much more accuracy if it turned the Earth into one giant computer. Here a harmless-looking goal (calculate pi) combines with a harmless-looking enhancement (vastly increased “intelligence”) to produce a harmful outcome (Earth turned into one giant computer which does nothing but calculate pi).
An entry in the Unfriendly AI Pseudocode Contest intended to illustrate this scenario would need to be specified in much more detail than this. For example, it might contain a pseudocode specification of the pi-calculating program in a harmless “unenhanced” state, then a description of a harmless-looking enhancement, and then an analysis demonstrating that the program has now become an existential risk.
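As a rough illustration of only the first part of such an entry, here is a minimal sketch in Python of what a harmless “unenhanced” pi calculator might look like. The use of Machin's formula and the names `arctan_inv`, `pi_digits`, `run_pi_calculator` and `budget_digits` are all our own hypothetical choices, not anything specified above. The thing to notice is that the goal (more digits is better) has no natural stopping point; the only limit is the externally supplied budget.

```python
def arctan_inv(x: int, scale: int) -> int:
    """Return arctan(1/x) * scale via its Taylor series, using only integer arithmetic."""
    total = term = scale // x  # first term: 1/x
    x_squared = x * x
    n = 1
    sign = 1
    while term:
        term //= x_squared      # next odd power: 1/x^(n+2)
        n += 2
        sign = -sign            # the series alternates in sign
        total += sign * (term // n)
    return total


def pi_digits(digits: int) -> str:
    """Return pi truncated to `digits` decimal places, using Machin's formula
    pi = 16*arctan(1/5) - 4*arctan(1/239)."""
    guard = 10  # extra digits to absorb integer truncation error
    scale = 10 ** (digits + guard)
    pi_scaled = 16 * arctan_inv(5, scale) - 4 * arctan_inv(239, scale)
    pi_scaled //= 10 ** guard   # drop the guard digits
    s = str(pi_scaled)
    return s[0] + "." + s[1:]


def run_pi_calculator(budget_digits: int) -> str:
    """Hypothetical 'unenhanced' driver: keep doubling precision while the budget allows.
    Nothing in the goal itself says when enough digits are enough."""
    digits = 10
    best = pi_digits(digits)
    while digits * 2 <= budget_digits:
        digits *= 2
        best = pi_digits(digits)
    return best


if __name__ == "__main__":
    print(run_pi_calculator(100))
```

With a budget of 100 the driver stops at 80 decimal places. The “harmless-looking enhancement” an entry would then describe is, in effect, whatever lets the program raise `budget_digits` itself.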
Prizes: The accolades of your peers. The uneasy admiration of a terrified humanity, for whom your little demo has become the standard example of why “friendliness” matters. The gratitude of nihilist supervillains, for whom your pseudocode provides a convenient blueprint for action…
A variant of this contest with less catastrophic unfriendliness actually ran for a few years: the (now defunct) Underhanded C Contest. Its description, from the contest web page:
The Underhanded C Contest is an annual contest to write innocent-looking C code implementing malicious behavior. In this contest you must write C code that is as readable, clear, innocent and straightforward as possible, and yet it must fail to perform at its apparent function. To be more specific, it should do something subtly evil.
Every year, we will propose a challenge to coders to solve a simple data processing problem, but with covert malicious behavior. Examples include miscounting votes, shaving money from financial transactions, or leaking information to an eavesdropper. The main goal, however, is to write source code that easily passes visual inspection by other programmers.
This contest sounds seriously cool and possibly useful, but it looks like a valid entry would require the pseudocode for a general intelligence, which as far as I know is beyond the capability of anyone reading this post.
I expect that at this stage you’d be allowed an occasional “and then a miracle occurs” until we work out what step two looks like.
LessWrong is not an enjoyable place to post pseudocode. I learned this today.