Lemme expand on my thoughts a little bit. I imagine a non-self-modifying AI to be made of three parts: a thinking algorithm, a decision algorithm, and a belief database. The thinking and decision algorithms are immutable, and the belief database is (obviously) mutable. The supergoal is coded into the decision algorithm, so it can’t be changed. (Problem: the supergoal only makes sense in the context of certain beliefs, and beliefs are mutable.) The contents of the belief database influence the thinking algorithm’s behavior, but they don’t determine its behavior.
The ideal possibility is that we can make the following happen:
The belief database is flexible enough that it can accommodate all types of beliefs from the very beginning. (If the thinking algorithm is immutable, it can’t be updated to handle new types of beliefs.)
The thinking algorithm is sufficiently flexible that the beliefs in the belief database can lead the algorithm in the right directions, producing super-duper intelligence.
The thinking algorithm is sufficiently inflexible that the beliefs in the belief database cannot cause the algorithm to do something really bad, producing insanity.
The supergoal remains meaningful in the context of the belief database regardless of how the thinking algorithm ends up behaving.
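The three-part split might be easier to see as code. This is only a toy sketch of the architecture as I've described it; every name here (`BeliefDatabase`, `thinking_algorithm`, the `"maximize_safety"` supergoal, and the belief keys) is a made-up illustration, not a real system.

```python
class BeliefDatabase:
    """Mutable store: the only part of the agent that changes over time."""
    def __init__(self):
        self._beliefs = {}

    def update(self, key, value):
        self._beliefs[key] = value

    def get(self, key, default=None):
        return self._beliefs.get(key, default)


def thinking_algorithm(beliefs):
    """Immutable: proposes candidate actions. Its behavior is *influenced*
    by the belief database, but its code never changes."""
    if beliefs.get("threat_detected"):
        return ["defend", "wait"]
    return ["explore", "wait"]


def decision_algorithm(candidates, beliefs):
    """Immutable, with the supergoal hard-coded: picks whichever candidate
    best serves the fixed supergoal, as evaluated against current beliefs."""
    SUPERGOAL = "maximize_safety"  # hard-coded; cannot be changed at runtime

    def score(action):
        # Note the problem flagged above: the supergoal's *meaning* is
        # evaluated through mutable beliefs, so if beliefs drift, what
        # the fixed supergoal effectively asks for drifts too.
        return beliefs.get((SUPERGOAL, action), 0)

    return max(candidates, key=score)


# One think-decide cycle: only the belief database mutates between cycles.
db = BeliefDatabase()
db.update("threat_detected", True)
db.update(("maximize_safety", "defend"), 5)
db.update(("maximize_safety", "wait"), 1)
action = decision_algorithm(thinking_algorithm(db), db)
print(action)  # -> defend
```

The sketch makes the tension concrete: the code of `decision_algorithm` is frozen, yet its output depends entirely on entries in the mutable database, which is exactly the gap conditions 3 and 4 above are trying to manage.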
(My ideas haven’t been taken seriously in the past, and I have no special knowledge in this area, so it’s likely that my ideas are worthless. They feel valuable to me, however.)