Did you publish it? Is there a link?
If you are still here, check this: https://arxiv.org/abs/1712.01826
It seems you were right after all about dust theory.
Also, is there any way to see your original version of the post?
Interestingly, an agent with a unitary utility function may still find itself in a situation similar to akrasia if it can’t make a choice between two lines of action which have almost equal weights. This was described as the Buridan’s ass situation by Lamport, and he shows that the problem has no easy solutions and causes real-life accidents.
Another part of the problem: if I have to make a choice between equal alternatives – and situations of choice are always choices between seemingly equal alternatives, or there would be no need to choose – then I have to search for additional evidence about which alternative is better, and as a result my choice is eventually decided by a very small piece of evidence. This makes me vulnerable to adversarial attacks by, say, sellers, who can press me into a choice by saying “There is a 5 per cent discount today.”
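As a toy illustration (my own invented numbers, not from Lamport), a small nudge like a discount can decide an otherwise near-tied choice:

```python
# Two options with almost equal expected utilities; the agent is stuck.
utility = {"buy_now": 100.0, "wait": 100.5}

# A seller's "5 per cent discount today" adds a small bonus to one option...
utility["buy_now"] += 100.0 * 0.05

# ...and that small extra piece of evidence decides the whole choice.
choice = max(utility, key=utility.get)
print(choice)  # -> buy_now
```

The closer the two utilities are, the smaller the piece of evidence needed to swing the decision, which is exactly what makes the agent manipulable.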
A new article on the topic:
“Boltzmannian Immortality” by Christian Loew
There are also two schools of Bayesian thinking: “It is popular to divide Bayesians into two main categories, “objective” and “subjective” Bayesians. The divide is sometimes made formal, there are conferences labelled as one but not the other, for example.
A caricature of subjective Bayes is that all probabilities are just opinion, and the best we can do with an opinion is make sure it isn’t self contradictory, and satisfying the rules of probability is a way of ensuring that. A caricature of objective Bayes is that there exists a correct probability for every hypothesis given certain information, and that different people with the same information should make exactly the same probability judgments.”
The main assumption, imho, is that if we input very large capabilities, we will get very large achievements. That is, if we assume the possibility of superintelligence and nanotech, we get the possibility of quick intergalactic travel.
However, arguing from infinities could be problematic, as it can produce contradictory results, as is well known in philosophy: https://en.wikipedia.org/wiki/Omnipotence_paradox
Safety systems must be foolproof. (I am at the airport now and am about to board a Russian plane which will fly almost the same route as the one that had a catastrophic fire a few days ago.)
One more example from aviation safety: a pilot let his son take the control wheel, knowing that it was turned off and the plane was controlled by the autopilot. However, after the wheel was turned 15 degrees, the autopilot disengaged, as it had a new feature of “corrigibility”, and the plane crashed.
Thanks, will correct in my working draft.
In Chernobyl, they added graphite tips (which locally increase reactivity) to the ends of the boron rods which were supposed to stop the reactor. It was done to increase control over the reactor, as smaller changes in rod position would produce larger changes in reactivity. However, it didn’t work well (obviously) when they tried to use these rods to stop a near-critical reactor: as the rods entered the reactor, they produced a jump in total reactivity, which contributed to its explosion. This is how I remember the story.
TL;DR: A system which increased corrigibility contributed to runaway changes during a nuclear accident.
I wrote something like this in “Message to any future AI”, where I suggest to the future AI that humans may have some currently unknown instrumental value, and thus it is not wise to kill them now (this is at the end of the post).
Also, in Global solution to AI safety (again at the end) I look at solutions where the AI consists of humans, and in First human upload as AI Nanny, whose title is self-explanatory.
You may also have read The Age of Em by Hanson.
“preference “my decisions should be mine”—and many people seem to have it”
I think it could be explained by social games. A person whose decisions are immovable is more likely to dominate eventually, and by demonstrating inflexibility a person claims higher status. Also, such a person escapes any possible exploits by playing the game of chicken preemptively.
If I have the preference “my decisions should be mine”—and many people seem to have it—then letting the taxi driver decide is not OK.
There are “friends” who claim to have the same goals as me, but it later turns out that they have hidden motives.
But can a 10,000-IQ AI cheat a 1,000-IQ AI? If so, only equally powerful AIs will cooperate.
Sometimes I over-update on evidence.
For example, I have an equal preference between going to my country house for the weekend or staying home, 50 to 50. I decide to go, but then I find that a taxi would take too long to arrive, and this shifts the expected utility toward the stay-home option (51 to 49). I decide to stay, but later I learn that the sakura has started to bloom, and I decide to go again (52 to 48), but then I find that a friend has invited me somewhere for the evening.
This has two negative results. First, I spend half a day meandering between options, like Buridan’s ass.
Second, I give power over my final decisions to small random events around me, and moreover, a potential adversary could manipulate my decisions by providing me with small pieces of evidence which favour his interests.
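The meandering above can be sketched as a toy simulation (my own illustration, with invented numbers): two options start tied, and each hour a small random piece of evidence nudges one of them, causing the agent to re-decide.

```python
import random

# Toy model of decision meandering: two options with equal starting
# utility; each hour a small random piece of evidence nudges one of
# them, and the agent always follows the current leader.
random.seed(0)
utils = {"go": 50.0, "stay": 50.0}
decision = None
flips = 0
for hour in range(12):
    option = random.choice(sorted(utils))      # which option gets new evidence
    utils[option] += random.uniform(0.0, 2.0)  # a small nudge in its favour
    leader = max(utils, key=utils.get)
    if leader != decision:                     # the agent changes its mind
        flips += 1
        decision = leader
print(f"changed mind {flips} times, final decision: {decision}")
```

One simple fix is hysteresis: stick with the current decision unless the utility gap exceeds some threshold, which is roughly what the rigid people described below are doing.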
Other people I know stick rigorously to any decision they have made, no matter what, and ignore any incoming evidence. This often turns out to be the winning strategy, compared to the flexible strategy of constantly updating expected utility.
Does anyone have a similar problem, or a solution?
This may fall into the following type of reasoning: “A superintelligent AI will be superhuman in any human capability X. Humans can cooperate. Thus SAI will have a superhuman capability to cooperate.”
The problem with such a conjecture is that if we take the opposite human quality, not-X, SAI will also have a superhuman capability in it. For example, if X = cheating, then a superintelligent AI will have a superhuman capability for cheating.
However, an SAI can’t simultaneously be a super-cooperator and a super-cheater.
An AI could have a superhuman capability to find win-win solutions and sell it as a service to humans in the form of market arbitrage, courts, or partner matching (e.g. Tinder).
Based on this win-win-finding capability, the AI will not have to “take over the world”—it could negotiate its way to global power, and everyone will win because of it (at least initially).
There are some troubles in creating a full and safe list of such human preferences, and there was an idea that AI will be capable of learning actual human preferences by observing human behaviour, or by other means like inverse reinforcement learning.
This post of mine basically shows that value learning will also have troubles, as there are no real human values, so some other way to create such a list of preferences is needed.
How to align the AI with existing preferences, presented in human language, is another question. Yudkowsky wrote that without taking into account the complexity of value we can’t make safe AI, as it would wrongly interpret short commands without knowing the context.
Interestingly, Foerster’s law ended around 1960, but in 1965 Moore’s law was born.
And in the 2010s Moore’s law is dying, but OpenAI’s law of growth in compute, with a 3.4-month doubling time, has appeared.
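The difference between these growth regimes is easy to quantify; a quick back-of-the-envelope comparison (the doubling times below are commonly cited estimates, not exact figures):

```python
# Compare the annualized growth implied by different doubling times.
MOORE_DOUBLING_MONTHS = 24      # classic Moore's-law range is ~18-24 months
OPENAI_DOUBLING_MONTHS = 3.4    # OpenAI "AI and Compute" estimate

def annual_growth(doubling_months: float) -> float:
    """Growth factor per year for a given doubling time in months."""
    return 2 ** (12 / doubling_months)

print(annual_growth(MOORE_DOUBLING_MONTHS))   # ~1.4x per year
print(annual_growth(OPENAI_DOUBLING_MONTHS))  # ~11.6x per year
```

So the compute trend corresponds to roughly an order of magnitude per year, versus less than a doubling per year for transistor density.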
This should not be surprising from the point of view of Spencer’s law of progress (1857), which says that the focal point of progress is constantly shifting to different domains, and only inside this focal point does exponential progress happen. (Note that this interpretation of Spencer’s law comes from my university lectures, not from reading his book, and may be my interpretation of my teacher’s interpretation, but it seems correct if we look at earlier explosions of innovation in different fields, like aviation.)
I think that the focal point of progress is shifting towards self-improving AI: that is, the focal point, where growth of productivity increases the growth of productivity, is moving from material supporting systems, like population, to intelligent systems, like computers, and later to their programs.