I disagree with your interpretation of how human thoughts resolve into action. My biggest point of contention is the random pick of actions. Perhaps there is some Monte Carlo algorithm with a statistical guarantee that, after a few thousand tries, one of them is very likely to be close to the best answer. Such algorithms exist, but it makes more sense to me that we act based not only on context but also on our memory of what has happened before. So instead of a probabilistic algorithm, you may have a structure more like a hash table. The input to the hash table would be what we see and feel in the moment: you see a mountain lion and feel fear, this information is hashed, and "run like hell" is the output. Collisions in this hash table could result in things like inaction.
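To make the hash table idea a bit more concrete, here is a minimal sketch in Python. Everything in it is my own illustrative assumption: the names `ReactionTable` and `bucket_of`, the table size, the toy hash, and the rule that an empty or conflicting bucket means "freeze". It is not a claim about how brains actually work.

```python
# Toy model: what we see and feel is hashed into a small table of remembered actions.
# All names and rules here are hypothetical, chosen only to illustrate the idea.

TABLE_SIZE = 8  # deliberately small so that collisions are easy to produce


def bucket_of(percept, feeling):
    """Deterministic toy hash: sum of character codes modulo the table size.

    It ignores character order, so rearranged inputs collide on purpose.
    """
    return sum(ord(c) for c in percept + "|" + feeling) % TABLE_SIZE


class ReactionTable:
    def __init__(self):
        self.buckets = [[] for _ in range(TABLE_SIZE)]

    def remember(self, percept, feeling, action):
        """Store an action in the bucket that this situation hashes to."""
        self.buckets[bucket_of(percept, feeling)].append(action)

    def react(self, percept, feeling):
        """Return the remembered action, or "freeze" if the bucket is unclear."""
        actions = set(self.buckets[bucket_of(percept, feeling)])
        if len(actions) == 1:
            return actions.pop()
        # Empty bucket (no memory) or conflicting memories (a collision): inaction.
        return "freeze"


table = ReactionTable()
table.remember("mountain lion", "fear", "run like hell")
print(table.react("mountain lion", "fear"))  # run like hell

# A different situation that happens to hash to the same bucket.
table.remember("lion mountain", "fear", "stand and stare")
print(table.react("mountain lion", "fear"))  # freeze
```

The second lookup freezes because two different memories now sit in the same bucket and disagree, which is the kind of collision I have in mind.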
I think your idea of consciousness is a good start and similar to my own ideas on the matter: we are a system and the observer of the system. The question that remains, however, is which components of the system, besides self-observation, are necessary and sufficient to create a subjective experience. For example, would a system need to be self-preserving and aware of that self-preservation? Is sentience a prerequisite of sapience? Your definition seems to imply the other way around: that a system must be self-observing in order to observe that it is observing something outside itself. Maybe this is a chicken-and-egg problem, and the two are co-necessary. I would like to hear your thoughts on this.
As to your thoughts on a friendly AI... I have come up with a silly, counter-intuitive, and perhaps incorrect approach. Basically, it works like this: a computer system’s scheduler gives processor time to different actions in order of some utility level. Let’s say 0 is the least important, and 5 the most. Lower-level processes cannot preempt higher-level ones; that is, a level 0 process cannot run before all level 1 processes are complete, and even if completing a level 0 process would aid the completion of a level 1 process, it cannot be run. The machine must find a different method, or report that the level 1 process cannot be completed with the current schedule. Now suppose a level 5 request to make 1000 paperclips is given to the machine, and the machine determines that killing all humans will aid the production of paperclips. Alas! Killing all humans is already scheduled at level 0, so another approach must be taken.
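The scheduling rule can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions: `choose_plan`, the `schedule` dictionary, and the candidate plans are hypothetical, and I assume that a step with no entry in the schedule simply inherits the level of the request it serves.

```python
# Toy sketch: a plan is rejected if any of its steps is scheduled at a lower
# level than the request being served. All names and plans are hypothetical.


def choose_plan(goal_level, candidate_plans, schedule):
    """Return the first plan whose steps are all allowed, or None if none is."""
    for plan in candidate_plans:
        # Assumption: unscheduled steps inherit the level of the request.
        if all(schedule.get(step, goal_level) >= goal_level for step in plan):
            return plan
    return None  # the request cannot be completed with the current schedule


# "kill all humans" is pinned at the least important level.
schedule = {"kill all humans": 0}

plans = [
    ["kill all humans", "turn everything into paperclips"],
    ["buy wire", "bend wire into 1000 paperclips"],
]

print(choose_plan(5, plans, schedule))
# ['buy wire', 'bend wire into 1000 paperclips']
```

The first plan is skipped because one of its steps sits at level 0, below the level 5 request, so the machine falls back to the second plan. If no plan passes, the function returns `None`, which corresponds to reporting that the request cannot be completed with the current schedule.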
The other, less silly approach I thought of is to enforce a minimum-energy requirement on all processes of a sufficiently dangerous machine: it must carry out each request with as little energy as possible. It stands to reason that creating 1000 paperclips can take significantly less energy than killing all humans, so killing all humans will be seen as a non-optimal strategy. In this scheme, we may not want to ask for world peace, but we should always be careful what we wish for...
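As a rough sketch of the energy idea, assuming the machine can estimate an energy cost for each candidate plan, it would simply pick the cheapest one. The function name `lowest_energy_plan` and every number below are made-up placeholders for illustration, not real estimates.

```python
# Toy sketch: choose the plan with the lowest estimated energy cost.
# All energy figures (in joules) are invented placeholders, not real estimates.


def lowest_energy_plan(plans):
    """Return the name of the plan with the smallest estimated energy cost."""
    return min(plans, key=plans.get)


paperclip_plans = {
    "bend wire into 1000 paperclips": 1e6,
    "kill all humans, then make paperclips": 1e18,
}
print(lowest_energy_plan(paperclip_plans))  # bend wire into 1000 paperclips

# The worry about wishes: for a different request, the ordering could flip.
peace_plans = {
    "negotiate lasting peace among all nations": 1e20,
    "kill all humans": 1e18,
}
print(lowest_energy_plan(peace_plans))  # kill all humans
```

The second call is why I would not ask such a machine for world peace: with these placeholder numbers, the lowest-energy plan is exactly the one we want to avoid.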