Defining the normal computer control problem

There has been a focus on controlling super intelligent artificial intelligence. However, we currently can’t even control our un-agenty computers without resorting to formatting and other large scale interventions.

Solving the normal computer control problem might help us solve the super intelligence control problem or allow us to work towards safe intelligence augmentation.

We cannot currently keep our computers doing what we want with any ease. They can get infected with malware, be compromised, or receive updates that are buggy. If you have sufficient expertise you can go in and fix the problem or wipe the system, but this is not ideal.

We do not have control of our computers without resorting to out-of-band manipulation.

Genes have found a way to control the reptilian brain and also, to some extent, the more powerful mammalian and human brains, as discussed in “The control problem has already been solved”. And the system continues to run: our brains aren’t reformatted when we pick up bad behaviours. We don’t know how the genes do it, but despite our flexibility humans tend to do what the genes would want (if they wanted things!). There is some control there. Let us call the problem of achieving this kind of control in ordinary computers the normal computer control problem.

This problem of control has been neglected by traditional AI because it is not an attempt to solve a cognitive problem. It is not like solving chess or learning to recognize faces. It is not making anything powerful; it is just weeding out the bad programs.

Comparing the normal computer control and AI control problems

The AI control problem has been defined as asking the question:

What prior precautions can the programmers take to successfully prevent the superintelligence from catastrophically misbehaving?

In this language the normal computer control problem can be defined as:

What type of automated system can we implement to stop a normal general purpose computer system from misbehaving (while carrying on with its good behaviour) if it has a malign program in it?

To make the differences explicit:

The normal control problem assumes a system with multiple programs, some good, some bad
The normal control problem assumes that there is no specific agency in the programs (especially not super-intelligent agency)
The normal control problem allows minor misbehaviour, but requires that it does not persist over time

These differences make the problem more amenable to study. Systems of this sort can be seen in animals, which will stop pursuing behaviours that are malign to themselves. If a horse unknowingly walks into an electric fence whilst trying to get to an apple, it will stop trying to walk in that direction. This is operant conditioning, but it has not been applied to a whole computer system with arbitrary programs in it.

Imagine being able to remove malware from a normal computer system by training that system. That is what I am looking to produce.
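To make the idea of training a system more concrete, here is a minimal toy sketch in Python. It is not the system I am working on, only an illustration under strong assumptions: every name in it (ToySystem, run_round, is_acceptable, the credit numbers) is hypothetical, each program simply reports a label for what it did, and the trainer can tell which program was responsible for a behaviour. Real systems would not make attribution that easy.

```python
from typing import Callable, Dict, List

# A "program" here is just a callable that returns a label for what it did.
Program = Callable[[], str]


class ToySystem:
    """Hypothetical toy model: programs keep running only while they hold credit."""

    def __init__(self, programs: Dict[str, Program],
                 start_credit: int = 3, max_credit: int = 5):
        self.programs = dict(programs)
        self.credit = {name: start_credit for name in programs}
        self.max_credit = max_credit
        self.evicted: List[str] = []

    def run_round(self, is_acceptable: Callable[[str], bool]) -> None:
        """Run every program once and reinforce or punish it (assumed rule)."""
        for name in list(self.programs):
            behaviour = self.programs[name]()
            if is_acceptable(behaviour):
                # Reward is capped, so past good behaviour cannot excuse everything.
                self.credit[name] = min(self.max_credit, self.credit[name] + 1)
            else:
                # Punishment outweighs reward, so persistent misbehaviour loses.
                self.credit[name] -= 2
            if self.credit[name] <= 0:
                # The program is removed; the rest of the system keeps running.
                del self.programs[name]
                del self.credit[name]
                self.evicted.append(name)


def make_buggy_update() -> Program:
    """A program that misbehaves once (a bad update) and then behaves."""
    calls = {"n": 0}

    def program() -> str:
        calls["n"] += 1
        return "harm" if calls["n"] == 1 else "ok"

    return program


def demo() -> ToySystem:
    system = ToySystem({
        "editor": lambda: "ok",
        "malware": lambda: "harm",
        "buggy_update": make_buggy_update(),
    })
    for _ in range(10):
        # The lambda stands in for the trainer's feedback (the electric fence).
        system.run_round(lambda behaviour: behaviour == "ok")
    return system
```

In this sketch the persistently malign program runs out of credit and is removed without wiping the system, while the program that misbehaved only once recovers. That mirrors the third difference above: minor misbehaviour is tolerated, persistent misbehaviour is not.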

This might not be the right problem definition to help us understand the control that goes on in brains. But I think it is precise and novel enough to form one of the initial research pathways. Ideally we should have a diverse community around this problem, one that includes neuroscientists, psychologists and other relevant scientists. We would also have charitable organisations like the ones trying to solve the super intelligence control problem. All this would maximize the chance that we are trying to answer the right question.

Should we study the normal computer control problem?

I’m not going to try to argue that this is more important than super-intelligence work. Such things are probably unknowable until after the fact. I argue only that it is a more tractable problem to try to solve, and that it might yield insights useful for the super-intelligence work.

But as ever with the future there are trade-offs for doing this work.

Pros:

It might help solve the super intelligence control problem, by providing inspiration or by allowing people to show exactly where things go wrong on the super intelligence side of things.
It might be the best that we can do towards the control problem, along with good training (if formal proofs to do with values aren’t helpful for control).
We can use the science of brains in general to give us inspiration on how this might work.
It can be more experimental and less theoretical than current work.
It might help with intelligence augmentation work (maybe a Con, depending upon pre-conceptions).
If deployed widely it might make computation harder for malicious actors to control. This would make taking over the internet harder (ameliorating one take off scenario).
It could grow the pool of people that have heard of the control problem.
My current hypothesis is that the programs within my current system aren’t bound to maximise utility, so they do not suffer from some of the failure modes associated with normal utility maximisation.

Cons:

It might speed up AI work by giving a common platform to work from (however, it is not AI in itself: there is no set of problems it is trying to solve).
It might distract from the large scale super intelligence problem.

So having said all that: Is anyone with me in trying to solve this problem?