One has the motivations one has, and one would be inclined to defend them if someone tried to rewire the motivations against one’s will. If one happened to have different motivations, then one would be inclined to defend those instead.
The idea is that once a superintelligence gets going, its motivations will be out of our reach. Therefore, the only window of influence is before it gets going. If, at the point of no return, it happens to have the right kinds of motivations, we survive. If not, it’s game over.
thank you. Makes some sense... but does “rewriting its own code” (the very code we thought would perhaps permanently influence it before it got going) nullify our efforts at hardcoding our intentions?
I’m not a psychopath, and if I got the opportunity to rewrite my own source code to become a psychopath, I wouldn’t do it.
At the same time, it’s the evolutionary and cultural programming in my source code that contains the desire not to become a psychopath.
In other words, once the desire to not become a psychopath is there in my source code, I will do my best not to become one, even if I have the ability to modify my source code.
That makes sense. My intention was not to argue from the position of it becoming a psychopath, though (my apologies if it came out that way)... but instead from the perspective of an entity which starts out as supposedly Aligned (centered on human safety, let’s say), but then, because it’s orders of magnitude smarter than we are (by definition), it quickly develops a different perspective. But you’re saying it will remain ‘aligned’ in some vitally important way, even when it discovers ways the code could’ve been written differently?
The AI would be expected to care about preserving its motivations under self-modification for similar reasons as it would care about defending them against outside intervention. There could be a window where the AI operates outside immediate human control but isn’t yet good at keeping its goals stable under self-modification. It’s been mentioned as a concern in the past; I don’t know what the state of current thinking is.