I am writing this as a comment, not as an answer (the answers, I suspect, are more social; people are not doing this yet because the methods which would work capability-wise are mostly still in their blind spots).
There are two counter-arguments to this.
Technically, it has been too difficult to do it this way. But it is becoming less and less difficult, and various versions of this route are becoming more and more feasible.
That said, the ability to predict behavior is still fundamentally limited, both because systems like that become complex very easily (one can get very complex behavior from a really small number of parameters), and because they will interact with the complex world around them (so one really needs to reason about the world containing software systems like this; even if the software systems themselves are transparent and interpretable, if they are smart, the overall dynamics might be highly non-trivial).
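To make the parenthetical above concrete, here is a minimal sketch (my own illustration, not from the original discussion): the logistic map is a fully transparent system with a single parameter, yet in its chaotic regime its long-run trajectory is effectively unpredictable in practice; the function names are purely illustrative.

```python
# Illustration (assumed example, not from the source): complex behavior
# from a one-parameter, fully transparent system -- the logistic map.

def logistic_step(x, r=3.9):
    """One step of the logistic map x -> r * x * (1 - x); r=3.9 is in the chaotic regime."""
    return r * x * (1 - x)

def trajectory(x0, steps, r=3.9):
    """Iterate the map from x0 for the given number of steps."""
    xs = [x0]
    for _ in range(steps):
        xs.append(logistic_step(xs[-1], r))
    return xs

# Two trajectories that start a hair apart diverge completely after a few dozen
# steps, even though we can read off the entire "source code" of the system.
a = trajectory(0.200000, 60)
b = trajectory(0.200001, 60)
print(a[-1], b[-1])  # typically very different values
```

Even with the full rule in hand, predicting the state far ahead requires essentially simulating every step; knowing the algorithm does not by itself make the behavior easy to foresee.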
This kind of paradigm (if it works) makes it much easier to modify these systems, so it is much easier to have self-modifying AIs, or, more likely, self-modifying ecosystems of AIs producing changing populations of AI systems.
Capability-wise, this is likely to give such systems a boost when competing with current systems, where self-modification is less fluent and, so far, rather sluggish.
But this is even more foomable than the status quo, so one really needs to solve AI existential safety for a self-evolving ecosystem of AIs that self-modifies better and better; this is even more urgent with this approach than with the current mainstream.
Might this problem be easier to solve here? Perhaps… At least, with (self-)modification being this fluent and powerful, one can direct it this way and that way more easily than with more sluggish and resistant methods. But, on the other hand, it is very easy to end up with a situation where things are changing even faster and are even more difficult to understand...
I do like looking at this topic, but the safety-related issues in this approach are, if anything, even more acute (faster timelines + very fluently reconfigurable machines)...
I expect it is much more likely that most people are looking at the current state of the art, don’t even know or think about other possible systems, and just narrowly focus on aligning the state of the art, not considering creating a “new paradigm” because they think that would just take too long.
I would be surprised if there were a lot of people who carefully thought about the topic and used the following reasoning procedure:
“Well, we could build AGI in an understandable way, where we just discover the algorithms of intelligence. But this would be bad, because then we would understand intelligence very well, which means that the system is very capable. And because we understand it so well, it would be easier for us to figure out how to do lots more capability stuff with the system, like making it recursively self-improving. Also, if the system is inherently more understandable, then it would be easier for the AI to self-modify, because understanding itself would be easier. All of this seems bad, so instead we shouldn’t try to understand our systems. Instead, we should use neural networks, which we don’t understand at all, and use SGD to optimize the parameters of the neural network such that they correspond to the algorithms of intelligence, but are represented in such a format that we have no idea what’s going on at all. That is much safer, because now it will be harder to understand the algorithms of intelligence, making them harder to improve and use. Also, if an AI were to look at itself as a neural network, it would be at least a bit harder for it to figure out how to recursively self-improve.”
Obviously, alignment is a really hard problem, and it is actually very helpful to understand what is going on in your system at the algorithmic level in order to figure out what’s wrong with that specific algorithm: how is it not aligned, and how would we need to change it in order to make it aligned? At least, that’s what I expect. I think not using an approach where the system is interpretable hurts alignment more than capabilities. People have been steadily making progress at making our systems more capable, and not understanding what algorithms they run inside doesn’t seem to be much of an issue there; for alignment, however, that’s a huge issue.