Maybe the self-improving system will get worse—or fail to get better. I wasn’t arguing that success was inevitable, just that the argument for near-certain failure due to compound interest on a small probability of failure is wrong.
Maybe we could slap together a half-baked intelligent agent, and it could muddle through and fix itself as it grew smarter and learned more about its intended purpose. That approach doesn’t follow the proposed methodology, and yet it evidently doesn’t carry a residual probability of failure that accumulates and eventually dominates. So the idea that you are doomed unless you follow the proposed methodology is wrong.
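To make the disagreement concrete, here is a minimal sketch of the two failure models (the step count, probabilities, and repair mechanism are all assumptions of mine for illustration, not anything from the thread). Under the compound-interest model every flaw is permanent, so survival falls as (1 − p)^n; if later, smarter versions can notice and repair earlier flaws, the residual failure probability stops accumulating.

```python
import random

STEPS = 10_000    # number of self-modification steps (assumed)
P_ERROR = 0.01    # small per-step chance of introducing a flaw (assumed)
P_REPAIR = 0.5    # per-step chance a smarter later version fixes a flaw (assumed)

def compound_model():
    """Compound-interest model: any single error is permanent and fatal."""
    return all(random.random() > P_ERROR for _ in range(STEPS))

def self_correcting_model():
    """Muddle-through model: flaws accumulate but can later be noticed and fixed."""
    flaws = 0
    for _ in range(STEPS):
        if random.random() < P_ERROR:
            flaws += 1
        if flaws and random.random() < P_REPAIR:
            flaws -= 1
    return flaws == 0  # "survived" if no uncorrected flaws remain

TRIALS = 1_000
print("compound survival rate:       ", sum(compound_model() for _ in range(TRIALS)) / TRIALS)
print("self-correcting survival rate:", sum(self_correcting_model() for _ in range(TRIALS)) / TRIALS)
```

With these made-up numbers the compound model essentially never survives 10,000 steps, while the self-correcting model survives the large majority of runs; the compounding argument only goes through if errors are uncorrectable.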
Your argument depends on the relative size of the “success” region that the random stumbling needs to end up in, and on that region’s ability to attract corrections. If “success” is something like “consequentialism”, I agree that intermediate errors might “correct” themselves (through some kind of selection process), and the program ends up as an agent. If it is “consequentialism with specifically goal H”, there doesn’t seem to be any reason for the (partially) random stumbling to end up with goal H rather than some other goal G.
(Learning what its intended purpose was doesn’t seem different from learning what the mass of the Moon is: the knowledge doesn’t automatically have the power to direct the agent’s motivations towards that purpose, unless, for example, the property of moving towards the original intended purpose is somehow preserved across all the self-modifications, which does sound like a victory condition.)
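A toy model of the basin-of-attraction point above (the attractor names, positions, and dynamics are entirely my own invention for illustration): noisy self-modification that drifts toward the nearest stable goal-configuration reliably produces an agent, but which goal it ends up with is set by where the stumbling started, not by which goal was intended.

```python
import random
from collections import Counter

# Toy state space: a "goal" is a point on a line; several stable
# goal-configurations act as attractors (names and positions assumed).
ATTRACTORS = {"goal H": 0.0, "goal G1": 1.0, "goal G2": -1.0}

def stumble(start, steps=500, pull=0.1, noise=0.05):
    """Noisy self-modification that drifts toward the nearest attractor."""
    x = start
    for _ in range(steps):
        nearest = min(ATTRACTORS.values(), key=lambda a: abs(a - x))
        x += pull * (nearest - x) + random.gauss(0.0, noise)
    return min(ATTRACTORS, key=lambda name: abs(ATTRACTORS[name] - x))

# Start each run from a random initial configuration: every run settles into
# *some* attractor (the process reliably yields "an agent"), but which goal it
# settles on depends on the starting point, not on which goal was intended.
outcomes = Counter(stumble(random.uniform(-1.5, 1.5)) for _ in range(1_000))
print(outcomes)
```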
I am not sure you can legitimately characterise the efforts of an intelligent agent as being “random stumbling”.
Anyway, I was pointing out a flaw in the reasoning that was offered in support of a small probability of failure (under the described circumstances). Maybe some other argument supports that conclusion, but the original argument would still be wrong.
Approaches other than trying to develop a deterministic self-improving system that has a stable goal from the beginning, including messy ones like neural networks, might still result in a stable self-improving system with a desirable goal.
A good job too. After all, those are our current circumstances. Complex messy systems like Google and hedge funds are growing towards machine intelligence—while trying to preserve what they value in the process.