This is something I’ve never really understood. I can understand wanting to keep any moves directly towards creating an AI quiet—if you create 99% of an AI and someone else does the other 1%, goodbye world. It may not be optimal, but it’s a comprehensible position.
But the work on decision theory is presumably geared towards codifying Friendliness in such a way that an AI could be ‘guaranteed Friendly’. That seems like the kind of thing that would be aided by having many eyeballs on it, while being useless to anyone who wanted to cobble together a quick-results AI.
Eliezer stated his reasons here:

...a constructive theory of the world’s second most important math problem, reflective decision systems, is necessarily a constructive theory of seed AI; and constitutes, in itself, a weapon of math destruction, which can be used for destruction more quickly than to any good purpose. Any Singularity-value I attach to publicizing Friendly AI would go into explaining the problem. Solutions are far harder than this and will be specialized on particular constructive architectures.
So in a nutshell, he thinks solving decision theory will make building unfriendly AIs much easier. This doesn’t sound right to me: we already have idealized models like Solomonoff induction and AIXI, and they don’t help much with building real-world approximations of those ideals, so an idealized, perfect solution to decision theory isn’t likely to help much either. But maybe he has some insight that I don’t.
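For readers who haven’t seen those ideals, the standard definitions (Solomonoff’s universal prior and Hutter’s AIXI; my summary, not part of the original thread) make the point concrete:

```latex
% Solomonoff's universal prior: weight every program p whose output on a
% universal prefix machine U begins with the string x by its length l(p).
M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}

% AIXI: choose the action maximizing expected total reward to horizon m,
% with environments q weighted by the same simplicity prior.
a_k \;=\; \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
  (r_k + \cdots + r_m) \sum_{q \,:\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

Both definitions sum over all programs for a universal machine U, so neither is computable; the hard part of building a real system is the approximation step, which the ideal equation doesn’t supply. Hence the skepticism above.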
I think Eliezer must have changed his mind after writing those words, because his TDT book was written for public consumption all along. (He gave two reasons for not publishing it sooner: he wanted to see whether a university would offer him a PhD based on it, and he was using decision theory as a problem for testing potential FAI researchers.) I guess his current lack of participation in our DT mailing list is due to some combination of being busy with his books and a lack of significant new insights.
I think TDT is different from the “reflective decision systems” he was talking about, which sounds like it refers to a theory specifically of self-modifying agents.
That’s the first time I noticed the pun. Good one. I want a t-shirt.

Ah. I see what he means, if a) you’re talking about just the ‘invariant under reflection’ part and not Friendliness, and b) you mean it as a strictly pragmatic tool. That makes sense.
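To make ‘invariant under reflection’ concrete, here is a toy sketch of the idea as I understand it (my own illustration with made-up names, not anything from Eliezer’s formalism): an agent that accepts a rewrite of its own decision procedure only if the rewrite demonstrably preserves its decision criterion.

```python
# Toy sketch of reflective stability. All names are illustrative.
# The agent adopts a successor decision procedure only if, on every
# test case, the successor does at least as well by the agent's own
# fixed utility function as the current procedure does.

def utility(outcome: float) -> float:
    # The agent's fixed goal: value outcomes at face value.
    return outcome

def current_policy(options: list[float]) -> float:
    # Present decision procedure: pick the utility-maximizing option.
    return max(options, key=utility)

def is_reflectively_stable(successor, test_cases: list[list[float]]) -> bool:
    # Spot-check goal preservation across self-modification. A real theory
    # would demand a proof over all inputs, not finitely many checks --
    # that gap is much of what makes the problem hard.
    return all(
        utility(successor(case)) >= utility(current_policy(case))
        for case in test_cases
    )

# A proposed self-rewrite: goal-equivalent, just implemented differently.
proposed = lambda options: sorted(options, key=utility)[-1]

print(is_reflectively_stable(proposed, [[1.0, 3.0, 2.0], [0.5, 0.1]]))  # True
```

The toy version only spot-checks; the open problem the quote gestures at is getting a guarantee that survives arbitrarily many rounds of self-modification.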