It’s been submitted, but I haven’t gotten any word on whether it’s accepted yet.
EDIT: Accepted!
Goodness: an attempt at doing something useful!
Thanks :)
And thanks for pointing out your site on architectures—I’ll have to take a look at that.
I still can’t see why the AI, when deciding “A or B”, is allowed to simply deduce consequences, while when deciding “Deduce or Guess” it is required to first deduce that the deducer is “useful”, then deduce consequences. The AI appears to be using two different decision procedures, and I don’t know how it chooses between them.
Can you define exactly when usefulness needs to be deduced? It seems that the AI can deduce consequences in either case without first deducing usefulness.
Apologies if I’m being difficult; if you’re making progress as it is (as implied by your idea about “probably”s), we can drop this and I’ll try to follow along again next time you post.
Thanks for posting this around! It’s great to see it creating discussion.
I’m working on replies to the points you, Bill Hibbard, and Curt Welch have made. It looks like I have some explaining to do if I want to convince you that O-maximizers aren’t a subset of reward maximizers—in particular, that my argument in appendix B doesn’t apply to O-maximizers.
Response to Bill Hibbard:
It seems to me that every O-maximizer can be expressed as a reward maximizer. Specifically, comparing equations (2) and (3), given an O-maximizer we can define reward r_m (by this notation I mean “r subscript m”) as:
r_m = Σ_{r ∈ R} U(r) P(r | yx_{≤m})
and r_i = 0 for i < m, where the paper sets m to the final time step, following Nick Hay. The reward maximizer so defined will behave identically with the O-maximizer.
In the reward-maximization framework, rewards are part of observations and come from the environment. You cannot define “r_m” to be equal to something mathematically and then call the result a reward maximizer; therefore, Hibbard’s formulation of an O-maximizer as a reward maximizer doesn’t work.
If this is correct, doesn’t the “characteristic behavior pattern” shown for reward maximizers in Appendix B, as stated in Section 3.1, also apply to O-maximizers?
Since the construction was incorrect, this argument does not hold.
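To make the point concrete, here is a minimal illustrative sketch (the names and structure are my own, not the paper’s formalism or Hibbard’s): in the reward-maximization framework the reward arrives from the environment as part of the observation, whereas the quantity Hibbard defines is something the agent computes for itself from U and its own predictions.

    # Illustrative sketch only; all names here are hypothetical.

    def environment_supplied_reward(observation):
        # Reward-maximization framework: the reward is a component of the
        # observation, chosen by the environment (or rewarder).
        return observation["reward"]

    def hibbard_r_m(U, predictive_distribution, history):
        # Hibbard's proposed "reward": an expectation the agent computes internally
        # from its utility function U and its own predictive distribution.
        # Nothing in the environment emits this number, so the agent maximizing it
        # is not a reward maximizer in the paper's sense.
        return sum(U(r) * p for r, p in predictive_distribution(history))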
Response to Curt Welch:
Sadly, what he seems to have failed to realize is that any actual implementation of an O-Maximizer or his Value-learners must also be a reward maximizer. Is he really that stupid so as not to understand they are all reward maximizers?
Zing! I guess he didn’t think I was going to be reading that. To be fair, it may seem to him that I’ve made a stupid error, thinking that O-maximizers behave differently than reward maximizers. I’ll try to explain why he’s mistaken.
A reward maximizer acts so as to bring about universes in which the rewards it receives are maximized. For this reason, it will predict and may manipulate the future actions of its rewarder.
An O-maximizer with utility function U acts so as to bring about universes which score highly according to U. For this reason, it is quite unlikely to manipulate or alter its utility function, unless its utility function directly values universes in which it self-alters.
In particular, note that an O-maximizer does not act so as to bring about universes in which the utility it assigns to the universe is maximized. Where the reward maximizer predicts and “cares about” what the rewarder will say tomorrow, an O-maximizer uses its current utility function to evaluate futures and choose actions.
O-maximizers and reward maximizers have different relationships with their “motivators” (utility function vs. rewarder), and they behave differently when given the option to alter their motivators. It seems clear to me that they are distinct.
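As a toy illustration of the difference (the names and the one-step model of “futures” are my own, purely hypothetical):

    # Toy contrast between the two decision rules; illustrative only.

    def reward_maximizer_choice(actions, predict_future, predicted_reward_signal):
        # Picks the action whose predicted future carries the largest reward signal.
        # That signal is whatever the rewarder will emit, so futures in which the
        # rewarder or the reward channel has been manipulated can score highest.
        return max(actions, key=lambda a: predicted_reward_signal(predict_future(a)))

    def o_maximizer_choice(actions, predict_future, U):
        # Picks the action whose predicted future scores highest under the agent's
        # *current* utility function U. A future in which U has been replaced is
        # still evaluated by the U used here, so self-modification is not favored
        # unless U itself values it.
        return max(actions, key=lambda a: U(predict_future(a)))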
The only difference is in the algorithm it uses to calculate the “expected value”. Does he not understand that if you build a machine to do this, there must be hardware in the machine that calculates that expected value? And that such a machine can then be seen as two machines, one which is calculating the expected value, and the other which is picking actions to maximize the output of that calculation? And once you have that machine, his argument of appendix B once again applies?
Actually applying the argument in Appendix B to an O-maximizer (implemented or in the abstract), using the definitions given in the paper rather than reasoning by analogy, is sufficient to show that this is also incorrect.
An agent of unbounded intelligence will always reach a point of understanding that he has the option to try and modify the reward function, which means the wirehead problem is always on the table.
It may have the option, but will it be motivated to alter its “reward function”? Consider an O-maximizer with utility function U. It acts to maximize the universe’s utility as measured by U. How would the agent’s alteration of its own utility function bring about universes that score highly according to U?
Ah, I see. Thanks for taking the time to discuss this—you’ve raised some helpful points about how my argument will need to be strengthened (“universal action” is good food for thought) and clarified (clearly, my account of wireheading is unconvincing).
The paper’s been accepted, and I have a ton of editing to do (need to cut four pages!), so I may not be very quick to respond for the time being. I didn’t want to disappear without warning, and without saying thanks for your time!
Thank you! This is great!
I especially enjoyed the talk about Probabilistic Programs.
Well, overall.
I think most people understood the basic argument: powerful reinforcement learners would behave badly, and we need to look for other frameworks. Pushing that idea was my biggest goal at the conference. I didn’t get much further than that with most people I talked to.
Unfortunately, almost nobody seemed convinced that it was an urgent issue, or one that could be solved, so I don’t expect many people to start working on FAI because of me. Hopefully repeated exposure to SI’s ideas will convince people gradually.
Common responses I got when I failed to convince someone included:
“I don’t care what AGIs do, I just want to solve the riddle of intelligence.”
“Why would you want to control an AGI, instead of letting it do what it wants to?”
“Our system has many different reward signals, not just one. It has hunger, boredom, loneliness, etc.”
Thanks!
I think “Complex Value Systems are Required to Realize Valuable Futures” was peer-reviewed before it appeared in AGI-11, if that helps.
See also Pearl’s paper An Axiomatic Characterization of Causal Counterfactuals (ftp.cs.ucla.edu/pub/stat_ser/R250.pdf) for remarks on the relationship to Lewis.
I found this reaction enlightening. Thanks for writing it up.
I was dismayed that Pei has such a poor opinion of the Singularity Institute’s arguments, and that he thinks we are not making a constructive contribution. If we want the support of the AGI community, it seems we’ll have to improve our communication.
Good point!
Nice work!
I like that it’s very concise and readable, and wouldn’t want that to get lost—too many math papers are very dry! I do think, though, that a bit more formality would make mathematical readers more comfortable and confident in your results.
I would move the “note on language” to a preliminaries/definitions section. Though it’s nice that the first section is accessible to non-mathy people, I don’t think that’s the way to go for publication. Defining terms like “computer program” and “source code” precisely before using them would look more trustworthy, and using the Kleene construction explicitly in all sections, instead of referring to quine techniques, seems better to me.
I probably wouldn’t mention coding style or comments unless your formalism for programs explicitly allows them.
I found it confusing that while A and B are one type of object, C and D are totally different. I think it’s also more conventional to use capital letters for sets and lowercase letters for individual objects.
This paper touches on many similar topics, and I think it balances precision and readability very well; maybe you can swipe some notation and structure ideas from there?
Will do.
I think there’s a straightforward answer to this part of your question.
“Decisions are informed by the output of my deduction engine” means, I presume, that the AI deduces the result of each possible choice, then makes the choice with the most desirable result.
Suppose the AI is considering replacing its deduction engine with a pseudo-random statement evaluator. The AI will deduce that this replacement will make future decisions pseudo-random. I assume that this will almost never give the most desirable result; therefore, the deducer will almost never be replaced.
If the AI works this way (which I think is compatible with just about any kind of decision theory), its display of “trust” in itself is the direct result of the AI using its deducer to predict the result of replacing its deducer; “trust” is inherent in the way the deducer determines decisions. The deduction engine is able to “tell you that [the] deduction engine is useful” not by deducing its own consistency, but by deducing a better result if the AI keeps it than if the AI replaces it with a pseudo-random substitute.
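A minimal sketch of what I mean (hypothetical names, and obviously not a real deduction engine):

    # Illustrative only: every decision, including "keep or replace my deducer",
    # goes through the same deduce-then-compare procedure.

    def choose(options, deduce_result, desirability):
        # Deduce the result of each option, then take the most desirable one.
        return max(options, key=lambda o: desirability(deduce_result(o)))

    # For the self-modification decision, deduce_result("replace deducer with a
    # pseudo-random evaluator") yields futures full of pseudo-random decisions,
    # which almost never score as well as the futures deduced for "keep deducer",
    # so the replacement is almost never chosen; the "trust" is just this comparison.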
Does that make sense, and/or suggest anything about the larger issue you’re talking about?