The main difference between our ways of thinking seems to be: you are thinking in terms of what happens inside the algorithm, while I am thinking in terms of what desiderata the algorithm satisfies. I trust the latter approach more because, while we can only speculate about how intelligent agents work, we should be able to explain what an intelligent agent is: after all, something made us study intelligent agents in the first place. (Of course we do have some clues about how intelligent agents work from cognitive science and applied ML, but we are far from having the full picture.) In particular, I don’t directly care about the distinction between model-based and model-free.
I talked a lot about desiderata in both part (1) and part (2)! Granted, I think about these things somewhat differently than you. And I did not get very concrete with desiderata.
Part (1a) was something like “here are some things which seem like real phenomena when you look at intelligence, and how they relate to learning-theoretic criteria when we formalize them” (ie, regret bounds on reward vs regret bounds on predictive accuracy). Part (1b) was like “here are some other things which seem to me like real phenomena relating to intelligence, which a formal theory of intelligence should not miss out on, and which seem real for very similar reasons to the more-commonly-discussed stuff from section 1a”.
Really, though, I think in terms of the stuff in (2) much more than the stuff in (1). I’m not usually thinking: “It sure seems like real agents reason using logic. How can we capture that formally in our rationality criteria?” Rather, I’m usually thinking things like: “If agents know a good amount about each other due to access to source code or other such means, then a single-shot game has the character of iterated game theory. How can we capture that in our rationality criteria?”
Perhaps one difference is that I keep talking about modeling something with regret bounds, whereas you are talking about achieving regret bounds. IE, maybe you more know what notion of regret you want to minimize, and are going after algorithms which help you minimize it; and I am more uncertain about what notion of regret is relevant, and am looking for formulations of regret which model what I am interested in.
Specifically, the desideratum I often consider is some type of regret bound. This desideratum doesn’t directly refer to prediction: it only cares about expected utility! Your reasoning would have us conclude it corresponds to model-free RL. However, the actual algorithms I use to prove regret bounds use posterior sampling: a model-based approach! So not only do predictions enter the picture, they enter it in a way that is clearly justified rather than ad hoc.
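To make the point concrete, here is a minimal posterior-sampling (Thompson sampling) sketch for Bernoulli bandits — my own toy illustration, not the construction behind the regret bounds discussed here. The criterion only cares about accumulated reward, yet the algorithm is model-based: it maintains a Beta posterior over each arm’s mean, samples a model, and acts greedily with respect to the sample.

```python
import random

def thompson_bandit(true_means, horizon, rng=None):
    """Posterior (Thompson) sampling for Bernoulli bandits.

    Maintains a Beta posterior over each arm's success probability,
    samples a model from the posterior on each step, and plays the arm
    that is best under the sampled model.  The desideratum (low regret
    in expected reward) says nothing about prediction, yet the algorithm
    achieving it explicitly maintains and samples from a model.
    """
    rng = rng or random.Random(0)
    k = len(true_means)
    alpha = [1] * k  # Beta(1, 1) prior: alpha counts successes + 1
    beta = [1] * k   # beta counts failures + 1
    total_reward = 0
    for _ in range(horizon):
        # sample a candidate model (a mean for each arm) from the posterior
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # act, observe a Bernoulli reward, and update the posterior
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    return total_reward
```

After a short burn-in the sampled models concentrate on the truth, so the agent mostly plays the best arm — the predictions are instrumental to the reward criterion, exactly as in the posterior-sampling regret proofs alluded to above.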
I see what you mean about not fitting the model-based/model-free dichotomy the way I set it up. In terms of the argument I try to make in the post, I’m drawing an analogy between prediction and logic. So I’m saying that (if we think of both prediction and logic as “proxies” in the way I describe) it is plausible that formal connections to prediction have analogs in formal connections to logic, because both seem to play a similar role in connection to doing well in terms of action utility.
(That said, the definition of Bayesian regret already refers to a prior, and the coefficients in the regret bounds depend on this prior, so I do introduce a notion of “model” a priori.)
Are you referring to results which require realizability here? (I know it can make sense to get a regret bound in terms of the prior probability w/o assuming that one of the things in the prior has the best asymptotic loss, but, it is common to need such an assumption)
Part of what I’m saying in the second part of the original post is that I find that I have to introduce logic a priori to discuss certain kinds of regret I think about, similarly to how here you are forced to introduce “model” a priori for certain kinds of regret. (I suspect I didn’t explain this very well because I thought my motivations were clear but underestimated the inferential gap.)
I do not agree that this is a “dire situation for learning theory”. Quite the contrary, this is exactly the situation I formalized using instrumental reward functions. Unless you think instrumental reward functions are not good enough for some reason? (Of course, in my setting, the agent can get feedback about utility, it just doesn’t get it all the time for free. But, the only alternative seems to be “flying spaghetti monster” utility functions that are completely unobservable, and I am not sure that’s an interesting case to study.)
I had in mind the setting where no particular assumption is made about the observability of utility at all, like in classic settings such as Savage etc. I’m not claiming it is very relevant to the discussion at hand. It’s just that there is some notion of rationality to be studied in such cases. I also misspoke in “dire situation for learning theory”—I don’t want to prematurely narrow what learning theory can fruitfully discuss. Something more like “dire situation for reinforcement learning” would have been more accurate.
It might be useful to consider arbitrary programs, but the agent will still not execute any of them for more than some small finite time. You might argue that this is why it needs logic, to prove theorems about computations that it cannot execute. But, I don’t believe this line of thinking leads to anything tractable that is not redundant w.r.t. a simpler logic-less approach.
I was definitely not suggesting that the agent would have to use logic in order to get feedback about programs which take too long to execute. (It could do that, and it could be useful, but it isn’t my argument for why the agent uses logic-like reasoning.) Rather, the idea was more that by virtue of testing out ways of reasoning on the programs which can be evaluated, the agent would learn to evaluate more difficult cases which can’t simply be run, which would in turn sometimes prove useful in dealing with the external environment.
One related idea that I came up with recently is what I call “Turing reinforcement learning” (TRL). [...]
Yeah, this would get at some of the idea of learning more about the environment by learning about computations. In addition to the potential for management of computational resources, it would illustrate something about how incomplete models help with logical uncertainty.
I think some of our disagreement has to do with my desire to model things which humans seem to do just fine (eg, heuristically estimate the probability of sentences in PA). Attempting to make models of this easily leads to infeasible things (such as logical induction). My reaction to this is: (1) at least now we have some idea about what it means to do well at the task; (2) but yeah, this means this isn’t how humans are doing it, so there is still something left to understand.
But you have to provide the agent with a training sequence; otherwise, how does it ever figure out the rules of chess? I definitely agree that in some situations computational complexity is the bottleneck rather than sample complexity. My point was different: if you don’t add the context of a training sequence, you might end up with philosophical questions that are either poor starting points or don’t make sense at all.
I’m not saying you don’t have a training sequence at all. I’m saying the training sequence is enough to figure out the rules of chess (perhaps from indirect information, not necessarily ever playing chess yet), but not enough to learn to play well from experience.
Obviously in order to understand the rules of chess quickly from a human explanation, the system would need to have already learned quite a lot about the world (or have been specialized to this sort of task).
The point is that there is also something to gain from training to see how computations go (perhaps computations which look like counterfactuals, such as the game tree in chess).
Yeah, I’m now feeling more confident that the difference in perspectives is this: you are thinking of a relatively fixed notion of regret, and want things to be relevant to achieving good bounds in terms of that. I am more wanting a notion of regret that successfully captures something I’m trying to understand. EG, logical induction was a big improvement to my understanding of the phenomenon of conjecture in mathematics—how it can be understood within the context of a rationality criterion. It doesn’t directly have to allow me to improve regret bounds for reward learners. It’s an important question whether it helps us do anything we couldn’t before in terms of reward learning. (For me it certainly did, since I was thinking in terms of Bayesians with full models before—so logical induction showed me how to think about partial models and such. Perhaps it didn’t show you how to do anything you couldn’t already, since you were already thinking about things in optimal predictor terms.) But it is a different question from whether it tells us anything about intelligence at all.
I had in mind the setting where no particular assumption is made about the observability of utility at all, like in classic settings such as Savage etc. I’m not claiming it is very relevant to the discussion at hand. It’s just that there is some notion of rationality to be studied in such cases. I also misspoke in “dire situation for learning theory”—I don’t want to prematurely narrow what learning theory can fruitfully discuss. Something more like “dire situation for reinforcement learning” would have been more accurate.
By the way, from the perspective of what I call “subjective” learnability, we can consider this case as well. In subjective learnability, we define regret with respect to an optimal agent that has all of the user’s knowledge (rather than knowing the actual “true” environment). So, it seems not difficult to imagine communication protocols that will allow the AI to learn the user’s knowledge, including the unobservable reward function (we assume the user imagines the environment as a POMDP with rewards that are not directly observable but defined as a function of the state). (Incidentally, I have recently been leaning towards the position that some kind of generic communication protocol (with natural language “annotation” + quantilization to avoid detrimental messages) is the best approach to achieve alignment.)
I’m not usually thinking: “It sure seems like real agents reason using logic. How can we capture that formally in our rationality criteria?” Rather, I’m usually thinking things like: “If agents know a good amount about each other due to access to source code or other such means, then a single-shot game has the character of iterated game theory. How can we capture that in our rationality criteria?”
I would study this problem by considering a population of learning agents that are sequentially paired up for one-shot games in which the source code of each is revealed to the other. This way they can learn how to reason about source code. In contrast, the model where the agents try to formally prove theorems seems poorly motivated to me.
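A minimal sketch of what such a population setup could look like — this is my own toy reduction, not the actual proposal: “source code” is collapsed to an opaque identifier, and each learner simply accumulates empirical statistics about how agents with a given code behave, then best-responds to those statistics.

```python
import random
from collections import defaultdict

# One-shot Prisoner's Dilemma payoffs: (row player, column player)
PAYOFF = {('C', 'C'): (2, 2), ('C', 'D'): (0, 3),
          ('D', 'C'): (3, 0), ('D', 'D'): (1, 1)}

class Learner:
    """Learns how to play against opponents identified by their 'source code'
    (here: an opaque string key) purely from repeated one-shot pairings."""
    def __init__(self):
        self.coop = defaultdict(int)  # cooperations observed, per opponent code
        self.seen = defaultdict(int)  # total games observed, per opponent code

    def act(self, opponent_code):
        # cooperate iff agents with this code have mostly cooperated so far
        # (and cooperate on first encounter, since 0 >= 0)
        return 'C' if 2 * self.coop[opponent_code] >= self.seen[opponent_code] else 'D'

    def observe(self, opponent_code, opponent_action):
        self.seen[opponent_code] += 1
        self.coop[opponent_code] += (opponent_action == 'C')

def train(rounds, rng=None):
    """Sequentially pair one learner against fixed opponents with known codes."""
    rng = rng or random.Random(0)
    learner = Learner()
    opponents = {'always_C': lambda: 'C', 'always_D': lambda: 'D'}
    for _ in range(rounds):
        code = rng.choice(list(opponents))
        a = learner.act(code)
        b = opponents[code]()
        learner.observe(code, b)
    return learner
```

After a modest number of pairings the learner defects against the code that always defects and cooperates with the code that always cooperates — reasoning about source code is learned from experience rather than derived by formal proof.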
Perhaps one difference is that I keep talking about modeling something with regret bounds, whereas you are talking about achieving regret bounds. IE, maybe you more know what notion of regret you want to minimize, and are going after algorithms which help you minimize it; and I am more uncertain about what notion of regret is relevant, and am looking for formulations of regret which model what I am interested in.
Hmm, no, I don’t think this is it. I think that I do both of those. It is definitely important to keep refining our regret criterion to capture more subtle optimality properties.
So I’m saying that (if we think of both prediction and logic as “proxies” in the way I describe) it is plausible that formal connections to prediction have analogs in formal connections to logic, because both seem to play a similar role in connection to doing well in terms of action utility.
Do they though? I am not convinced that logic has a meaningful connection to doing well in terms of action utility. I think that for prediction we can provide natural models that show how prediction connects to doing well, whereas I can’t say the same about logic. (But I would be very interested in being proved wrong.)
Are you referring to results which require realizability here?
I usually assume realizability, but for incomplete models, “realizability” just means that the environment satisfies some incomplete models (i.e. has some regularities that the agent can learn).
...the idea was more that by virtue of testing out ways of reasoning on the programs which can be evaluated, the agent would learn to evaluate more difficult cases which can’t simply be run, which would in turn sometimes prove useful in dealing with the external environment.
Yes, I think this idea has merit, and I explained how TRL can address it in another comment. I don’t think we need or should study this using formal logic.
...I was thinking in terms of Bayesians with full models before—so logical induction showed me how to think about partial models and such. Perhaps it didn’t show you how to do anything you couldn’t already, since you were already thinking about things in optimal predictor terms.
Hmm, actually I think logical induction also influenced me to move away from Bayesian purism and towards incomplete models. I just don’t think logic is the important part there. I am also not sure about the significance of this forecasting method, since in RL it is more natural to just do maximin instead. But, maybe it is still important somehow.
...”If agents know a good amount about each other due to access to source code or other such means, then a single-shot game has the character of iterated game theory. How can we capture that in our rationality criteria?”
I would study this problem by considering a population of learning agents that are sequentially paired up for one-shot games in which the source code of each is revealed to the other. This way they can learn how to reason about source code. In contrast, the model where the agents try to formally prove theorems seems poorly motivated to me.
Actually, here’s another thought about this. Consider the following game: each player submits a program, then the programs are given each other as inputs and executed in anytime mode (i.e. at any given moment each program has to have some answer ready). The execution has some probability ϵ of terminating on each time step, so the total execution time follows a geometric distribution. Once execution is finished, the outputs of the programs are interpreted as strategies in a normal-form game and the payoffs are assigned accordingly.
This seems quite similar to an iterated game with geometric time discount! In particular, in this version of the Prisoner’s Dilemma, there is the following analogue of the tit-for-tat strategy: start by setting the answer to C, simulate the other agent, and once the other agent starts producing answers, set your answer equal to the other agent’s answer. For sufficiently shallow time discount, this is a Nash equilibrium.
In line with the idea I explained in a previous comment, it seems tempting to look for proper/thermal equilibria in this game with some constraint on the programs. One constraint that seems appealing is: force the program to have O(1) space complexity modulo oracle access to the other program. It is easy to see that a pair of such programs can be executed using O(1) space complexity as well.
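A toy sketch of the game described above, under my own simplifying assumptions: a program is a Python generator that must yield a current answer on every step (the “anytime” requirement), the geometric stopping time is an explicit coin flip, and the space-complexity constraint is not modeled.

```python
import random

def always_defect(opponent):
    """A program that ignores its opponent and always answers D."""
    def gen():
        while True:
            yield 'D'
    return gen()

def mirror(opponent):
    """Tit-for-tat analogue: answer C until the simulated opponent
    produces answers, then copy its latest answer."""
    def gen():
        opp = opponent(mirror)  # simulate the opponent (it receives our code)
        answer = 'C'
        while True:
            yield answer
            answer = next(opp)  # advance the simulation one step and copy
    return gen()

def run(prog_a, prog_b, eps=0.1, rng=None):
    """Execute both programs in anytime mode; terminate with probability
    eps per step (geometric running time) and return the final answers."""
    rng = rng or random.Random(0)
    ga, gb = prog_a(prog_b), prog_b(prog_a)
    a, b = next(ga), next(gb)
    while rng.random() >= eps:
        a, b = next(ga), next(gb)
    return a, b
```

With this encoding, mirror cooperates against itself (both programs always have C as their current answer), while against always_defect it starts at C and switches to D as soon as the simulation produces an answer — matching the tit-for-tat analogue described above. The mutual simulation does not loop forever because each step only advances the nested simulations by one level.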