scasper (Stephen Casper)
Dissolving Confusion around Functional Decision Theory
I really like this analysis. Luckily, with the right framework, I think that these questions, though highly difficult, become technical rather than philosophical. This seems like a hard question of priors but not a hard question of framework. I speculate that in practice, an agent could be designed to adaptively and non-permanently modify its actions and source code to slip past many situations, fooling predictors that exploit non-mere correlations when helpful.
But on the other hand, introducing a set of situations like this to certain agents might be a way to induce a great amount of uncertainty and mess them up.
For a few of my thoughts on these situations, I also wrote this post on Medium: https://medium.com/@thestephencasper/decision-theory-ii-going-meta-5bc9970bd2b9. However, I’m sure you’ve thought more in depth about these issues than that post goes into. I think that the example of mind-policing predictors is sufficient to show that there is no free lunch in decision theory: for every decision theory, there is a mind-police predictor that will destroy it.
I wrote a LW post as a reply to this. I explain several points of disagreement with MacAskill and Y&S alike. See here.
I’m skeptical of this. Non-mere correlations are consequences of an agent’s source code producing particular behaviors that the predictor can use to gain insight into the source code itself. If an agent adaptively and non-permanently modifies its source code, this (from the perspective of a predictor who suspects this to be true) decorrelates its current source code from the non-mere correlations of its past behavior—essentially destroying the meaning of non-mere correlations to the extent that the predictor is suspicious.
Oh yes. I agree with what you mean. When I brought up the idea of an agent strategically acting certain ways or overwriting itself to confound the predictions that adversarial predictors make, I had in mind that the correlations that such predictors use could be non-mere w.r.t. the reference class of agents these predictors usually deal with, but still confoundable by our design of the agent and thereby mere with respect to us.
For instance, given certain assumptions, we can make claims about which decision theories are good. CDT works amazingly well in the class of universes where agents know the consequences of all their actions. FDT (I think) works amazingly well in the class of universes where agents know how non-merely correlated their decisions are with events in the universe but don’t know why those correlations exist.
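The contrast can be sketched with a toy version of Newcomb’s problem. This is a minimal illustration under my own assumptions—the function names and payoff values ($1,000,000 / $1,000 are the standard ones) are mine, and the predictor is modeled as reading the agent’s disposition with some accuracy:

```python
# Toy Newcomb's problem: the predictor fills the opaque box iff it
# predicts the agent will one-box. In the class of universes where
# decisions are non-merely correlated with the predictor's choice,
# an FDT-style one-boxing disposition outperforms CDT-style two-boxing.

def newcomb_payoff(one_boxes: bool, predictor_accuracy: float = 1.0) -> float:
    """Expected payoff for an agent whose disposition the predictor
    reads with the given accuracy."""
    opaque = 1_000_000   # contents of the opaque box, if filled
    transparent = 1_000  # contents of the transparent box
    if one_boxes:
        # Opaque box is filled whenever the predictor correctly
        # anticipates one-boxing.
        return predictor_accuracy * opaque
    # Two-boxers always get the transparent box, plus the opaque box
    # only when the predictor errs.
    return transparent + (1 - predictor_accuracy) * opaque

print(newcomb_payoff(one_boxes=True))   # 1000000.0
print(newcomb_payoff(one_boxes=False))  # 1000.0
```

With a perfect predictor, the one-boxing disposition dominates; lowering `predictor_accuracy` toward chance erodes that advantage, which is one way to see how the verdict depends on how strong the non-mere correlation is.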
+1 to this. I agree that this is the right question to be asking, that it depends on a lot of assumptions about how adversarial an environment is, and that FDT does indeed seem to have some key advantages.
Also as a note, sorry for some differences in terminology between this post and the one I linked to on my Medium blog.
Huge thanks. I appreciate it. Fixed.
Thanks for the comment. I think it’s exciting for this to make it into the newsletter. I am glad that you liked these principles.
I think that even lacking a concept of free will, FDT can be conveniently thought of as applying to humans through the installation of new habits or ways of thinking, without conflicting with the framework that I aim to give here. I agree that there are significant technical difficulties in thinking about when FDT applies to humans, but I wouldn’t consider them philosophical difficulties.
Solipsism is Underrated
I agree—thanks for the comment. When writing this post, my goal was to share a reflection on solipsism in a vacuum rather than in the context of decision theory. I acknowledge that solipsism doesn’t really tend to drive someone toward caring much about others and such. In that sense, it’s not very productive if someone is altruistically/externally motivated.
I don’t want to give any impression that this is a particularly important decision theoretic question. :)
Thanks for the comment. I’m not 100% on the computers analogy. I think answering the hard problem of consciousness is significantly different from understanding how complex information processing systems like computers work. Any definition or framing of consciousness in terms of informational or computational theory may allow it to be studied in those terms, in the same way that computers can be understood by systems-based theoretical reasoning built on abstraction. However, I don’t think this is what it means to solve the hard problem of consciousness. It seems more like solving the problem with a definition rather than an explanation.
I wonder how much differing perspectives here are due to differing intuitions. But in any case, I hope this makes my thinking more clear.
Great comment. Thanks.
In the case of idealism, we call the ontological primitive “mental”, and we say that external phenomena don’t actually exist but instead we just model them as if they existed to predict experiences. I suppose this is a consistent view and isn’t that different in complexity from regular materialism.
I can’t disagree. This definitely shifts my thinking a bit. I think that solipsism + structured observations might be comparable in complexity to materialism + an ability for qualia to arise from material phenomena. But at that point the question hinges a bit on what we think is spookier. I’m convinced that a material solution to the hard problem of consciousness is spooky. I think I could maybe be convinced that hallucinating structured observations might be similarly spooky.
And I think you’re right about the problem of knowing what we’re talking about.
Thanks! This is insightful.
What exactly would it mean to perform a Bayesian update on you not experiencing qualia?
Good point. In an anthropic sense, the sentence this is a reply to could be redacted. Experiencing qualia themselves would not be evidence to prefer one theory over another. Only experiencing certain types of observations would cause a meaningful update.
The primitives of materialism are described in equations. Does a solipsist seek an equation to tell them how angry they will be next Tuesday? If not, what is the substance of a solipsistic model of the world?
I think this is the same type of argument as saying that other people whom I observe seem to be very similar to me. The materialistic interpretation makes us believe in a less capricious world, but there’s the trouble of explaining how consciousness results from material phenomena. This is similar to my thoughts on the final 4 paragraphs of what you wrote.
I am not sure what you mean by that. I consider my mind to be just an arrangement of atoms—an arrangement governed by the same laws as the rest of the universe.
I think that works well. But I don’t think that subjective experience falls out of this interpretation for free.
Thanks.
I disagree a bit. My point has been that it’s easy for solipsism to explain consciousness and hard for materialism to. But it’s easy for materialism to account for structure and hard for solipsism to. Don’t interpret the post as my saying solipsism wins—just that it’s underrated. I also don’t say qualia must be irreducible, just that there’s spookiness if they are.
Procrastination Paradoxes: the Good, the Bad, and the Ugly
The Achilles Heel Hypothesis for AI
Thanks for the comment. +1 to it. I also agree that this is an interesting concept: using Achilles heels as containment measures. There is a discussion related to this on page 15 of the paper. In short, I think that this is possible and useful for some Achilles heels, and would be a cumbersome containment measure for others which could be accomplished more simply via bribes of reward.
It’s really nice to hear that the paper seems clear! Thanks for the comment.
I’ve been working on this since March, but at a very slow pace, and I took a few hiatuses. Most days when I’d work on it, it was for less than an hour. After coming up with the initial framework to tie things together, the hardest part was trying and failing to think of interesting ways in which most of the Achilles heels presented could be used as novel containment measures. I discuss this a bit in the discussion section.
For 2-3, I can give some thoughts, but these aren’t necessarily thought through much more than many other people one could ask.
I would agree with this. For an agent to even have a notion of being turned off, it would need some sort of model that accounts for this but which isn’t learned via experience in a typical episodic learning setting (clearly, because you can’t learn after you’re dead). This would all require a world model more sophisticated than any model-based RL techniques I know of would be capable of producing by default.
I also would agree. The most straightforward way for these problems to emerge is if a predictor has access to source code. Though sometimes they can occur if the predictor has access to some other means of prediction which cannot be confounded by the choice of what source code the agent runs. I write a little about this in this post. https://www.lesswrong.com/posts/xoQRz8tBvsznMXTkt/dissolving-confusion-around-functional-decision-theory
Nice post. I’m leaving a reply there instead of here. :)
Thanks for this post, I think it has high clarificational value and that your interpretation is valid and good. In my post, I failed to cite Y&S’s actual definition and should have been more careful. I ended up critiquing a definition that probably resembled MacAskill’s definition more than Y&S’s, and it seems to have been somewhat of an accidental strawperson. In fairness to me though, Y&S never offered any example with the minimal conditions for SD to apply in their original paper while I did. This is part of what led to MacAskill’s counterpost.
This all said, I do think there is something that my definition offers (clarifies?) that Y&S’s does not. Consider your example. Suppose I have played 100 Newcombian games and one-boxed each time. Your Omega will then predict that on the 101st, I’ll one-box again. If I make decisions independently each time I play the game, then we have the example you presented and which I agree with. But I think it’s more interesting if I am allowed to change my strategy. From my perspective as an agent trying to confound Omega and win, I should not consider Omega’s predictions and my actions to subjunctively depend, and my definition would say so. Under the definition from Y&S, I think it’s less clear in this situation what I should think. Should I say that we’re SD “so far”? Probably not. Should I wait until I finish all interaction with Omega and then decide whether or not we were SD in retrospect? Seems silly. So I think my definition may lead to a more practical understanding than Y&S’s.
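The scenario above can be sketched concretely. This is only an illustration under my own assumptions—the predictor here is a simple majority-vote over past actions, which is my stand-in for “Omega predicts from the historical correlation,” not anything from Y&S or MacAskill:

```python
# Omega predicts the agent's next action from its behavioral history.
# An agent that decides the same way each round is predictable; an
# agent permitted to switch strategies can confound the prediction,
# which is why (on my definition) its action should not be treated as
# subjunctively dependent with Omega's prediction.

def omega_predict(history):
    """Predict the agent's most common past action; one-box by default."""
    if not history:
        return "one-box"
    return max(set(history), key=history.count)

history = ["one-box"] * 100          # 100 games of one-boxing
prediction = omega_predict(history)  # Omega expects one-boxing on game 101

# An agent that keeps its past policy is predicted correctly...
print(prediction)                    # one-box

# ...but an agent allowed to change strategy on game 101 confounds Omega:
actual_101 = "two-box"
print(prediction == actual_101)      # False
```

The point of the sketch is just that the historical correlation was confoundable by the agent’s choice of strategy, so treating it as subjunctive dependence “so far” would have misled Omega.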
Do you think we’re about on the same page? Thanks again for the post.
Thanks for this post, but I don’t feel like I have the background for understanding the point you’re making. In the pingback post, Demski describes your point as saying:
an agent could reason logically but with some looseness. This can fortuitously block the Troll Bridge proof.
Could you offer a high-level explanation of what the main principle here is and what Demski means by looseness? (If such an explanation exists.) Thanks.
Huge thanks. I fixed this.