↘↘↘↘↘↘↙↙↙↙↙↙
Check out my Biography.
↗↗↗↗↗↗↖↖↖↖↖↖
Johannes C. Mayer
Newcomb: Can’t do what’s optimal
You have a system that can predict perfectly what you will do in the future. It presents you with two opaque boxes. If you will take both boxes, it places $10 in one box and $0 in the other. If you will take only one box, it places $10 in one box and $1,000,000 in the other. The system does not use its predictive power to predict which box you will choose, only to determine whether you choose one box or two. It uses a random number generator to decide which box gets which amount.
This is a modified version of Newcomb’s problem.
Imagine that you are an agent that can reliably pre-commit to an action. Now imagine you pre-commit to taking only one box in a way that makes it impossible for you to break that commitment. If you then choose a box and get $10, you know for sure that the other box contains $1,000,000.
What I find interesting is that such a situation can arise at all: you can end up knowing for sure that the other box, the one you did not pick, contains $1,000,000, and yet you can’t take it because of the pre-commitment mechanism. And this pre-commitment mechanism is the only thing that prevents you from taking it.
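The setup above can be sketched as a small simulation. This is only an illustration under assumptions: the function names, the "one"/"two" encoding of the commitment, and the choice to let the agent pick its single box at random are mine, not part of the original problem statement.

```python
import random


def play(policy, rng):
    """Play one round against a perfect predictor.

    policy is "one" or "two" (a hypothetical encoding of the pre-commitment).
    Returns (payoff, amount left behind in the box that was not taken).
    """
    if policy not in ("one", "two"):
        raise ValueError("policy must be 'one' or 'two'")
    # The predictor only reacts to how many boxes will be taken.
    amounts = [10, 1_000_000] if policy == "one" else [10, 0]
    # Which box gets which amount is decided by a random number generator.
    rng.shuffle(amounts)
    if policy == "two":
        return sum(amounts), 0
    # Assumption: the agent cannot tell the opaque boxes apart, so it picks at random.
    pick = rng.randrange(2)
    return amounts[pick], amounts[1 - pick]


def average_payoff(policy, rounds=10_000, seed=0):
    """Estimate the expected payoff of a policy by repeated play."""
    rng = random.Random(seed)
    return sum(play(policy, rng)[0] for _ in range(rounds)) / rounds
```

Under these assumptions the committed one-boxer expects (10 + 1,000,000) / 2 = 500,005 dollars per round, while the two-boxer always gets 10. The case discussed above is the half of the one-box rounds in which `play` returns `(10, 1_000_000)`: the agent holds 10 dollars and knows exactly what it left behind.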
“You have a system that can predict perfectly what you will do in the future.”
In fact, I do not. This (like Newcomb) doesn’t tell me anything about the world.
Also, of course, there is no system in reality that can predict you perfectly, but this is an idealised scenario that is relevant because there are systems that can predict you with more than 50% accuracy.
I don’t really get that. For example, you could put a cryptographic lock on the box (let’s assume there is no way around it without the key), and then throw away the key. It seems that you are now actually unable to access the box, because you do not have the key. And you can, at the same time, know that this is the case.
Not sure why this should be impossible to say.
There could be, but there does not need to be, I would say. Or maybe I really do not get what you are talking about. It could really be that, if the cryptographic lock were not in place, you could take the box, and that nothing else prevents you from doing this. I guess I have an implicit model where I look at the world from a Cartesian perspective. So is your point about counterfactuals: that I am using them in a way that is not valid, without acknowledging it?
The “Fu*k it” justification
Sometimes people seem to say “fu*k it” towards some particular thing. I think this is a way to justify one’s intuitions. You intuitively feel like you should not care about something, but you actually can’t put your intuition into words. Except you can say “fu*k it” to convey your conclusion, without any justification. “Because it’s cool” is similar.
In the section “New counterexample: better inference in the human Bayes net”, what is meant by the reporter doing perfect inference in the human Bayes net? I am also unclear on how the modified counterexample is different.
My current understanding: The reporter is doing inference using and the action sequence and does not use to do inference ( is inferred). The reporter has an exact copy of the human Bayes net and now fixes the nodes for and the action sequence. Then it infers the probability for all possible combinations of values each node can have (including ) (i.e. the joint probability distribution).
I am not sure here. Is the reporter not using ? The graphic in that section shows a red arrow from in the predictor, to in the human Bayes net model that the reporter uses. But that could be about the better counterexample already.
Now we assume that the model knows how to map a question in natural language onto nodes in the Bayes net, and that it can then translate the values of nodes into answers to questions. The model can then use the joint probability distribution and the law of total probability to calculate the probabilities of nodes/events occurring, which can then be used to answer questions.
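The inference step described above (fix the observed nodes, then sum the joint distribution over everything else) can be sketched on a toy network. The three nodes, their names, and all probabilities below are made up for illustration and are not taken from the report; this is brute-force enumeration, not a claim about how a real reporter would work.

```python
from itertools import product

# Hypothetical toy chain: action -> stolen -> video_ok. All numbers are invented.
P_ACTION = {True: 0.5, False: 0.5}      # P(action = value)
P_STOLEN = {True: 0.1, False: 0.6}      # P(stolen = True | action)
P_VIDEO_OK = {True: 0.2, False: 0.95}   # P(video_ok = True | stolen)

NODES = ("action", "stolen", "video_ok")


def joint(action, stolen, video_ok):
    """Joint probability of one full assignment, via the chain factorisation."""
    p = P_ACTION[action]
    p *= P_STOLEN[action] if stolen else 1 - P_STOLEN[action]
    p *= P_VIDEO_OK[stolen] if video_ok else 1 - P_VIDEO_OK[stolen]
    return p


def query(node, evidence):
    """P(node = True | evidence), summing the joint over all consistent worlds."""
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(NODES)):
        world = dict(zip(NODES, values))
        if any(world[name] != value for name, value in evidence.items()):
            continue  # this world contradicts the fixed nodes
        p = joint(**world)
        denominator += p      # law of total probability over the unfixed nodes
        if world[node]:
            numerator += p
    return numerator / denominator
```

For example, `query("stolen", {"action": True, "video_ok": True})` answers the question "was it stolen?" given a fixed action and a fixed video. Enumerating every assignment costs 2^n joint evaluations for n binary nodes, which is why exact inference of this kind stops being feasible for large nets.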
The only difference in the better counterexample is that we now also fix the value of to whatever our predictor part of the model said would happen. And we do not assume that our predictor works perfectly, hence our reporter can give wrong answers because of that.
And now when we have , then calculating the joint probability distribution becomes computationally feasible? Are we still assuming that the reporter does perfect inference in the human Bayes net, given that our predictor predicted correctly?
Ah ok, thank you. Now I get it. I was confused by (i) “Imagine the reporter could do perfect inference” and (ii) “the reporter could simply do the best inference it can in the human Bayes net (given its predicted video)”.
(i) I took this to mean that the reporter alone can do it, but what is actually meant is that it can do it with the help of the predictor model.
(ii) Somehow I thought that “given its predicted video” was the important modification here, when in fact the only change is going from the reporter doing perfect inference to it doing the best inference that it can.
From what I have heard (I have not researched any of this very thoroughly), palatability is not the problem directly, but something closely related is. It is not the case that someone would eat a lot just because the food is so tasty. Rather, the composition of processed food is often very different from that of unprocessed food, and this affects how our body responds, for example when we feel full. Eating some Froot Loops is very different from eating a mango.
This might not be the only, or even the main, effect, but I would guess that it is a significant factor. Intuitively, it seems much less likely to me that somebody put on a diet of vegetables, fresh fruit, legumes, and whole grains would become obese (probably also make it low-sodium, to bring palatability down to normal levels).
Disgust is optimizing
Someone told me that they felt disgusted by the idea of trying to optimize for specific things, using specific objectives. This is what I wrote to them:
That feeling of being disgusted is actually some form of optimization itself. Disgust is a feeling that is utilized for many things that we perceive as negative. It was probably easier for evolution to rewire when to feel disgusted than to create a new feeling. The point is that the feeling that arises is supposed to change your behavior, steering you in certain directions. I.e. it redirects what you are optimizing for. For example, it could make you think about why trying to optimize for things directly using explicit objectives is actually a bad thing. But the value judgment comes first. You first feel disgusted, and then you try to combat in some way the thing that you are disgusted by and try to come up with reasons why it is bad. So it is ironic that one can feel disgusted at optimization when feeling disgusted is part of an optimization process itself.
We were talking about maximizing positive and minimizing negative conscious experiences. I guess with the implicit assumption that we could find some specification of this objective that we would find satisfactory (one that would not have unintended consequences when implemented).
Yes. There are lots of optimization processes built into us humans, but they feel natural to us, or we simply don’t notice them. Stating something that you want to optimize for, especially if it is something that seems to impose itself on the entire structure of the universe, is not natural for humans. And that goal, if implemented, would restrict the individual’s freedoms. And humans really don’t like that.
I think this all makes sense when you are trying to live together in a society, but I am not sure if we should blindly extrapolate these intuitions to determine what we want in the far future.
I think I don’t quite understand what you are saying unless you mean that not all of the observations of bad behavior come from some “region in space”.
Then I would say that yes, it does not happen in one place. When you look on YouTube for videos of murder confessions, you get videos from countries where this content is publicly accessible and mandated to be produced. These are not, though, the conditions under which all people live. I don’t know the laws for every country, but I would guess that some don’t do it. Certainly, hunter-gatherer tribes don’t.
Thanks for the clarification, now I get it. I think that is a good point. I do not know of anyone, among all the people I have ever met, who did terrible things. That is probably hundreds of people. But of course, if they had done something terrible, they would not necessarily have said so. But it feels like none of them did. I know just one person who went to prison. And by “know”, I mean that I said two words to him in all my life, and a friend who knew him better told me about it after I had not seen him for many years. I would expect that most people’s lives are like that. For a start, at least none of these people has told me that they themselves or other people had different experiences. And that would be thousands of people at least. Though again, they might just not have mentioned it. I never talked with anyone about this explicitly until now.
I added a link, that should have been there from the start, thanks.
I fixed the spelling, thanks.
About the following point:
“Argue that wireheading, unlike many other reward gaming or reward tampering problems, is unlikely in practice because the model would have to learn to value the actual transistors storing the reward, which seems exceedingly unlikely in any natural environment.”
Well, that seems to be what happened in the case of rats and probably many other animals. Stick an electrode into the reward center of the brain of a rat. Then give it a button to trigger the electrode. Now some rats will trigger their reward centers and ignore food.
Humans value their experience. A pleasant state of consciousness is actually intrinsically valuable to humans. Not that this is the only thing that humans value, but it is certainly a big part.
It is unclear how this would generalize to artificial systems. We don’t know if, or in what sense, they would have experience, and why that would even matter in the first place. But I don’t think we can confidently say that something computationally equivalent to “valuing experience” won’t be going on in the artificial systems we are going to build.
So somebody picking this point would probably need to address this and argue why artificial systems are different in this regard. The observation that most humans are not heroin addicts seems relevant. Though the human story might be different if there were no bad side effects and you had easy access to it. This would probably be closer to the situation artificial systems would find themselves in. Or in a more extreme case, imagine soma but you live longer.
In short: Is valuing experience perhaps computationally equivalent to valuing transistors storing the reward? Then there would be real-world examples of that happening.
That seems to imply that humans would continue to wirehead, conditional on them having started wireheading.
Yes, though I actually already believed this when I was feeling bad about my thoughts. I was not worried about other people thinking strangely of me. I was seeing it as a personal failure, which still made me feel bad. My point is that having unrealistic standards for yourself can also lead to unproductive suffering.
One of the most useful moral heuristics that I know is: It is ok to do X, if you don’t hurt anyone by doing X. And this applies here too.
Sometimes people say: “Too much of X is bad for you”. Well, that is true by the definition of “too much”. You can use this to argue that the actually important point the person is trying to convey is that it is possible, probably not too hard, and quite likely if you are not careful, to get so much that it is bad for you.