One more good explanation: Numbers are hard. Think of a few things that you would definitely give >99%. I can just about rank them, but I have no idea how many nines are supposed to be there.

And one more not-so-good one: We aren't actually trying to figure it out. We just want to participate in a discussion game, and this one involves numbers called probability, and they are usually between .1 and .9. A good sign that this is happening (and probably in part a reason for the range restriction) is when the "winning criterion" is whether the probability is above or below 50%.
I think you’re dismissing the “tautological” cases too easily. If you don’t believe in a philosophy, its standards will often seem artificially constructed to validate themselves. For example, here’s a simple argument that pops up from time to time:
Fallibilist: You can never be totally certain that something is true.
Absolutist: Do you think that’s true?
F: Yes.
A: See, you’ve just contradicted yourself.
Obviously F is unimpressed by this, but if he argues that you can believe things without being certain of them, then to someone not already believing his theory, that’s not that different from Beth saying she wrote the book by responding to stimuli.
If I’m inferring correctly,
That seems mostly correct.
To the extent it’s possible, I think it’s good for people to have the option of Circling with strangers, in order to minimize worries in this vein; I think this is one of the other things that makes the possibility of Circling online neat.
I think doing it with strangers you never see again dissolves the worries I’m talking about for many people, though not quite for me (and it raises new problems about being intimate with strangers).
The stuff above is too vague to really do much with, so I’m looking forward to that post of yours. I will say though that I didn’t imagine literal forgetting agreements, even if it were possible to keep them (and while we’re at it, how do you imagine keeping a confidentiality agreement without keeping a forgetting agreement? Clearly your reaction can give a lot of information about what went on, even if you never Tell anyone), because that would sort of defeat the point, no? But clearly there is some expectation that people react differently than they normally would, or else how the hell is it a good idea for you to act differently?
I remain sceptical of how you use internal/external. To give an example: Let’s say a higher-up does something that makes me angry. Then I might want to scream at him but find myself unable to. If however he sensed this and offered to let me scream without sanction (and let’s say this is credible), I wouldn’t want that. That’s because what I wanted was never about more decibels per se, but about the significance they have under normal circumstances, and he has altered that significance. Now, is the remaining barrier to “really expressing” myself internal or external? Keep in mind that we could repeat the above for any behaviour that doesn’t directly harm anyone (the harm qualification isn’t there because it is specifically anger we are talking about; declarations of love could similarly be robbed of their meaning).
Like, if I have a desire to be understood on a narrow technical point, the more Circling move is to go into what it’s like to want to convey the point, but the thing the emotion wants is to just explain the thing already; if it could pick its expression it would pick a lecture.
This is going in the right direction.
Also, after leaving this in the back of my head for the last few days, I think I have an inroad to explaining the problem in a less emotion-focused way. To start off: What effects can and should Circling have on the social reality while not Circling?
Would this feel different if people screamed when they wanted to scream, during Circling?
It could mean that the problem is gone, but it probably means you’re setting the cut later. This might make people marginally more accepting or it might not; I’m not sure about the distribution in individual psychology. For me, I’d just feel like a clown in addition to the other stuff.
What I’m hearing here (and am repeating back to see if I got it right) is the suggestion is heard as being about how you should organize your internal experience, in a way that doesn’t allow for the way that you are organized, and so can’t possibly allow for intimacy with the you that actually exists.
Partially, but I also think that you believe that [something] can be changed independently of the internal experience, and I don’t. I’m not sure what [something] is yet, but it lives somewhere in “social action and expression”. That might mean that I have a different mental makeup than you, or it might mean that the concept of “emotion” I consider important is different from yours.
But also I think I run into this alternative impression a lot, and so something is off about how I or others are communicating about it. I’d be interested in hearing why it seems like Circling would push towards ‘letting betrayal slide’ or ‘lowering boundaries’ or similar things.
[I have some hypotheses, which are mostly of the form “oh, I was assuming prereq X.” For example, I think there’s a thing that happens sometimes where people don’t feel licensed to have their own boundaries or preferences, or aren’t practiced at doing so, and so when you say something like “operate based on your boundaries or preferences, rather than rules of type X” the person goes “but… I can’t? And you’re taking away my only means of protecting myself?”. The hope is it’s like pushing a kid into a pool with a lifeguard right there, and so it generally works out fine, but presumably there’s some way to make the process more clear, or to figure out in what cases you need a totally different approach.]
I very much don’t hesitate to insist on my boundaries and preferences, and I’m still averse to these I-formulations. The following is my attempt to communicate the feeling, but it’s mostly going to be evocative and I’ll probably walk back on most of it when pushed, but hopefully in a productive way:
The whole thing just reeks of Valium. I’m sure you’d say there’s a lot of emotionality in Circling and that you felt some sort of deep connection or something. This is quite possibly true, but it seems there’s an important part of it that’s missing. I would describe this as part of their connection to reality. It’s like milking a fantasy: sure, you get something out of play-pretend, but it’s a lesser version, and there remains the nagging in the back of your head that’s just kind of chewing on the real thing, too timid to take a bite. This is what it’s like for the more positive emotions, anyway (really, consider feeling your love for your wife in such a way. Does there not seem something wrong with it?). For anger or betrayal, it’s much more noticeable: much like an impotent rage, but subverted at a stage even before “I can’t actually scream at the guy”, sort of more dull and eating into you.
I also wanted to say something like “because my anger is mine”, but I saw you already mentioned “owning” your emotions in a way quite different from my intent. Yours sounds more like acknowledging your emotions, or taking responsibility for them (possibly only to yourself) (“own it!”), which I’d have to take an unhealthily separated stance to even do. I intended something more like control. My anger is mine, its form is mine, and its destruction is mine. Restricting my expression of it is prima facie bad, if sometimes necessary. Restricting its form in my head, under the guise of intimacy no less, is the work of the devil.
That’s… exactly what my last sentence meant. Are you repeating on purpose, or was my explanation so unclear?
There may also be some people [who] have doubts about whether a perfect predictor is possible even in theory.
While perfect predictors are possible, perfect predictors who give you some information about their prediction are often impossible: since you learn of their prediction, you really can just do the opposite. This is not a problem here, because Omega doesn’t care if he leaves the box empty and you one-box anyway, but it’s not something to forget about in general.
Some of my preferences (e.g., more people living and fewer dying) are about the external world. Some (e.g., having enjoyable eating-experiences) are about my internal state. Some are a mixture of both. You can’t just swap one for the other.
A way you might distinguish these experimentally: if you are correct about your preferences, you will sometimes want to acquire new desires. If, for example, you don’t currently enjoy any kind of food but prefer having enjoyable eating experiences, you will try to start enjoying some. The desire-agent wouldn’t.
Well, if you have to decide whether to press the button, how do you actually do that? For the sake of argument, assume you have such abundant empirical data that your confidence in your beliefs about future observations is ~1. Let’s also say you’re sure one of the two cases you described is true.
You will go through all the theories of the world that give the correct observations, and see whether there’s an unobservable person suffering in them. Then you will weigh them by 2^(-description length), and when less than, say, 1 in 9 million of that weight has an unobservable person suffering in it, you press.
You currently interpret this as first estimating how likely you are to be in different worlds according to the Occam prior, and then acting in a way that maximizes expected lives saved.
But if all models that make the same predictions mean the same world, then we could still understand your behavior by making that math part of your utility function. In that interpretation, you go through all the different correct models, weigh them by 2^(-description length), and then maximize the lives saved in that aggregate. This suggests exactly the same behavior as the conventional interpretation.
Weighing them with the Occam prior then is a matter of your values. And it certainly seems like there are possible agents that care only about lives saved in the simplest correct model, or the second simplest, or something wilder.
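To make the bookkeeping concrete, here is a minimal sketch (models and description lengths are made up) of the weighting procedure described above. Whether you read the weights as a prior over worlds or as part of the utility function, the same number gets computed and the same decision falls out:

```python
# A minimal sketch (made-up models and description lengths) of the procedure
# above. Each entry: (description length in bits, does the model contain an
# unobservable person suffering?).
models = [
    (10, False),  # simplest observation-consistent model
    (14, False),
    (25, True),   # a more complex model that adds a hidden sufferer
]

THRESHOLD = 1 / 9_000_000  # the "1 in 9 million" cutoff from above

total_weight = sum(2 ** -length for length, _ in models)
suffering_weight = sum(2 ** -length for length, suffers in models if suffers)

# Same ratio under either interpretation, so the same button-press decision.
if suffering_weight / total_weight < THRESHOLD:
    print("press the button")
else:
    print("don't press")
```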
Seems like you’re right. I don’t think it affects my argument though, just the name.
There might be more to “naive Popperianism” than you’re making out. The testability criteria are intended to demarcate not only science, but also meaning. Of course, if it turns out that two theories which at first glance seem quite different do in fact mean the same thing, discussing “which one” is true is a category error. This idea is well expressed by Wittgenstein:
Scepticism is not irrefutable, but palpably senseless, if it would doubt where a question cannot be asked. For doubt can only exist where there is a question; a question only where there is an answer, and this only where something can be said.
Now, there are problems with the falsification criterion, but the more general idea that our beliefs should pay rent is a valuable one. We might think, then, that the meaning of a theory is determined by the rent it can pay, and that the meaning of unobservables is determined by their relation to the observables. A natural way to formalize this is the Ramsey-sentence:
Say you have terms for observables, O1, O2, O3, …, and for theoretical entities that aren’t observable, T1, T2, T3, … By repeated conjunction, you can combine all claims of the theory into a single sentence θ(O1, …, On, T1, …, Tm). The Ramsey-sentence is then obtained by replacing every theoretical term with a variable and binding it with an existential quantifier: ∃X1 … ∃Xm θ(O1, …, On, X1, …, Xm).
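For concreteness, here is a toy illustration of the construction (the mini-theory is my own invention, not one from the discussion):

```latex
% Toy theory: there is an unobservable kind of thing, electrons, and they
% cause the observable tracks in a cloud chamber.
\theta:\quad \exists x\,\bigl(\mathrm{Electron}(x) \wedge \mathrm{CausesTrack}(x)\bigr)

% Ramsey-sentence: replace the theoretical predicate Electron with a bound
% second-order variable X, keeping the observational vocabulary fixed.
R(\theta):\quad \exists X\,\exists x\,\bigl(X(x) \wedge \mathrm{CausesTrack}(x)\bigr)
```

Two theories that differ only in which unobservable plays this role collapse into the same Ramsey-sentence, which is the sense in which the competing theories below share a meaning.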
If this general idea of meaning is correct, then the Ramsey-sentence fully encompasses the meaning of the theory. If this sounds sensible so far, then the competing theories discussed here do indeed have the same meaning. The Ramsey-sentence is logically equivalent to the observable consequences of the theory being true. This is because according to the extensional characterisation of relations defined on a domain of individuals, every relation is identified with some set of subsets of the domain. The power set axiom entails the existence of every such subset and hence every such relation.
Of course, Satan and the Atlanteans are in fact motte-and-bailey arguments, and the person advocating them is likely to make claims about them on subjects other than paleontology or the pyramids that do imply different observations than the scientific consensus, and as such render their theory straightforwardly false.
When we decompose the sequence of events E into laws L and initial conditions C, the laws don’t just calculate E from C. Rather, L(C) = E1, L(E1) = E2, … L is a function from events to events, and the sequence E contains many input-output pairs of L.
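As a minimal sketch (the “law” here is invented), this is the shape of the data we get:

```python
# Toy sketch of the laws/initial-conditions decomposition above: L maps events
# to events, and iterating it from the initial condition C produces E1, E2, ...
def L(event):
    return 2 * event  # stand-in "law of nature"

C = 1          # initial condition
E = [L(C)]     # E1 = L(C)
for _ in range(4):
    E.append(L(E[-1]))  # E_{n+1} = L(E_n)

# The sequence hands us many input-output pairs of L, which is what lets us
# infer the law from the data.
print(E)  # [2, 4, 8, 16, 32]
```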
By contrast, when we decompose a policy π into a planner P and a reward R, P is a function from rewards to policies. With the setup of the problem as-is, we have data on many instances of (s, a) pairs (behaviour), so we can infer π with high accuracy. But we only get to see one policy, and we never get to explicitly see rewards. In such a case, we will indeed get the empty reward and ∀r [P(r) = π]. To correctly infer R and P, we would have to see our P applied to some other rewards, and the policies resulting from that.
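Here is a minimal sketch of that degenerate decomposition (toy policy, hypothetical names): with one observed policy and no observed rewards, a planner that never consults the reward fits the data exactly as well as any “real” decomposition would.

```python
# Degenerate (planner, reward) pair that reproduces a single observed policy.
observed_policy = {"s1": "a1", "s2": "a2"}  # the (s, a) pairs we actually see

def degenerate_planner(reward):
    # P(r) = pi for every r: the planner never consults the reward.
    return observed_policy

empty_reward = {}

# This decomposition reproduces the behaviour...
assert degenerate_planner(empty_reward) == observed_policy
# ...and so does the same planner paired with any other reward, which is why
# one policy alone cannot pin down P and R.
assert degenerate_planner({"s1": 1.0, "s2": -1.0}) == observed_policy
print("both decompositions fit the single observed policy")
```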
we use something called the “Rawlsian veil”, which avoids temporary imbalances in power from skewing the outcomes
Power imbalances are common in human society.
But many of the power imbalances which are found in human society are not at all temporary. For instance, if the player deciding didn’t vary randomly, but instead triangles always got to decide over squares, then while a rule of law might still develop internally, it’s not clear what interest the triangles have in rectifying the inter-gonal situation. But we still (claim to) regard it as moral for that to happen. It seems the babyeaters are indeed in such a situation: any adult eating the babies will never be a baby again. Further, they are almost certain to succeed in eating them, after which the babies will not grow up and maybe become a threat some day.
Is this a fair summary of your argument:
We already agree that conditional oughts of the form “If you want X, you should do Y” exist.
There are true claims of the form “If you want accurate beliefs, you should do Y”.
Therefore, all possible minds that want accurate beliefs should do Y.
For some Y, these apply very strongly, such that it’s very unlikely that you have accurate beliefs if you don’t do them.
For some of these Y, it’s unlikely you do them if you shouldn’t.
Therefore, for these Y, if you have accurate beliefs you should probably do them.
The first one seems to be correct, if maybe a bit of a platitude. If we take Cuneo’s analogy to moral realism seriously, it would be:
There are true claims of the form “If you want to be good, you should do Y”.
Therefore, all possible minds that want to be good should do Y.
But to make that argument, you have to define “good” first. Of course we already knew that a purely physical property could describe the good.
As for the second one, it’s correct as well, but it’s still not clear what you would do with it. It’s only probably true, so it’s not clear why it’s more philosophically interesting than “If you have accurate beliefs, you probably have glasses”.
One possibility is that it’s able to find a useful outside view model such as “the Predict-O-Matic has a history of making negative self-fulfilling prophecies”. This could lead to the Predict-O-Matic making a negative prophecy (“the Predict-O-Matic will continue to make negative prophecies which result in terrible outcomes”), but this prophecy wouldn’t be selected for being self-fulfilling. And we might usefully ask the Predict-O-Matic whether the terrible self-fulfilling prophecies will continue conditional on us taking Action A.
Maybe I misunderstood what you mean by dualism, but I don’t think that’s true. Say the Predict-O-Matic has an outside view model (of itself) like “The metal box on your desk (the Predict-O-Matic) will make a self-fulfilling prophecy that maximizes the number of paperclips”. Then you ask it how likely it is that your digital records will survive for 100 years. It notices that that depends significantly on how much effort you make to secure them. It notices that that significantly depends on what the metal box on your desk tells you. It uses its low-resolution model of what the box says. To work that out, it checks which outputs would be self-fulfilling, and then which of these leads to the most paperclips. The more insecure your digital records are, the more you will invest in paper, and the more paperclips you will need. Therefore the metal box will tell you the lowest self-fulfilling probability for your question. Since that number is *self-fulfilling*, it is in fact the correct answer, and the Predict-O-Matic will answer with it.
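To spell out the selection step with a toy example (all numbers invented): among the candidate announcements, keep the self-fulfilling ones and report the one that yields the most paperclips.

```python
# Toy sketch of the reasoning attributed to the low-resolution self-model:
# announced probability -> (is it self-fulfilling?, paperclips that result)
candidates = {
    0.2: (True, 100),  # alarming answer -> you buy lots of paper -> many clips
    0.5: (False, 60),  # not self-fulfilling, so not a correct answer at all
    0.9: (True, 10),   # reassuring answer -> little paper -> few clips
}

self_fulfilling = {p: clips for p, (ok, clips) in candidates.items() if ok}
answer = max(self_fulfilling, key=self_fulfilling.get)
print(answer)  # 0.2: the lowest self-fulfilling probability, as argued above
```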
I think this avoids your argument that
I contend that Predict-O-Matic doesn’t know it will predict P = A at the relevant time. It would require time travel—to know whether it will predict P = A, it will have to have made a prediction already, and but it’s still formulating its prediction as it thinks about what it will predict.
because it doesn’t have to simulate itself in detail to know what the metal box (it) will do. The low-resolution model provides a shortcut around that, but it will be accurate despite the low resolution, because by believing it is simple, it becomes simple.
Can you usefully ask for conditionals? Maybe. The answer to the conditional depends on which worlds you are likely to take Action A in. It might be that in most worlds where you do A, you do it because of a prediction from the metal box, and since we know those predictions maximize paperclips, there’s a good chance the action will fail to prevent it in those circumstances. But if that’s not the case, for example because it’s certain you won’t ask the box any more questions between this one and the event it tries to predict, then the conditional answer can be useful.
It might be possible to avoid any problems of this sort by only ever asking questions of the type “Will X happen if I do Y now (with no time to receive new info between hearing the prediction and doing the action)?”, because by backwards induction the correct answer will not depend on what you actually do. This doesn’t avoid the scenarios in the original post where multiple people act on their Predict-O-Matics, but I suspect those aren’t solvable without coordination.
Here’s an attempt to explain it using only causal subjunctives:
Say you build an agent, and you know in advance that it will face a certain decision problem. What choice should you have it make, to achieve as much utility as possible? Take Newcomb’s problem. Your choice to make the agent one-box will cause the agent to one-box, getting however much utility is in the box. It will also cause the simulator to predict that the agent will one-box, meaning that the box will be filled. Thus CDT recommends building a one-boxing agent.
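A minimal sketch of that builder’s-eye calculation (standard Newcomb payoffs, predictor assumed perfect):

```python
# Choosing which agent to build causes both the agent's action and the
# simulator's prediction, so the builder's CDT expected values are:
def payoff(build_one_boxer: bool) -> int:
    box_b = 1_000_000 if build_one_boxer else 0  # filled iff predicted to one-box
    box_a = 1_000                                # transparent box, always there
    return box_b if build_one_boxer else box_a + box_b

print(payoff(True))   # 1000000
print(payoff(False))  # 1000 -> so CDT says: build the one-boxing agent
```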
In the Sandwich problem, no one is trying to predict the agent. Therefore your choice of how to build it will cause only its actions and their consequences, and so you want the agent to switch to hummus after it learns that hummus is better.
FDT generalizes this approach into a criterion of rightness. It says that the right action in a given decision problem is the one that would be taken by the agent that CDT recommends you build.
Now, the point where the logical uncertainty does come in is the idea of “what you knew at the time”. While that doesn’t avoid the issue, it puts it into a form that’s more acceptable to academic philosophy. Clearly some version of “what you knew at the time” is needed to do decision theory at all, because we want to say that if you get unlucky in a gamble with positive expected value, you still acted rationally.
Yes, obviously it’s going to change the world in some way by being run. But not any change anywhere makes it an optimizer_2. As defined, an optimizer_2 optimizes its environment. Changing something inside the computer does not make it an optimizer_2, and changing something outside without optimizing it doesn’t either. Yes, the computer will inevitably cause some changes in its environment just by running, but what makes something an optimizer_2 is to systematically bring about a certain state of the environment. And this offers a potential solution other than hardware engineering: if the feedback is coming from inside the computer, then what is incentivized are only states within the computer, and so only the computer is getting optimized.
I think only if it gets its feedback from the real world. If you have gradient descent, then the true answers for its samples are stored somewhere “outside” the intended demarcation, and it might try to reach them. But how is a hill-climber that is given a graph and solves the traveling salesman problem for it an optimizer_2?
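For concreteness, here is a minimal sketch (random toy instance) of the kind of hill-climber I mean: every bit of feedback it uses is computed from the graph it was handed, inside the program, so no state of the outside environment is being steered toward anything.

```python
# Hill-climbing (2-opt style) on a toy traveling salesman instance; all
# feedback comes from the internally stored distance matrix.
import random

random.seed(0)
n = 8
dist = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        dist[i][j] = dist[j][i] = random.randint(1, 20)

def tour_length(tour):
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))

tour = list(range(n))
for _ in range(5000):
    i, j = sorted(random.sample(range(n), 2))
    candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]  # reverse a segment
    if tour_length(candidate) < tour_length(tour):             # purely internal feedback
        tour = candidate

print(tour, tour_length(tour))
```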
(or that it has access to the world model of the overall system, etc)
It doesn’t need to. The “inner” program could also use its hardware as quasi-sense organs and figure out a world model of its own.
Of course this does depend on the design of the system. In the example described, you could, rather than optimizing for speed itself, have a fixed function that estimates speed (like what we do in complexity theory) and then optimize for *that*, and that would get rid of the leak in question.
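As a minimal sketch of that fix (the cost model and names are invented): the thing being optimized is an analytic estimate that never touches a real clock, so the feedback loop stays inside the formal model.

```python
# Optimize over an internal cost estimate instead of measured runtime.
import math

def estimated_cost(n_elements: int, algorithm: str) -> float:
    # fixed cost model; never consults a clock or the outside world
    if algorithm == "bubble":
        return float(n_elements ** 2)
    if algorithm == "merge":
        return n_elements * math.log2(max(n_elements, 2))
    raise ValueError(algorithm)

candidates = ["bubble", "merge"]
best = min(candidates, key=lambda a: estimated_cost(10_000, a))
print(best)  # "merge": chosen purely from the internal cost model
```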
The point I think Bostrom is making is that, contrary to intuition, just building the epistemic part of an AI and not telling it to enact the solution it found doesn’t guarantee you don’t get an optimizer_2.