Somewhat true, but without further bells and whistles, RL does not replicate the Pavlov strategy in Prisoner’s Dilemma, so I think looking at it that way is missing something important about what’s going on.
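For reference, here is a minimal sketch of what the Pavlov (win-stay, lose-shift) strategy does in an iterated Prisoner’s Dilemma. The function names and the `play` helper are illustrative assumptions, not something from the discussion; the sketch only shows the target behavior and makes no claim about what a given RL algorithm learns.

```python
def pavlov(my_last, their_last):
    """Win-stay, lose-shift: cooperate first; afterwards keep the previous
    move if the opponent cooperated, otherwise switch moves."""
    if my_last is None:
        return "C"
    if their_last == "C":
        return my_last  # "win": stay with the previous move
    return "D" if my_last == "C" else "C"  # "lose": shift


def always_defect(my_last, their_last):
    """Baseline opponent that defects every round."""
    return "D"


def play(strategy_a, strategy_b, rounds):
    """Run an iterated game and return the list of (a_move, b_move) pairs."""
    a_last = b_last = None
    history = []
    for _ in range(rounds):
        a = strategy_a(a_last, b_last)
        b = strategy_b(b_last, a_last)
        history.append((a, b))
        a_last, b_last = a, b
    return history
```

Two Pavlov players cooperate throughout, while Pavlov against a constant defector alternates between cooperating and defecting.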
Ah, ok. I note that it may have been intended more as a meditative practice, since the goal appears to have been reaching a state of bliss, the epistemic practice being a means to that end. Practicing doubting everything could be an interesting meditation (though it could perhaps be dangerous).
this procedure is called (the weak form of) Pyrrhonian skepticism
What’s the strong form?
I think in a conversation I had with you last year, I kept going back to ‘state’ despite protests because I kept thinking “if AUP works, surely it would be because some of the utility functions calculate a sensible state estimate in a humanlike ontology and then define utility from this”. It isn’t necessarily the right way to critique AUP, but I think I was right to think those thoughts conditional on that assumption. That is, even if it isn’t the argument you’re trying to make for AUP, it seems like a not-unreasonable position to consider, and so thinking about how AUP does in terms of state can be a reasonable and important part of a thought-process assessing AUP. I believe I stopped making the assumption outright at some point, but kept bringing out the assumption as a tool for analysis: for example, supporting a thought experiment with the argument that there would at least be some utility functions which thought about the external world enough to care about such-and-such. I think in our conversation I managed to appropriately flag these sorts of assumptions such that you were OK with the role they were playing in the wider argument (well… not in the sense of necessarily accepting the arguments, but in the sense of not thinking I was just repeatedly making the mistake of thinking it has to be about state, I think).
Other people could be thinking along similar lines without flagging it so clearly.
Giving people a slider with “safety” written on one end and “capability” written on the other, and then trying to get people to set it close enough to the “safety” end, seems like a bad situation. (Very similar to points you raised in your 5-min-timer list.)
An improvement on this situation would be something which looked more like a theoretical solution to Goodhart’s law, giving an (in-some-sense) optimal setting of a slider to maximize a trade-off between alignment and capabilities (“this is how you get the most of what you want”), allowing ML researchers to develop algorithms orienting toward this.
Even better (but similarly), an approach where capability and alignment go hand in hand would be ideal—a way to directly optimize for “what I mean, not what I say”, such that it is obvious that things are just worse if you depart from this.
However, maybe those things are just pipe dreams—this should not be the fundamental reason to ignore impact measures, unless promising approaches in the other two categories are pointed out; and even then, impact measures as a backup plan would still seem desirable.
My response to this is roughly that I prefer mild optimization techniques for this backup plan. Like impact measures, they are vulnerable to the objection above; but they seem better in terms of the objection which follows.
Part of my intuition, however, is just that mild optimization is going to be closer to the theoretical heart of anti-Goodhart technology. (Evidence for this is that quantilization seems, to me, theoretically nicer than any low-impact measure.)
In other words, conditioned on having a story more like “this is how you get the most of what you want” rather than a slider reading “safety ------- capability”, I more expect to see a mild optimizer as opposed to an impact measure.
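To make the quantilization idea concrete, here is a minimal sample-based sketch. It approximates a q-quantilizer by drawing actions from a base distribution, keeping the top q fraction by utility, and choosing uniformly among those. The names `quantilize`, `base_sampler`, and `n_samples` are illustrative assumptions, and the finite-sample approximation is a simplification rather than the formal definition.

```python
import random


def quantilize(base_sampler, utility, q, n_samples=1000, rng=None):
    """Approximate q-quantilizer.

    base_sampler(rng) draws one action from the base ("safe") distribution.
    The function keeps the top q fraction of sampled actions by utility
    and returns one of them uniformly at random, instead of the argmax.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    rng = rng or random.Random()
    samples = [base_sampler(rng) for _ in range(n_samples)]
    samples.sort(key=utility, reverse=True)
    k = max(1, int(q * n_samples))  # size of the retained top slice
    return rng.choice(samples[:k])
```

With q = 1 this just samples from the base distribution, and as q shrinks toward 0 it approaches ordinary maximization over the sampled actions, so q plays the role of the amount of optimization pressure.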
Unlike mild-optimization approaches, impact measures still allow potentially large amounts of optimization pressure to be applied to a metric that isn’t exactly what we want.
It is apparent that some attempted impact measures run into nearest-unblocked-strategy type problems, where the supposed patch just creates a different problem when a lot of optimization pressure is applied. This gives reason for concern even if you can’t spot a concrete problem with a given impact measure: impact measures don’t address the basic nearest-unblocked-strategy problem, and so are liable to severe Goodhartian results.
If an impact measure were perfect, then adding it as a penalty on an otherwise (slightly or greatly) misaligned utility function just seems good, and adding it as a penalty to a perfectly aligned utility function would seem an acceptable loss. If impact is slightly misspecified, however, then adding it as a penalty may make a utility function less friendly than it otherwise would be.
(It is a desirable feature of safety measures, that those safety measures do not risk decreasing alignment.)
On the other hand, a mild optimizer seems to get the spirit of what’s wanted from low-impact.
This is only somewhat true: a mild optimizer may create a catastrophe through negligence, where a low-impact system would try hard to avoid doing so. However, I view this as a much more acceptable and tractable problem than the nearest-unblocked-strategy type problem.
Both mild optimization and impact measures require separate approaches to “doing what people want”.
Arguably this is OK, because they could greatly reduce the bar for alignment of specified utility functions. However, it seems possible to me that we need to understand more about the fundamentally puzzling nature of “do what I want” before we can be confident even in low-impact or mild-optimization approaches, because it is difficult to confidently say that an approach avoids risk of hugely violating your preferences while still being so confused about what human preference even is.
Ok, so you find yourself in this situation where the Truth Tester has verified that the Predictor is accurate, and you’ve verified that the Truth Tester is accurate, and the Predictor tells you that the direction you’re about to turn your head has a perfect correspondence to the orbit of some particular asteroid. Lacking the orbit information yourself, you now have a subjective link between your next action and the asteroid’s path.
This case does appear to present some difficulty for me.
I think this case isn’t actually so different from the previous case, because although you don’t know the source code of the Predictor, you might reasonably suspect that the Predictor picks out an asteroid after predicting you (or, selects the equation relating your head movement to the asteroid orbit after picking out the asteroid). We might suspect this precisely because it is implausible that the asteroid is actually mirroring our computation in a more significant sense. So using a Truth Teller intermediary increases the uncertainty of the situation, but increased uncertainty is compatible with the same resolution.
What your revision does do, though, is highlight how the counterfactual expectation has to differ from the evidential conditional. We may think “the Predictor would have selected a different asteroid (or different equation) if its computation of our action had turned out different”, but, we now know the asteroid (and the equation); so, our evidential expectation is clearly that the asteroid has a different orbit depending on our choice of action. Yet, it seems like the sensible counterfactual expectation given the situation is … hm.
Actually, now I don’t think it’s quite that the evidential and counterfactual expectation come apart. Since you don’t know what you actually do yet, there’s no reason for you to tie any particular asteroid to any particular action. So, it’s not that in your state of uncertainty choice of action covaries with choice of asteroid (via some particular mapping). Rather, you suspect that there is such a mapping, whatever that means.
In any case, this difficulty was already present without the Truth Teller serving as intermediary: the Predictor’s choice of box is already known, so even though it is sensible to think of the chosen box as what counterfactually varies based on choice of action, on-the-spot what makes sense (evidentially) is to anticipate the same box having different contents.
So, the question is: what’s my naive functionalist position supposed to be? What sense of “varies with” is supposed to necessitate the presence of a copy of me in the (logico-)causal ancestry of an event?
It occurs to me that although I have made clear that I (1) favor naive functionalism and (2) am far from certain of it, I haven’t actually made clear that I further (3) know of no situation where I think the agent has a good picture of the world and where the agent’s picture leads it to conclude that there’s a logical correlation with its action which can’t be accounted for by a logical cause (ie something like a copy of the agent somewhere in the computation of the correlated thing). IE, if there are outright counterexamples to naive functionalism, I think they’re actually tricky to state, and I have at least considered a few cases—your attempted counterexample comes as no surprise to me and I suspect you’ll have to try significantly harder.
My uncertainty is, instead, in the large ambiguity of concepts like “instance of an agent” and “logical cause”.
“How do you propose to reliably put an agent into the described situation?”—Why do we have to be able to reliably put an agent in that situation? Isn’t it enough that an agent may end up in that situation?
For example, we can describe how to put an agent into the counterfactual mugging scenario as normally described (where Omega asks for $10 and gives nothing in return), but critically for our analysis, one can only reliably do so by creating a significant chance that the agent ends up in the other branch (where Omega gives the agent a large sum if and only if Omega would have received the asked-for $10 in the other branch). If this were not the case, the argument for giving the $10 would seem weaker.
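A small worked calculation may help here. The sketch below computes the expected value of the two policies (pay when asked, or refuse) before the agent knows which branch it is in. The $10 cost is from the scenario as described; the reward of 1000 and the branch probability are hypothetical placeholders, since the text only says “a large sum”.

```python
def expected_value(pays_when_asked, p_asked=0.5, cost=10, reward=1000):
    """Expected value of a policy in counterfactual mugging.

    p_asked: probability of ending up in the branch where Omega asks for money.
    reward:  hypothetical size of the "large sum" in the other branch,
             paid only if the agent's policy is to pay when asked.
    """
    value_if_asked = -cost if pays_when_asked else 0
    value_if_other = reward if pays_when_asked else 0
    return p_asked * value_if_asked + (1 - p_asked) * value_if_other
```

Under these assumed numbers the paying policy is worth 495 and the refusing policy 0. If the agent could be placed in the asking branch with certainty (`p_asked=1.0`), paying would simply cost 10, which is the sense in which the argument for paying depends on how the scenario is set up.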
But in terms of how the agent can know the predictor is accurate, perhaps the agent gets to examine its source code after it has run, and it’s implemented in hardware rather than software so that the agent knows that it wasn’t modified?
I’m asking for more detail about how the predictor is constructed such that the predictor can accurately point out that the agent has the same output as the box. Similarly to how counterfactual mugging would be less compelling if we had to rely on the agent happening to have the stated subjunctive dependencies rather than being able to describe a scenario in which it seems very reasonable for the agent to have those subjunctive dependencies, your example would be less compelling if the box just happens to contain a slip of paper with our exact actions, and the predictor just happens to guess this correctly, and we just happen to trust the predictor correctly. Then I would agree that something has gone wrong, but all that has gone wrong is that the agent had a poor picture of the world (one which is subjunctively incorrect from our perspective, even though it made correct predictions).
On the other hand, if the predictor runs a simulation of us, and then purposefully chooses a box whose output is identical to ours, then the situation seems perfectly sensible: “the box” that’s correlated with our output subjectively is a box which is chosen differently in cases where our output is different; and, the choice-of-box contains a copy of us. So the example works: there is a copy of us somewhere in the computation which correlates with us.
(Also, just wanted to check whether you’ve read the formal problem description in Logical Counterfactuals and the Co-operation Game)
I’ve read it now. I think you could already have guessed that I agree with the ‘subjective’ point and disagree with the ‘meaningless to consider the case where you have full knowledge’ point.
I disagree, and I thought my objection was adequately explained. But I think my response will be more concrete/understandable/applicable if you first answer: how do you propose to reliably put an agent into the described situation?
The details of how you set up the scenario may be important to the analysis of the error in the agent’s reasoning. For example, if the agent just thinks the predictor is accurate for no reason, it could be that the agent just has a bad prior (the predictor doesn’t really reliably tell the truth about the agent’s actions being correlated with the box). To that case, I could respond that of course we can construct cases we intuitively disagree with by giving the agent a set of beliefs which we intuitively disagree with. (This is similar to my reason for rejecting the typical smoking lesion setup as a case against EDT! The beliefs given to the EDT agent in smoking lesion are inconsistent with the problem setup.)
I’m not suggesting that you were implying that, I’m just saying it to illustrate why it might be important for you to say more about the setup.
The box itself isn’t necessarily thought of as possessing an instance of my consciousness. The bullet I want to bite is the weaker claim that anything subjunctively linked to me has me somewhere in its computation (including its past). In the same way that a transcript of a conversation I had contains me in its computation (I had to speak a word in order for it to end up in the text) but isn’t itself conscious, a box which very reliably has the same output as me must be related to me somehow.
I anticipate that your response is going to be “but what if it is only a little correlated with you?”, to which I would reply “how do we set up the situation?” and probably make a bunch of “you can’t reliably put me into that epistemic state” type objections. In other words, I don’t expect you to be able to make a situation where I both assent to the subjective subjunctive dependence and will want to deny that the box has me somewhere in its computation.
For example, the easiest way to make the correlation weak is for the predictor who tells me the box has the same output as me to be only moderately good. There are several possibilities. (1) I can already predict what the predictor will think I’ll do, which screens off its prediction from my action, so no subjective correlation; (2) I can’t predict confidently what the predictor will say, which means the predictor has information about my action which I lack; then, even if the predictor is poor, it must have a significant tie to me; for example, it might have observed me making similar decisions in the past. So there are copies of me behind the correlation.
As far as I recall, I haven’t heard anyone else express the extremely naive view we’re talking about, and I probably have some specific decision-theory-related beliefs that make it particularly appealing to me, but I don’t think it’s out of the ballpark of other people’s views, so to speak.
The point I make there is that whether processes are subjunctively linked to you is more a matter of your state of knowledge than anything about the intrinsic properties of the object itself.
I (probably) agree with this point, and it doesn’t seem like much of an argument against the whole position to me—coming from a Bayesian background, it makes sense to be subjectivist about a lot of things, and link them to your state of knowledge. I’m curious how you would complete the argument—OK, subjunctive statements are linked to subjective states of knowledge. Where does that speak against the naive functionalist position?
I’m very negative on Naive Functionalism. I’ve still got some skepticism about functionalism itself (property dualism isn’t implausible in my mind), but if I had to choose between Functionalist theories, that certainly isn’t what I’d pick.
I’m trying to think more about why I feel this outcome is a somewhat plausible one. The thing I’m generating is a feeling that this is ‘how these things go’—that the sign that you’re on the right track is when all the concepts start fitting together like legos.
I guess I also find it kind of curious that you aren’t more compelled by the argument I made early on, namely, that we should collapse apparently distinct notions if we can’t give any cognitive difference between them. I think I later rounded down this argument to Occam’s razor, but there’s a different point to be made: if we’re talking about the cognitive role played by something, rather than just the definition (as is the case in decision theory), and we can’t find a difference in cognitive role (even if we generally make a distinction when making definitions), it seems hard to sustain the distinction. Taking another example related to anthropics, it seems hard to sustain a distinction between ‘probability that I’m an instance’ and ‘degree I care about each instance’ (what’s been called a ‘caring measure’ I think), when all the calculations come out the same either way, even generating something which looks like a Bayesian update of the caring measure. Initially it seems like there’s a big difference, because it’s a question of modeling something as a belief or a value; but, unless some substantive difference in the actual computations presents itself, it seems the distinction isn’t real. A robot built to think with true anthropic uncertainty vs caring measures is literally running equivalent code either way; it’s effectively only a difference in code comment.
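The “difference in code comment” point can be illustrated with a toy sketch. Everything below (the instance names, the weights, the utility table, and the helper functions) is a hypothetical illustration, and it assumes the simple case where the agent maximizes a weighted sum of per-instance utilities.

```python
def reweight(weights, likelihood):
    """Multiply each instance's weight by the likelihood of the observation
    for that instance, then renormalize. This has the form of a Bayesian update."""
    unnormalized = {i: w * likelihood[i] for i, w in weights.items()}
    total = sum(unnormalized.values())
    return {i: w / total for i, w in unnormalized.items()}


def best_action(actions, weights, utility):
    """Pick the action with the highest weighted sum of per-instance utility."""
    return max(actions, key=lambda a: sum(w * utility[i][a] for i, w in weights.items()))


# Reading 1: the weights are probabilities that "I am this instance".
anthropic_probability = {"instance_a": 0.5, "instance_b": 0.5}
# Reading 2: the same numbers say how much I care about each instance.
caring_measure = {"instance_a": 0.5, "instance_b": 0.5}

# Hypothetical observation likelihoods and payoffs.
likelihood = {"instance_a": 0.9, "instance_b": 0.1}
utility = {
    "instance_a": {"left": 1.0, "right": 0.0},
    "instance_b": {"left": 0.0, "right": 1.0},
}
```

Passing either dictionary through `reweight` and `best_action` runs exactly the same computation and selects the same action; only the variable name and the comment differ.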
Sounds like the disagreement has mostly landed in the area of questions of what to investigate first, which is pretty firmly “you do you” territory—whatever most improves your own picture of what’s going on, that is very likely what you should be thinking about.
On the other hand, I’m still left feeling like your approach is not going to be embedded enough. You say that investigating 2->3 first risks implicitly assuming too much about 1->2. My sketchy response is that what we want in the end is not a picture which is necessarily even consistent with having any 1->2 view. Everything is embedded, and implicitly reflective, even the decision theorist thinking about what decision theory an agent should have. So, a firm 1->2 view can hurt rather than help, due to overly non-embedded assumptions which have to be discarded later.
Using some of the ideas from the embedded agency sequence: a decision theorist may, in the course of evaluating a decision theory, consider a lot of #1-type situations. However, since the decision theorist is embedded as well, the decision theorist does not want to assume realizability even with respect to their own ontology. So, ultimately, the decision theorist wants a decision theory to have “good behavior” on problems where no #1-type view is available (meaning some sort of optimality for non-realizable cases).
I agree, that’s a serious issue with the setup here. The simple answer is that I didn’t think of that when I was writing the post. I later noticed the problem, but how to react isn’t totally obvious.
Defense #1: An easy response is that I was talking about updateful DTs in my smoking lesion discussion. If a DT learns, it is hard to see why it would have seriously miscalibrated estimates of its own behavior. For UDT, there is no similar argument. Therefore the post as written above stands.
Reply: Perhaps that’s not very satisfying, though—despite UDT’s fixed prior, failure due to lack of calibration about oneself seems like a particularly damning sort of failure. We might construct the prior using something similar to a reflective oracle to block this sort of problem.
Defense #2: Then, the next easy response is that material-conditional-based UDT 1.0 with such a self-knowledgeable prior has two possible fixed points. The probability distribution described in the post isn’t one of them, but one with a more extreme assignment favoring dancing is: if the prior expects the agent to dance with certainty or almost certainly, then dancing looks good, and not dancing looks like a way to guarantee you don’t get the money. Again, the concern raised in the post is a valid one, just requiring a tweak to the probabilities in the example.
Reply: Sure, but the solution in this case is very clear: you have to select the best fixed point. This seems like an option which is available to the agent, or to the agent designer.
Defense #3: True, but then you’re essentially taking a different counterfactual to decide the consequences of a policy: consideration of what fixed point it puts you in. This implies that you have something richer than just a probability distribution to work with, vindicating the overall point of the post, which is to discuss an issue which arises if you try to “condition on a conditional” when given only a probability distribution on actions and outcomes. Reasoning involving fixed points is going to end up being a (very particular) way to add a more basic counterfactual, as suggested by the post.
Also, even if you do this, I would conjecture there’s going to be some other problem with using the material conditional formulation of conditioning-on-conditionals. I would be interested if this turned out not to be true! Maybe there’s some proof that the material-conditional approach turns out not to be equivalent to other possible approaches under some assumptions relating to self-knowledge and fixed-points. That would be interesting.
Also also, if we take the fixed-point idea seriously, there are problems we run into there as well. Reflective oracles (and their bounded cousins, for constructing computable priors) don’t offer a wonderful notion of counterfactual. Selecting a fixed point offers some logical control over predictors which themselves call the reflective oracle to predict you, but if a predictor does something else (perhaps even re-computes the reflective oracle in a slightly different way, side-stepping a direct call to it but simulating it anyway), the result of using selection of fixed point as a notion of counterfactual could be intuitively wrong. You could try to define a special type of reflective oracle which lacks this problem. You could also try other options like conditional oracles. But, it isn’t clear how everything should fit together. In particular, if the oracle itself is treated as a part of the observation, what is the type of a policy?
So, “select the best fixed point” may not be the straightforward option it sounds like.
Reply: This seems to not take the concern seriously enough. The overall type signature of “conditioning on conditionals” seems wrong here. The idea of having a probability distribution on actions may be wrong, stopping the argument in the post in its tracks—IE, the post may be right in its conclusion that there is a problem, but we should have been reasoning in a way which never went down that wrong path in the first place, and the conclusion of the post is making too small of a change to accomplish that.
For example, maybe distributed oracles offer a better picture of decision-making: the real process of deciding occurs in the construction of the fixed point, with nothing left over to decide once a fixed point has been constructed.
Clearly matters are getting too complicated for a simple correction to the argument in the post.
Defense #4: I still stand by the post as a cautionary tale about how not to define UDT, barring any “if you deal with self-reference appropriately, the material conditional option turns out to be equivalent to [some other options]” result, which could make me think the problem is more fundamental as opposed to a problem with a naive material-conditional approach to conditioning. The post might be improved by explicitly dealing with the self-reference issue, but the fact that it’s not totally clear how to do so (ie ‘select the best fixed point’ seems to fix things on the surface but has its own more subtle issues when considered as a general approach) makes such a treatment potentially very complicated, so that it’s better to look at the happy dance problem without explicitly worrying about all of that.
The basic point of the post is that formally specifying UDT is complicated even if you assume classical Bayesian probability without worrying about logical uncertainty. Making UDT into a simple well-defined object requires the further assumption that there’s a basic ‘policy’ object (the observation counterfactual, in the language of the post), with known probabilistic relationships to everything else. This essentially just gives you all the counterfactuals you need, begging the question of where such counterfactual information comes from. This point stands, however naive we might think such an approach is.
I provided reasons why I believe that Naive Functionalism is implausible in an earlier comment. I’ll admit that inconsistency is too strong of a word. My point is just that you need an independent reason to bite the bullet other than simplicity. Like simplicity combined with reasons why the bullets sound worse than they actually are.
Ah, I had taken you to be asserting possibilities and a desire to keep those possibilities open rather than held views and a desire for theories to conform to those views.
Maybe something about my view which I should emphasize is that since it doesn’t nail down any particular notion of counterfactual dependence, it doesn’t actually directly bite bullets on specific examples. In a given case where it may seem initially like you want counterfactual dependence but you don’t want anthropic instances to live, you’re free to either change views on one or the other. It could be that a big chunk of our differing intuitions lies in this. I suspect you’ve been thinking of me as wanting to open up the set of anthropic instances much wider than you would want. But, my view is equally amenable to narrowing down the scope of counterfactual dependence, instead. I suspect I’m much more open to narrowing down counterfactual dependence than you might think.
The argument that you’re making isn’t that the Abstraction Approach is wrong, it’s that by supporting other theories of consciousness, it increases the chance that people will mistakenly fail to choose Naive Functionalism. Wrong theories do tend to attract a certain number of people believing in them, but I would like to think that the best theory is likely to win out over time on Less Wrong.
(I note that I flagged this part as not being an argument, but rather an attempt to articulate a hazy intuition—I’m trying to engage with you less as an attempt to convince, more to explain how I see the situation.)
I don’t think that’s quite the argument I want to make. The problem isn’t that it gives people the option of making the wrong choice. The problem is that it introduces freedom in a suspicious place.
Here’s a programming analogy:
Both of us are thinking about how to write a decision theory library. We have a variety of confusions about this, such as what functionality a decision theory library actually needs to support, what the interface it needs to present to other things is, etc. Currently, we are having a disagreement about whether it should call an external library for ‘consciousness’ vs implement its own behavior. You are saying that we don’t want to commit to implementing consciousness a particular way, because we may find that we have to change that later. So, we need to write the library in a way such that we can easily swap consciousness libraries.
When I imagine trying to write the code, I don’t see how I’m going to call the ‘consciousness’ library while solving all the other problems I need to solve. It’s not that I want to write my own ‘consciousness’ functionality. It’s that I don’t think ‘consciousness’ is an abstraction that’s going to play well with the sort of things I need to do. So when I’m trying to resolve other confusions (about the interface, data types I will need, functionality which I may want to implement, etc) I don’t want to have to think about calling arbitrary consciousness libraries. I want to think about the data structures and manipulations which feel natural to the problem being solved. If this ends up generating some behaviors which look like a call to the ‘naive functionalism’ library, this makes me think the people who wrote that library maybe were on to something, but it doesn’t make me any more inclined to re-write my code in a way which can call ‘consciousness’ libraries.
If another programmer sketches a design for a decision theory library which can call a given consciousness library, I’m going to be a bit skeptical and ask for more detail about how it gets called and how the rest of the library is factored such that it isn’t just doing a bunch of work in two different ways or something like that.
Actually, I’m confused about how we got here. It seems like you were objecting to the (reductive-as-opposed-to-merely-analogical version of the) connection I’m drawing between decision theory and anthropics. But then we started discussing the question of whether a (logical) decision theory should be agnostic about consciousness vs take a position. This seems to be a related but separate question; if you reject (or hold off on deciding) the connection between decision theory and anthropics, a decision theory may or may not have to take a position on consciousness for other reasons. It’s also not entirely clear that you have to take a particular position on consciousness if you buy the dt-anthropics connection. I’ve actually been ignoring the question of ‘consciousness’ in itself, and instead mentally substituting it with ‘anthropic instance-ness’. I’m not sure what I would want to say about consciousness proper; it’s a very complicated topic.
This is an argument for Naive Functionalism vs other theories of consciousness. It isn’t an argument for the Abstracting Approach over the Reductive approach. The Abstracting Approach is more complicated, but it also seeks to do more. In order to fairly compare them, you have to compare both on the same domain. And given the assumption of Naive Functionalism, the Abstracting Approach reduces to the Reductive Approach.
An argument in favor of naive functionalism makes applying the abstraction approach less appealing, since it suggests the abstraction is only opening the doors to worse theories. I might be missing something about what you’re saying here, but I think you are not only arguing that you can abstract without losing anything (because the agnosticism can later be resolved to naive functionalism), but that you strongly prefer to abstract in this case.
But, I agree that that’s not the primary disagreement between us. I’m fine with being agnostic about naive functionalism; I think of myself as agnostic, merely finding it appealing. Primarily I’m reacting to the abstraction approach, because I think it is better in this case for a theory of logical counterfactuals to take a stand on anthropics. The fact that I’m uncertain about naive functionalism is tied to the fact that I’m uncertain about counterfactuals; the structure of my uncertainty is such that I expect information about one to provide information about the other. You want to maintain agnosticism about consciousness, and as a result, you don’t want to tie those beliefs together in that way. From my perspective, it seems better to maintain that agnosticism (if desired) by remaining agnostic about the specific connection between anthropics and decision theory which I outlined, rather than by trying to do decision theory in a way which is agnostic about anthropics in general.
You can formalize UDT in a more standard game-theoretic setting, which allows many problems like Parfit’s Hitchhiker to be dealt with, if that is enough for what you’re interested in. However, the formalism assumes a lot about the world (such as the identity of the agent being a nonproblematic given, as Wei Dai mentions), so if you want to address questions of where that structure is coming from, you have to do something else.
Various comments, written while reading:
The broad categories of causal/evidential/logical are definitely right in terms of what people generally talk about, but it is important to keep in mind that these are clusters rather than fully formalized options. There are many different formalizations of causal counterfactuals, which may have significantly different consequences. Though, around here, people think of Pearlian causality almost exclusively.
“Evidential” means basically one thing, but we can differentiate between what happens in different theories of uncertainty. Obviously, Bayesianism is popular in these parts, but we also might be talking about evidential reasoning in a logically uncertain framework, like logical induction.
Logical counterfactuals are wide open, since there’s no accepted account of what exactly they are. Though, modal DT is a concrete proposal which is often discussed.
Again, the causal/evidential/logical split seems good for capturing how people mostly talk about things here, but internally I think of it more as two dimensions: causal/evidential and logical/not. Logical counterfactuals are more or less the “causal and logical” option, conveying intuitions of there being some kind of “logical causality” which tells you how to take counterfactuals.
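To make the causal/evidential axis concrete, here is a minimal sketch using the textbook Newcomb's problem. None of it comes from the discussion above: the payoffs, the predictor accuracy, and the function names are all illustrative assumptions. "Evidential" is modelled as plain Bayesian conditioning on the action, and "causal" as holding the box contents fixed while varying the action, which is only one of many possible formalizations of each cluster.

```python
from fractions import Fraction

# Assumed, textbook-style payoffs (not taken from the thread).
BIG = 1_000_000    # opaque box, if the predictor filled it
SMALL = 1_000      # transparent box, always available

ACTIONS = ["one-box", "two-box"]


def payoff(action, box_full):
    """Money received for an action, given whether the opaque box is full."""
    total = BIG if box_full else 0
    if action == "two-box":
        total += SMALL
    return total


def evidential_value(action, accuracy):
    """Expected payoff conditioning on the action as evidence.

    Assumes the predictor matches the action with probability `accuracy`,
    so P(full | one-box) = accuracy and P(full | two-box) = 1 - accuracy.
    """
    p_full = accuracy if action == "one-box" else 1 - accuracy
    return p_full * payoff(action, True) + (1 - p_full) * payoff(action, False)


def causal_value(action, p_full):
    """Expected payoff when the box contents are held fixed.

    Assumes P(full) = p_full regardless of which action is taken.
    """
    return p_full * payoff(action, True) + (1 - p_full) * payoff(action, False)


def best(value_fn, param):
    """Action maximizing the given value function."""
    return max(ACTIONS, key=lambda a: value_fn(a, param))


evidential_choice = best(evidential_value, Fraction(99, 100))
causal_choice = best(causal_value, Fraction(1, 2))
```

Under these assumptions the evidential calculation favors one-boxing whenever the predictor is accurate enough, while the causal calculation favors two-boxing for every fixed value of `p_full`, since taking both boxes always adds `SMALL`. The logical/not axis is not captured here at all; that would need some account of how the predictor's computation relates to the agent's.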
Also, getting into nitpicks: some might say “evidential” is the non-counterfactual option. A broader term which could be used is “conditional”, with counterfactual conditionals (aka subjunctive conditionals) being a subtype. I think evidential conditionals would fall under “indicative conditional” as opposed to “counterfactual conditional”. Academic philosophers might also nitpick that logical counterfactuals are not counterfactuals. “Counterfactual” in academic philosophy usually does not include the possibility of counterfacting on logical impossibilities; “counterlogical” is used when logical impossibilities are being considered. Posts on this forum usually ignore all the nitpicks in this paragraph, and I’m not sure I’m even capturing the language of academic decision theorists accurately. I’m just attempting to mention some distinctions I’ve encountered.
You’re right that reflective consistency is something which is supposed to emerge (or not emerge) from the specification of the decision theory. If there were a ‘reflective consistency’ option, we would want to just set it to ‘yes’; but unfortunately, things are not so easy.
Another source of variation, related to your ‘graphical models’ point, could broadly be called choice of formalism. A decision problem could be given as an extensive-form game, a causal Bayes net, a program (probabilistic or deterministic), a logical theory (with some choices about how actions, utilities, etc get represented, whether causality needs to be specified, and so on), or many other possibilities.
This is critical; new formalisms such as reflective oracles may allow us to accomplish new things, illuminate problems which were previously murky, make distinctions between things which were previously being conflated, and so on. However, the high-level clusters like CDT, EDT, FDT, and UDT do not specify formalism—they are more general ideas, which can be formalized in multiple ways.
I agree that we are using “depends” in different ways. I’ll try to avoid that language. I don’t think I was confusing the two different notions when I wrote my reply; I thought, and still think, that taking the abstraction approach wrt consciousness is in itself a serious point against a decision theory. I don’t think the abstraction approach is always bad—I think there’s something specific about consciousness which makes it a bad idea.
Actually, that’s too strong. I think taking the abstraction approach wrt consciousness is satisfactory if you’re not trying to solve the problem of logical counterfactuals or related issues. There’s something I find specifically worrying here.
I think part of it is, I can’t imagine what else would settle the question. Accepting the connection to decision theory lets me pin down what should count as an anthropic instance (to the extent that I can pin down counterfactuals). Without this connection, we seem to risk keeping the matter afloat forever.
Making a theory of counterfactuals take an arbitrary theory of consciousness as an argument seems to cement this free-floating idea of consciousness, as an arbitrary property which a lump of matter can freely have or not have. My intuition that decision theory has to take a stance here is connected to an intuition that a decision theory needs to depend on certain ‘sensible’ aspects of a situation, and is not allowed to depend on ‘absurd’ aspects. For example, the table being wood vs metal should be an inessential detail of the 5&10 problem.
This isn’t meant to be an argument, only an articulation of my position. Indeed, my notion of “essential” vs “inessential” details is overtly functionalist (eg, replacing carbon with silicon should not matter if the high-level picture of the situation is untouched).
Still, I think our disagreement is not so large. I agree with you that the question is far from obvious. I find my view on anthropics actually fairly plausible, but far from determined.
When you talk about “depends” and say that this is a disadvantage, you mean that in order to obtain a complete theory of anthropics, you need to select a theory of consciousness to be combined with your decision theory. I think that this is actually unfair, because in the Reductive Approach, you do implicitly select a theory of consciousness, which I’ll call Naive Functionalism. I’m not using this name to be pejorative, it’s the best descriptor I can think of for the version of functionalism which you are using that ignores any concerns that high-level predictors might deserve to be labelled as a consciousness.
“Naive” seems fine here; I’d agree that the position I’m describing is of a “the most naive view here turns out to be true” flavor (so long as we don’t think of “naive” as “man-on-the-street”/“folk wisdom”).
I don’t think it is unfair of me to select a theory of consciousness here while accusing you of requiring one. My whole point is that it is simpler to select the theory of consciousness which requires no extra ontology beyond what decision theory already needs for other reasons. It is less simple if we use some extra stuff in addition. It is true that I’ve also selected a theory of consciousness, but the way I’ve done so doesn’t incur an extra complexity penalty, whereas yours might, if you end up going with something other than what I chose.
My argument is that Occam’s razor is about accepting the simplest theory that is consistent with the situation. In my mind it seems like you are allowing simplicity to let you ignore the fact that your theory is inconsistent with the situation, which is not how I believe Occam’s Razor is supposed to work. So it’s not just about the cost, but about whether this is even a sensible way of reasoning.
We agree that Occam’s razor is about accepting the simplest theory that is consistent with the situation. We disagree about whether the theory is inconsistent with the situation.
What is the claimed inconsistency? So far my perception of your argument has been that you insist we could make a distinction. When you described your abstraction approach, you said that we could well choose naive functionalism as our theory of consciousness.
Well, in any case, the claim I’m raising for consideration is that these two may turn out to be the same. The argument for the claim is the simplicity of merging the decision theory phenomenon with the anthropic phenomenon.