I am not familiar with those concepts. References would be appreciated. 🙏
snarles
- It seems obvious that your change in relationship with suffering constitutes a kind of value shift, doesn’t it? - This is not obvious to me. In the first place, I never had the value “avoid suffering” even before I started my practices. Since before I even knew the concept of suffering, I have always had the compulsion of avoiding suffering, but the value to transcend it. - What’s your relationship with value drift? Are you unafraid of it? That gradual death by mutation? The infidelity of your future self? - I am afraid of value drift, but I am even more afraid that the values that I already have are based on incoherent thinking and false assumptions, which, once exposed, would lead me to realize that I have been spending my life in pursuing the wrong things entirely. - Because I am afraid of both value drift and value incoherence, I place a high priority in learning how I can upgrade my understanding of my own values, while at the same time being very cautious about which sources I trust and learn from. I cannot seek to improve value coherence without making myself vulnerable to value drift. Therefore, I only invest in learning from sources authored by people who appear to be aligned with my values. - Do you see it as a kind of natural erosion, a more vital aspect of the human telos than the motive aspects it erodes? - No, I do not think that value drift is inevitable, nor do I think the “higher purpose”, if such a thing exists, involves constantly drifting. My goal is to achieve a state of value constancy. 
- Anyone, it seems, can have the experience of “feeling totally fine and at ease while simultaneously experiencing intense … pain”[1]: - It would greatly please me if people could achieve a deeper understanding of suffering just by taking analgesics. If that were the case, perhaps we should encourage people to try them just for that purpose. However, I’m guessing that the health risks, especially cognitive side-effects (a reduction of awareness that would preclude the possibility of gaining any such insight), risks of addiction and logistical issues surrounding the distribution of drugs for non-medical purposes will render infeasible any attempt at the systematic employment of analgesics for the purpose of spiritual insight. In all likelihood, we’ll be stuck with the same old meditations and pranayamas and asanas for a while. - But the reason you bring up the topic of analgesics, if I am not mistaken, is to challenge the legitimacy of my insight by an argument that boils down to: “the experience you describe could be obtained through drugs, so it must not be that profound”. I do not know if you were also expecting to rely on a negative halo effect of “drug usage” to augment your rhetoric, but as you may have guessed from the preceding paragraph, my opinion is that the negative connotations of drug-induced states are due to irrational associations. If we ignore the drugs, then the remaining constituent of your rhetoric is the underlying assumption that “any easily obtained insight must be trivial.” That is far from the truth. I believe there are many simple things that people could do, which would profoundly increase their wisdom at a very low cost [1]. But precisely because these things are so simple, people wouldn’t take them seriously even if somebody suggested them. (The chance is a bit higher, but still not terribly high, if it’s said by the teacher of an expensive paid workshop, or their guru, or their psychotherapist. 
But the most effective way so far to get people to do these kinds of simple things is to integrate them inside some elaborate social ritual.) - But concluding from this that “there’s no such thing as suffering” is a conceptual confusion of the highest order—and not some insight into deep Truth. - I agree that the statement “there’s no such thing as suffering” is false, and not any kind of insight into deep Truth. - That’s because I am not claiming that “there’s no such thing as suffering.” I claim to have an insight which can’t be described in words, but the best verbal description of this insight is something like “suffering is an illusion.” - I don’t even consider this insight to be particularly deep. Like you said, maybe you could get it by taking painkillers. Certainly not an insight into deep Truth. That is not to say that I don’t take other people’s suffering seriously—far from it, it concerns me greatly. However if you were to compare the difficulty of understanding suffering and the difficulty of understanding consciousness, I think suffering is a far easier problem to resolve. - ETA: And it seems to me to be far from obvious, that it is good or desirable to voluntarily induce in yourself a state akin to a morphine high or a lobotomy… especially if doing so has the additional consequence of leading you into the most elementary conceptual errors - You are wise to be cautious, because neural self-modification could potentially lead to states where one loses all concern for one’s own well-being. However, just because an altered state of consciousness is similar to a drug-induced state or a state of neurological impairment doesn’t, by itself, imply that it should be avoided. It all depends on whether you’ve taken appropriate steps to control the risk (e.g. by only accessing the state under the guidance of an experienced teacher), and what insight you stand to gain by experiencing that state. 
The states of consciousness may be the same, but the intentions and degree of control make all the difference. Recreational drug users pursue these states with little or no understanding of the process, little or no control over the outcomes, and out of the intention of thrill-seeking, social bonding, or alleviating boredom. Mystics, often backed by a tradition which has precise knowledge of how to attain these states, are equipped with mental tools to control the process, and seek these states with the intention of obtaining insights that will be of enduring value to their lives. - [1] Such as: take just one hour to think and reflect. Dance with abandon. Volunteer to take care of children. Go to the forest, just to look around. Play in the mud. Fast for one day. Learn and go see where your food comes from, how your clothes are made. 
- I would assign that a probability less than 0.1, and that’s because I already experienced some insights which defy verbal transmission. For instance, I feel that I am close to experientially understanding the question of “what is suffering?” The best way I can formulate my understanding into words is, “there is no such thing as suffering. It is an illusion.” I don’t think additional words or higher-context instructions would help in conveying my understanding to someone who cannot relate to the experience of feeling totally fine and at ease while simultaneously experiencing intense physical and emotional pain. - I don’t think Buddha ever attempted to describe the Truth in words. Sometimes he would give a koan to a student who just needed a little push. But most of his sutras were for giving the instructions for how students could work at the Truth, and also just practical advice on how to live skillfully. 
- I’m reducing my subjective probability that you will abandon rationality... - I suppose what you are attempting is similar to what Buddha did in the first place. The sages of his time must have felt pained to see their beautiful non-dualism sliced and diced into mass-produced sutras, rather than the poems and songs and mythology which were, up until then, the usual vehicle of expression for these truths. - I guess I’m just narcissistic enough to still be a Quinean naturalist and say ‘yep, that is also me.’ - Considering God to be part of yourself is very elevated and good. The only problem is the things that you don’t consider to be part of yourself. So I guess what I am saying is that you should amplify your narcissism :) 
- The truths of General Relativity cannot be conveyed in conventional language. But does one have to study the underlying mathematics before evaluating its claims? - Just as there exists a specialized language that accurately conveys General Relativity, there similarly exists a specialized language (mythological language) for conveying mystical truths. However, I think the wrong approach would be to try to understand that language without having undergone the necessary spiritual preparation. As St. Paul says in 1 Corinthians 2:14: - The natural person does not accept the things of the Spirit of God, for they are folly to him, and he is not able to understand them because they are spiritually discerned. - This echoes countless similar statements in other traditions, the most famous (and probably oldest) being “the Tao that can be told is not the eternal Tao.” - That is not to say that one cannot approximate the truth by means of analogy. You can approximately capture the truth of General Relativity in the statement, “gravity bends space.” This approximation of the truth is useful because it allows you to understand certain consequences of that truth, such as gravitational lensing. Hence, even someone untrained in physics can be convinced of General Relativity, because they understand an approximate version of it, which in turn intuitively explains phenomena such as the Hubble Space Telescope photo of a horseshoe Einstein ring. - Likewise, approximations to the Truth abound in the various spiritual traditions. “God exists, and is the only entity that exists. I am God, you are God, we are all one being” is one such approximation [1]. It is an approximation because the words “you” and “God” are not well-defined. My own definition of these terms has been continually evolving as I progress in spirituality. - One consequence of this approximation is that the feeling that we are separate individuals must be flawed. 
I will take another analogy from physics: the four fundamental forces are in fact different aspects of the same unified force, but they become distinct at lower energy levels. At higher energy levels, they become clearly unified. - Similarly, I have experienced that at high levels of awareness, my feeling of distinctness from other people and from the rest of the universe is reduced. It starts to seem like “my thoughts” and “my feelings” are not mine, they are just the thoughts generated by one particular mind, and furthermore I feel like I can start to feel what others are feeling. I have encountered others who appear to be even higher on the “energy scale.” One day I was greeted extremely warmly out of the blue by a homeless man who was just working on the street. I responded to him in kind. - A consequence of the statement “we are all One” is that we should be able to experience this unity. If there exist people who experience this as a reality (and not just as an altered state), they should be able to detect the thoughts and feelings of others around them. I find it plausible that such people exist, both from my reading and my encounters with people such as the homeless man. But it does bother me that there exists no known scientific mechanism that would enable us to read each other’s minds, other than some very speculative ideas about consciousness being based on quantum phenomena. - I do not expect that this particular example should be particularly convincing to a skeptic. I know that there exist non-mystical theories for explaining non-dualistic states, for instance Jill Bolte Taylor’s theory that it is caused by a switch in dominance from the left to the right hemisphere. What ultimately convinced me to ‘cross over’ was not a single experience or insight but rather the aggregation of many experiences: listing all of these would distract from the point I am trying to make. 
I suggest that any curious individuals consult the much richer collection of data on personal experiences that exists in the religious studies literature; my own experiences are nothing special in comparison. - My aim for now was just to address your question about how a claim can be evaluated in the absence of the necessary cognitive framework to understand its content. To summarize, one limited form of evaluation can be obtained by learning of the different approximations of the truth, and then evaluating consequences of those approximations in comparison to empirical data. - That said, at a certain stage of maturity, one who is seeking the truth should stop bothering with approximations, because the approximations will not give you that necessary cognitive framework to really understand the truth. Reading popular science books can never give you the understanding that a Ph. D. physicist has obtained from rigorous training. You have to “sit down and learn the math”, or in the case of spirituality, to follow your chosen path. If the approximations have any value, it would only be in giving the hope that the skeptic needs before they can make the commitment to seek the real thing. - [1] Another approximation, equally valid in my view, would be that “God created you and loves you.” Note that combining with the first approximation yields the near-tautology that “You love yourself.” Still, even a statement such as “God loves you” which might be parsed to something logically trivial can take on a new profundity for one who has undergone the proper cultivation. - Another approximation is “there is no self.” Or that “everything is nothing.” Combine those two, and you get “everything is self.” The name of the Hindu god Shiva literally means “No-thing.” 
- I started out as a self-identified rationalist, got fascinated by mysticism and ‘went native.’ Ever since, I have been watching the rationality community from the sidelines to see if anyone else will ‘cross over’ as well. - I predict that if Romeo continues to work on methods for teaching meditation, he will eventually also ‘go mystical’ and publicly rescind his claim that all perceived metaphysical insights can be explained as pathological disconnects with reality caused by neural rewiring. Conditional on his continuing to teach, I predict with 70% probability that it occurs within the next 10 years. - There are two theories that both lead me to make this prediction, which I call Theory M and Theory D. I ascribe probability 0.8 to Theory M and 0.15 to Theory D. I ascribe a probability of 0.02 to their intersection. - Theory M (‘Mystical’) is that there exists a truth that cannot be expressed in conventional language, and that truth has been rediscovered independently by hundreds of seekers throughout history, and that many of the most established spiritual traditions (Taoist, Buddhist, Hindu, Kabbalistic, Christian, Sufi, and many others) were founded for the purpose of disseminating that same truth. If Theory M is true, and conditional on Romeo continuing to teach, I predict that his role in the community will motivate him to deepen his practice and will also bring him in contact with teachers from other spiritual practices. These experiences will catalyze his own mystical insights. - Theory D (‘Delusional’) is that spiritual seekers expose themselves to mechanisms of self-delusion that are so strong that they would even convince someone who is initially highly identified with rationality and skepticism to start assigning a high probability that Theory M is true. 
- The main reason I place so much confidence in Theory M is similar to the reason why string theorists place so much confidence in string theory: so many aspects of reality that were previously baffling to me are suddenly very comprehensible and elegant when I understand them through Theory M. Secondary reasons are my personal experiences, but I concede that some aspect of these could be due to delusions so I will not defend them here. - The reason I have so much confidence in Theory D is that I consider myself to have a high capacity for rational thinking, that I am very well-informed in the ideas of the rationality community and I consider myself knowledgeable about a variety of disciplines necessary for understanding reality: neuroscience, psychology, sociology, statistics, philosophy, biology, physics, religious history. I hold a Ph. D. in Statistics and am currently working as a research scientist in a research institution specializing in mental health research. And yet I have daily experiences which reaffirm my confidence in Theory M. If theory M is not true, then I would conclude that the delusion I am suffering is incredibly strong, lies outside of all mental diseases which I know about and evades my own attempts to self-diagnose. I admit that I have little incentive to eliminate the delusion since it makes my life so pleasant. I admit that I perhaps could have protected myself from delusion even more solidly by embedding myself deeply in a physical social community that was dedicated to rationality. I have not done that. However, I am surrounded by scientists at my workplace. - If theory M is true, then I warmly wish all of you reading to discover its truth. If it is not true, I wish for my delusion to be eliminated so that I can stop living in a pleasant fantasy and re-align myself with rationalists who are trying to maximize their positive impact on the world. 
- I expect that my views will not find much support among you, but I challenge you to judge my claims by the professed standards of your own community. If you feel yourself strongly disagreeing with me, I challenge you to engage me as a fellow rationalist (or to point out how I have violated the community rules of discourse) rather than succumbing to the knee-jerk reaction of dismissing a set of beliefs which make you uncomfortable. 
- Cool, I will take a look at the paper! 
- Great comment, mind if I quote you later on? :) - That said, if you have example problems where a logically omniscient Bayesian reasoner who incorporates all your implicit knowledge into their prior would get the wrong answers, those I want to see, because those do bear on the philosophical question that I currently see Bayesian probability theory as providing an answer to, and if there’s a chink in that armor, then I want to know :-) - It is well known where there might be chinks in the armor: what happens when two logically omniscient Bayesians sit down to play a game of poker? Bayesian game theory is still in a very developmental stage (in fact, I’m guessing it’s one of the things MIRI is working on) and there could be all kinds of paradoxes lurking in wait to supplement the ones we’ve already encountered (e.g. two-boxing). 
- If the game is really working like they say it is, then the frequentist is often concentrating probability around some random psi for no good reason, and when we actually draw random thetas and check who predicted better, we’ll see that they actually converged around completely the wrong values. Thus, I doubt the claim that, setting up the game exactly as given, the frequentist converges on the “true” value of psi. If we assume the frequentist does converge on the right answer, then I strongly suspect either (1) we should be using a prior where the observations are informative about psi even if they aren’t informative about theta or (2) they’re making an assumption that amounts to forcing us to use the “tortured” prior. I wouldn’t be too surprised by (2), - The frequentist result does converge, and it is possible to make up a very artificial prior which allows you to converge to psi. But the fact that you can make up a prior that gives you the frequentist answer is not surprising. - A useful perspective is this: there are no Bayesian methods, and there are no frequentist methods. However, there are Bayesian justifications for methods (“it does well in the average case”) and frequentist justifications for methods (“it does well asymptotically or in a minimax sense”). If you construct a prior in order to converge to psi asymptotically, then you may be formally using Bayesian machinery, but the justification you could possibly give for your method is completely frequentist. 
- Ok. So the scenario is that you are sampling only from the population f(X)=1. - EDIT: Correct, but you should not be too hung up on the issue of conditional sampling. The scenario would not change if we were sampling from the whole population. The important point is that we are trying to estimate a conditional mean of the form E[Y|f(X)=1]. This is a concept commonly seen in statistics. For example, the goal of non-parametric regression is to estimate a curve defined by f(x) = E[Y|X=x]. - Can you exhibit a simple example of the scenario in the section “A non-parametric Bayesian approach” with an explicit, simple class of functions g and distribution over them, for which the proposed procedure arrives at a better estimate of E[ Y | f(X)=1 ] than the sample average? - The example I gave in my first reply (where g(x) is known to be either one of two known functions h(x) or j(x)) can easily be extended into the kind of fully specified counterexample you are looking for: I’m not going to bother to do it, because it’s very tedious to write out and it’s frankly a homework-level problem. - Is the idea that it is intended to demonstrate, simply that prior knowledge about the joint distribution of X and Y would, combined with the sample, give a better estimate than the sample alone? - The fact that prior information can improve your estimate is already well-known to statisticians. But statisticians disagree on whether or not you should try to model your prior information in the form of a Bayesian model. Some Bayesians have expressed the opinion that one should always do so. This post, along with Wasserman/Robins/Ritov’s paper, provides counterexamples where the full non-parametric Bayesian model gives much worse results than the “naive” approach which ignores the prior. 
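To make the target quantity E[Y|f(X)=1] concrete, here is a minimal simulation sketch of the "naive" sample-average estimator discussed above. The particular choices of g, f, and the uniform distribution of X are illustrative assumptions, not the setup from the original post.

```python
import random

def g(x):
    # Hypothetical success probability P(Y=1 | X=x); chosen only for illustration.
    return 0.2 + 0.6 * x

def f(x):
    # Hypothetical indicator function defining the event of interest.
    return 1 if x > 0.5 else 0

def conditional_sample_mean(n, seed=0):
    """Average Y over the draws that satisfy f(X) = 1."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        x = rng.random()                      # X ~ Uniform(0, 1), an assumption
        y = 1 if rng.random() < g(x) else 0   # Y | X=x ~ Bernoulli(g(x))
        if f(x) == 1:
            ys.append(y)
    return sum(ys) / len(ys)

estimate = conditional_sample_mean(200_000)
# Under these assumptions the true value is E[g(X) | X > 0.5] = 0.2 + 0.6 * 0.75 = 0.65.
print(estimate)
```

The estimator never looks at g or at the X values beyond checking f(X)=1, which is exactly why it is called "naive" here: it ignores any prior information about g.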
- Update from the author: - Thanks for all of the comments and corrections! Based on your feedback, I have concluded that the article is a little bit too advanced (and possibly too narrow in focus) to be posted in the main section of the site. However, it is clear that there is a lot of interest in the general subject. Therefore, rather than posting this article to main, I think it would be more productive to write a “Philosophy of Statistics” sequence which would provide the necessary background for this kind of post. 
- The confusion may come from mixing up my setup and Robins/Ritov’s setup. There is no missing data in my setup. - I could write up my intuition for the hierarchical model. It’s an almost trivial result if you don’t assume smoothness, since for any x1,...,xn the parameters g(x1)...g(xn) are conditionally independent given p and distributed as F(p), where F is the maximum entropy Beta with mean p (I don’t know the form of the parameters alpha(p) and beta(p) off-hand). Smoothness makes the proof much more difficult, but based on high-dimensional intuition one can be sure that it won’t change the result substantially. - It is quite possible that estimating E[Y] and E[Y|event] are “equivalently hard”, but they are both interesting problems with quite different real-world applications. The reason I chose to write about estimating E[Y|event] is because I think it is easier to explain than importance sampling. 
- I didn’t reply to your other comment because although you are making valid points, you have veered off-topic since your initial comment. The question of “which observations to make?” is not a question of inference but rather one of experimental design. If you think this question is relevant to the discussion, it means that you neither understand the original post nor my reply to your initial comment. The questions I am asking have to do with what to infer after the observations have already been made. 
- By “importance sampling distribution” do you mean the distribution that tells you whether Y is missing or not? - Right. You could say the cases of Y1|D=1 you observe in the population are an importance sample from Y1, the hypothetical population that would result if everyone in the population were treated. E[Y1], the quantity to be estimated, is the mean of this hypothetical population. The importance sampling weights are q(x) = Pr[D=1|x]/p(x) where p(x) is the marginal distribution (i.e. you invert these weights to get the average), and the importance sampling distribution is the conditional density of X|D=1. 
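The idea of "inverting the weights to get the average" can be sketched as an inverse-probability-weighted (Horvitz-Thompson style) estimate of E[Y1]. This is only an illustration under loud assumptions: the propensity Pr[D=1|x] is treated as known, and the functions `propensity` and `outcome_prob` are hypothetical choices, not taken from the Robins/Ritov/Wasserman example.

```python
import random

def propensity(x):
    # Hypothetical known treatment probability Pr[D=1 | X=x].
    return 0.1 + 0.8 * x

def outcome_prob(x):
    # Hypothetical P(Y1=1 | X=x); treated units with larger x do better.
    return x

def simulate(n, seed=0):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()                                   # X ~ Uniform(0, 1)
        d = 1 if rng.random() < propensity(x) else 0       # treatment indicator
        y1 = 1 if rng.random() < outcome_prob(x) else 0    # potential outcome Y1
        rows.append((x, d, y1 if d == 1 else None))        # Y1 observed only if D=1
    return rows

def naive_mean(rows):
    # Average of observed Y1 among the treated: estimates E[Y1 | D=1], not E[Y1].
    treated = [y for _, d, y in rows if d == 1]
    return sum(treated) / len(treated)

def ipw_mean(rows):
    # Weight each observed outcome by 1 / Pr[D=1 | x], then average over everyone.
    total = sum(y / propensity(x) for x, d, y in rows if d == 1)
    return total / len(rows)

rows = simulate(200_000)
naive_estimate = naive_mean(rows)
ipw_estimate = ipw_mean(rows)
# Under these assumptions E[Y1] = 0.5, while E[Y1 | D=1] is roughly 0.63.
print(naive_estimate, ipw_estimate)
```

The gap between the two numbers is the selection effect: treated units are over-represented at high x, and the inverse weights undo that over-representation.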
- I will go ahead and answer your first three questions: - Objective Bayesians might have “standard operating procedures” for common problems, but I bet you that I can construct realistic problems where two Objective Bayesians will disagree on how to proceed. At the very least the Objective Bayesians need an “Objective Bayesian manifesto” spelling out what the canonical procedures are. For the “coin-flipping” example, see my response to RichardKennaway where I ask whether you would still be content to treat the problem as coin-flipping if you had strong prior information on g(x). 
- MaxENT is not invariant to parameterization, and I’m betting that there are examples where it works poorly. Far from being a “universal principle” it ends up being yet another heuristic joining the ranks of asymptotic optimality, minimax, minimax relative to oracle, etc. Not to say these are bad principles—each of them is very useful, but when and where to use them is still subjective. 
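A short worked example may help show what non-invariance means here. It is a standard textbook-style illustration, not one taken from this thread, and it assumes the continuous (differential-entropy) form of MaxEnt with no constraints other than the support.

```latex
% MaxEnt on \theta \in [0,1] with no further constraints gives the uniform density:
p_\theta(\theta) = 1, \qquad 0 \le \theta \le 1 .

% Reparameterize with \phi = \theta^2, so \theta = \sqrt{\phi} and
% d\theta / d\phi = 1 / (2\sqrt{\phi}). The induced density on \phi is
p_\phi(\phi) = p_\theta(\sqrt{\phi}) \left| \frac{d\theta}{d\phi} \right|
             = \frac{1}{2\sqrt{\phi}}, \qquad 0 < \phi \le 1 .

% But applying MaxEnt directly to \phi \in [0,1] gives
\tilde{p}_\phi(\phi) = 1 \neq \frac{1}{2\sqrt{\phi}} .
```

The two routes disagree, so the "maximum entropy prior" depends on which parameterization one happens to start from.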
- That would be great if you could implement a Solomonoff prior. It is hard to say whether implementing an approximate algorithmic prior which doesn’t produce garbage is easier or harder than encoding the sum total of human scientific knowledge and heuristics into a Bayesian model, but I’m willing to bet that it is. (This third bet is not a serious bet, the first two are.) 
 
- It is worth noting that the issue of non-consistency is just as troublesome in the finite setting. In fact, in one of Wasserman’s examples he uses a finite (but large) space for X. 
- Yes, I think you are missing something (although it is true that causal inference is a missing data problem). - It may be easier to think in terms of the potential outcomes model. Y0 is the outcome of no treatment, Y1 is the outcome of treatment, and you only ever observe either Y0 or Y1, depending on whether D=0 or 1. Generally you are trying to estimate E[Y1] or E[Y0] or their difference. - The point is that the quantity Robins and Wasserman are trying to estimate, E[Y], does not depend on the importance sampling distribution. Whereas the quantity I am trying to estimate, E[Y|f(X)], does depend on f. Changing f changes the population quantity to be estimated. - It is true that sometimes people in causal inference are interested in estimating things like E[Y1 - Y0|D], e.g. “the treatment effect on the treated.” However this is still different from my setup because D is a random variable, as opposed to an arbitrary function of the known variables like f(X). 
- My example is very similar to the Robins/Wasserman example, but you end up drawing different conclusions. Robins/Wasserman show that you can’t make sense of importance sampling in a Bayesian framework. My example shows that you can’t make sense of “conditional sampling” in a Bayesian framework. The goal of importance sampling is to estimate E[Y], while the goal of conditional sampling is to estimate E[Y|event] for some event. - We did talk about this before, that’s how I first learnt of the R/W example. 
Thanks for the link MakoYass.
I am familiar with the concept of superrationality, which seems similar to what you are describing. The lack of special relationship between observer moments (let’s call it non-continuity) is also a common concept in many mystical traditions. I view both of these concepts as different from the concept of unity, “we are all one”.
Superrationality combines a form of unity with a requirement for rationality. I could think that “we are all one” without thinking that we should behave rationally. If I thought “we are all one” and also that “one ought to be rational”, the behavior that results might be described as superrational.
Non-continuity is orthogonal to unity. I could think “we are distinct” and still think “I only exist in the moment”. This might have been the view of Heraclitus. But I could also think “we are one” and also think “we only exist in the moment.” This might be a natural view to have if you think of the universe as an amplitude distribution over a large number of quantum states that is evolving according to some transition function. If you identify with a particular quantum state, then there is no sense in which you have a unique “past” or “future” path, because all “moments” (states) are concurrent: the only thing that is changing is the amplitude flow.