Donald Hobson gives a comment below explaining some reasoning around dealing with unknown unknowns, but it’s not a direct answer to the question, so I’ll offer one.
The short answer is “yes”.
The longer answer is that this is one of the fundamental considerations in approaching AI alignment and is why some organizations, like MIRI, have taken an approach that doesn’t drive straight at the object-level problem and instead tackles issues likely to be foundational to any approach to alignment that could work. In fact you might say the big schism between MIRI and, say, OpenAI, is that MIRI places greater emphasis on addressing the unknown whereas OpenAI expects alignment to look more like an engineering problem with relatively small and not especially dangerous unknown unknowns.
(note: I am not affiliated with either organization, so this is an informed opinion on their general approaches; also note that neither organization is monolithic and individual researchers vary greatly in their assessment of these risks.)
My own efforts addressing AI alignment are largely about addressing these sorts of questions, because I think we still poorly understand what alignment even really means. In this sense I know that there is a lot we don’t know, but I don’t know all of what we don’t know that we’ll need to (so known unknown unknowns).
I’ve not heard of anyone trying this. I imagine it’s a bad idea for a couple reasons:
most would-be patients live in countries where what you want is illegal
those would-be patients’ access to cryonics depends largely on cryonics organizations being in good standing with the local government in case they die unexpectedly or don’t want to travel far
even if you found a jurisdiction where you could do what you want, it might have repercussions back where the organization is based and stores the brains/bodies, because the home country/state/municipality might forbid import due to how the brain/body was obtained
the countries where would-be patients live might forbid them from contracting for such a service in a location where it is legal (compare the way some countries require their citizens follow national laws when abroad, and that such citizens can be prosecuted for actions they took in foreign nations)
I think if you wanted to do this you would need to find a jurisdiction that would be okay with it and also be otherwise suitable for basing a cryonics operation. My guess is the set of places that meet both criteria is empty, and this is ignoring the patient-access issues I mentioned. Since the current market for cryonics is quite small, my guess is that there just isn’t enough demand to make this happen; with enough demand, I’m sure you’d have the money to make a favorable jurisdiction suitable for basing a cryonics operation in.
Aside from the case where you may have access to euthanasia, the answer is no. The issue is that cryonics, where it is legally allowed, is considered a mortuary procedure rather than a medical procedure. The reasons for doing this are a bit involved, but can be summed up by saying it was easier to get legal approval for a novel procedure on dead bodies than on live ones.
In theory it seems likely you could get a better preservation by anesthetizing a live patient, replacing their blood, and slowly cooling the body, letting them die slowly while freezing, rather than having them die first and then starting the cooling process. But this is extremely legally complicated, because it both involves a live patient, so it’s a medical procedure, and it kills the patient, so it’s euthanasia (or so we hope; if it wasn’t painless you definitely wouldn’t be allowed to do it!). This would require a level of acceptance of cryonics we have no reason to believe is forthcoming.
So we are left with the case where you have to die first before being cryo-preserved. However, it’s even a bit more complicated than that, because how you die matters. Mortuary procedures can’t begin until a patient has a completed death certificate from a doctor in most places, and in some cases you can’t formally complete that process without an autopsy to determine cause of death, especially in cases that look suspicious like a murder or suicide. In fact, without modern assisted suicide laws, suicide generally requires an autopsy by law, which will of course ruin your chance of preservation.
The only known, reliable way of doing what you propose (and I know of cases in the past where it successfully happened) is for a patient with a terminal illness to enter a hospice near a cryonics facility, with a cryonics team on standby, and then refuse all food and water. It takes several days to die this way depending on body composition, and at time of death the doctor on staff can quickly certify that you died of natural causes (I don’t entirely understand why this doesn’t count as suicide, but it apparently doesn’t) and the procedure can begin within minutes. That, to the best of my knowledge, is the state of the art in cryonic preservation: cryocide by starvation/dehydration.
Right, both of these views on truth, traditional rationality and postmodernism, result in theories of truth that don’t quite line up with what we see in the world, but in different ways. The traditional rationality view fails to account for the fact that humans judge truth and we have no access to the view from nowhere, so it’s right to say traditional rationality is “wrong” in the sense that it incorrectly assumes it can gain privileged access to the truth of claims and so know which ones are facts and which ones are falsehoods. The postmodernist view makes the opposite, and arguably only slightly lesser, mistake: it correctly notices that humans judge truth but then fails to adequately account for the ways those judgements are entangled with a shared reality. The way through is to see both that there is something shared out there about which there can, in theory, be a fact of the matter, and that we can’t directly ascertain those facts because we must do so across the gap of (subjective) experience.
As always, I say it comes back to the problem of the criterion and our failure to adequately accept that it demands we make a leap of faith, small though we may manage to make it.
The standard rebuttal here is that even if a superintelligent AI system is not goal directed, we should be concerned that the AI will spontaneously develop goal directed behavior because it is instrumentally valuable to doing whatever it is doing (and is not “doing whatever it is doing” a “goal”, even if the AI does not conceive of it as a goal, the same way as the calculator has a “goal” or purpose, even if the calculator is unaware of it). This is of course contingent on it being “superintelligent”.
For what it’s worth this is also the origin, as I recall it, of concerns about paperclip maximizers: you won’t build an AI that sets out to tile the universe with paperclips, but through a series of unfortunate misunderstandings it will, as a subagent or an instrumental action, end up optimizing for paperclips anyway because it seemed like a good idea at the time.
Finally, the human expresses a judgement about the states of M, mentally categorising a set of states as better than another. This is an anti-symmetric partial function J:S×S→R, a partial function that is non trivial on at least one pair of inputs.
I continue to be unsure whether we can even claim anti-symmetry of the preference relation. For example, let SA be the state “I eat an apple” and SO the state “I eat an orange”, and suppose today I judge SA better than SO (J(SA,SO) > 0) but tomorrow I judge SO better than SA (J(SO,SA) > 0), seemingly violating anti-symmetry. Now of course maybe I misunderstood my own understanding of SA and SO such that they actually included a hidden-to-my-awareness property conditioning them on time or something else, so that anti-symmetry is not violated after all. But the fact that there may be some property on the states that I didn’t think about at first, and that happens to salvage anti-symmetry, makes me worry that this model is confused in this and other ways: it was so easy to think of and construct something that seemingly violated the property but on further reflection seems like it doesn’t.
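To make the worry concrete, here’s a toy sketch (my own construction, not part of the post’s formalism; the state names and the day parameter are made up) of how a judgement that flips between days looks like an anti-symmetry violation until time is folded into the states:

```python
# Toy sketch (my construction, not the post's formalism): a judgement
# over named states that flips between days.

def judge(state_a: str, state_b: str, day: str) -> float:
    """Return a positive score when state_a is judged better than state_b."""
    # Hypothetical preferences: apples are judged better today, oranges tomorrow.
    better = {"today": "eat apple", "tomorrow": "eat orange"}
    if state_a == better[day] and state_b != state_a:
        return 1.0
    if state_b == better[day] and state_a != state_b:
        return -1.0
    return 0.0

# Ignoring the day, J("eat apple", "eat orange") and J("eat orange", "eat apple")
# can both come out positive, which looks like an anti-symmetry violation...
print(judge("eat apple", "eat orange", day="today"))      # 1.0
print(judge("eat orange", "eat apple", day="tomorrow"))   # 1.0

# ...but if the day is treated as a hidden component of each state, then
# ("eat apple", "today") and ("eat apple", "tomorrow") are different states
# and anti-symmetry is salvaged.
```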
That’s not a slam-dunk argument against this formalization; this is more me sharing some thoughts on my reservations about using this type of model. If we can so easily fail to notice something relevant about how we formalize some simple preferences, what else may we be failing to notice? And if so, what happens if we build an AI based in part on this formalization? Will it also fail to account for relevant aspects of how human preferences are calculated because they are not easily visible to us in the model, or is that a failure of humans to understand themselves rather than of the model? These are the things I’m wrestling with lately.
I also have some reservations about whether we can even really model humans as having discrete preferences that we can reason about in this way without getting ourselves into trouble and confused. Not to say that I doubt that this model often works, only that I worry it’s missing some important details that are relevant for alignment, and that without accounting for them we will fail to produce aligned AI. I worry about this because there doesn’t seem to be anything in the human mind that actually is a preference; preferences are more like reifications of a pattern of action that appears in humans. Getting closer to understanding the mechanism that produces the pattern we interpret as preferences seems valuable to me in this work, because I worry we’re missing crucial details when we reason about preferences at the level of detail you pursue here.
I apologise for my simplistic understanding and definitions of moral realism. However, my partial experience in this field has been enough to convince me that there are many incompatible definitions of moral realism, and many arguments about them, so it’s not clear there is a single simple thing to understand. So I’ve tried to define it very roughly, enough so that the gist of this post makes sense.
I think this is mostly because there are lots of realist and anti-realist positions and they cluster around features other than their stance on realism, i.e. whether or not moral facts exist, or, said less densely, whether or not moral claims can be true or false. The two camps seem to have a lot more going on, though, than is captured by this rather technical point, as you point out. In fact, most of the interesting debate is not about this point but about things that can be functionally the same regardless of your stance on realism, hence your noticing how realists and anti-realists can look like each other in some cases.
(My own stance is to be skeptical, since I’m not even sure we have a great idea of what we really mean when we say things are true or false. It seems like we do at first, but if we poke too hard the whole thing starts to come apart at the seams, which makes it a bit hard to worry too much about moral facts when you’re not even sure about facts in the first place!)
Recently Robin Hanson posted about the difference between fighting along the frontier vs. expanding the frontier. It’s a well-known point, but since I was recently reminded of it, it’s salient to me, and it seems quite relevant here.
When we ask if human values have “improved” or “degenerated” over time we have to have some way of judging increase or decrease. One way to understand this is to check whether humans get to realize more value, as judged by each individual and then normalized and aggregated, along certain dimensions within the multidimensional space of values. To take your example of “engagement with extended family”, most moderns have less of this than ancients did, both on average and, it seems, at the maximum, i.e. modern systems preclude as much engagement as was possible in the past, such that a modern person maximally engaged with their extended family is less engaged than was maximally possible in the past. This seems to be traded off, though, against greater freedom from the need to engage with extended family, because alternative systems allow a person to fulfill other values without reliance on extended family. As a result this looks much like a “fight”, i.e. a trade-off along the value frontier of one value against another.
You give the example of reduced slavery being a general benefit, but I think we can tell a similar story that it is a trade-off. We trade away the right of the powerful to make decisions about labour use, living conditions, etc. for the less powerful, in exchange for individual choice over those things. In this sense the reduction in slavery takes away something of value from someone—the would-be slaveholders—to give it to someone else—the would-be slaves. We may judge this to be an expansion or a value-efficiency improvement under two conditions (which change slightly what we mean by expansion; a toy sketch distinguishing them follows the list):
(1) there is more value overall, i.e. we traded less value away than we got back in return
(2) there is more value overall along all dimensions
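To make the distinction concrete, here’s a toy sketch (my own construction, with made-up numbers and value dimensions, not anything from the post): condition (1) is an aggregate test, while condition (2) is a per-dimension, Pareto-style test.

```python
# Toy sketch with made-up numbers and value dimensions: condition (1) is an
# aggregate test, condition (2) is a per-dimension (Pareto-style) test.

before = {"individual_choice": 0.2, "security": 0.7, "power_over_others": 0.9}
after  = {"individual_choice": 0.9, "security": 0.8, "power_over_others": 0.3}

def more_value_overall(before, after):
    """Condition (1): the aggregate (here a simple sum) went up."""
    return sum(after.values()) > sum(before.values())

def more_value_on_all_dimensions(before, after):
    """Condition (2): every dimension went up (or at least held steady)."""
    return all(after[k] >= before[k] for k in before)

print(more_value_overall(before, after))            # True: 2.0 > 1.8
print(more_value_on_all_dimensions(before, after))  # False: power_over_others fell
```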
I would argue that case (1) is really still a fight, though, because we are still making a trade-off; we are just moving to somewhere more efficient along the frontier. From this perspective the end of slavery was not an expansion of values, but it was a trade-off for more value.
But if we are so strict, is anything truly a pure expansion? This seems quite tricky, because humans can value arbitrary things, and so for every action that increases some value it would seem that we are necessarily decreasing the ability to realize some counter-value. For example, it might seem that something like “greater availability of calories” would result in pure value expansion, assuming we can screen off all the complicated details of how we make more calories available to humans and how that process will affect values. But suppose you value scarcity of calories, maybe even directly; then for you this will be a fight, and we must interpret an increase in the availability of calories as a trade-off rather than as a pure expansion in values.
This is potentially troubling because it means there’s no universal way to judge moral progress if there can be no expansion without some contraction somewhere. It would seem that there must always be contraction of something, even if it is an efficient contraction that generates more value than it gives up.
So in the end I guess I am forced to (mostly) agree with your assessment even though you frame it in a way that seems foreign to me. It feels foreign because it seems every improvement is also a degeneration and vice versa, and the relevant question of improvement is mostly whether or not we are generating more value in aggregate (an efficiency improvement), if we want to be neutral about which value dimensions to optimize along.
I actually don’t love the idea of making aggregate value something we optimize for, though, because I worry about degenerate cases, like highly optimizing along a single value dimension at the expense of all others such that it results in an overall increase in value but in a way we wouldn’t want. Arguably, if we were measuring value correctly in this system, such a situation would be impossible, because it would be factored in by a decrease in whatever value was being traded off against, the value whose loss makes us dislike the “optimization”.
I instead continue to think that value is a confused concept that we need to break apart and reunderstand, but I’m still working on deconfusing myself on this, so I have nothing additional to report in that direction for now.
Some years ago I got interested in the Yi Jing after reading Philip K. Dick’s The Man in the High Castle, which features the Yi Jing prominently: the book within the book (which is the alternate dimension/history version of The Man in the High Castle) is written using the Yi Jing to make plot decisions, and one of the characters relies on it heavily to navigate life. I went on to write a WebOS Yi Jing phone app so I could more easily consult it from my phone and played around with it myself.
My experience of it was mostly that it offered me nothing I wasn’t already doing on my own, but I could see how it would be helpful to others who lack my particular natural disposition toward letting my mind go quiet and seeing what it has to tell me. As you note, it seems a good way to step back and consider something from a different angle, and to consider aspects of something you may currently be ignoring. The commentary on the Yi Jing is carefully worded such that it’s more about the decision-generation process than the decision itself, and when used well I think it can result in the same sort of sudden realization of the action you will take that my sitting quietly and waiting for insight does.
I also know a decent number of rationalists who enjoy playing with Tarot cards for seemingly this same reason. Tarot works a bit differently because it tells a story more than it highlights a virtue, but, like you, I think much of the value comes from placing a random framing on events, injecting noise into an otherwise too-stable algorithm, and helping people get out of local maxima/minima traps.
I’d also include rubber ducking as a modern divination method. I think it does something similar, but by using a different method to get you to see things more clearly and find out what you already implicitly knew but weren’t making explicit enough to let it have an impact on your actions. My speculation at a possible mechanism of action here is something like what happens when I sit quietly with a decision and wait for an answer: you let the established patterns of thought get out of the way and let other things come through so you can consider them, in part because you can generate your own internal noise if you stop trying to direct your thought. But not everyone finds this easy or possible, in which case more traditional divination methods with external noise injection are likely useful.
Actually, good thing you asked, because I gave wrong information in my original comment. Chisholm is an expert on the problem of the criterion, but I was actually thinking of William Alston in my comment. Here are two papers, one by Alston and one by another author, that I’ve referenced in the past and found useful:
William P. Alston. Epistemic Circularity. Philosophy and Phenomenological Research, 47(1):1, September 1986.
Jonathan Dancy. Ethical Particularism and Morally Relevant Properties. Mind, XCII(368):530–547, 1983.
You might like the work of Roderick Chisholm on this topic. He spent a good deal of effort on addressing the issue of epistemic circularity (the issue created by the problem of the criterion) and gives what is, in my opinion, one of the better and more technical treatments of the topic. His work also lets us make a distinction between particularism (making minimal leaps of faith) and pragmatism (making any leaps of faith), which I find useful because in practice most people seem to be pragmatists (they have other things to do than wrestle with epistemology) while thinking they are particularists because their particular leaps of faith (the facts they assume without justification) are intuitive to them and they can’t think of a way to make them smaller.
First, let me start by saying this comment is ultimately a nitpick. I agree with the thrust of your position and think in most cases your point stands. However, there’s no fun and nothing to say if I leave it at that, so grab your tweezers and let’s get that nit.
Even if Hypothesis H is true, it doesn’t have any decision-relevant implications,
So to me there seems to be a special case of this that is not rationalization, and that’s in cases where one fact dominates another.
By “dominates” I here mean that, for the purpose for which the fact is being considered (i.e. the decision about which the truth value of H may have relevant implications), there may be another fact about another hypothesis, H’, such that once H’ is settled as true or false, whether or not H is true or false will have no impact on the outcome, because H’ is relatively so much more important than H.
To make this concrete, consider the case of the single-issue voter. They will vote for a candidate primarily based on whether or not that candidate supports their favored position on the single issue they care about. So let’s say Candidate Brain Slug is running for President of the World on a platform whose main plank is implanting brain slugs on all people. You argue with your single-issue-voter friend that they should not vote for Brain Slug because it will put a brain slug on them, but they say that even if that’s true, it’s not relevant to their decision, because Brain Slug also supports a ban on trolley switches, which is your friend’s single issue.
Now maybe you think your friend is being stupid, but in this case they’re arguably not rationalizing. Instead they’re making a decision based on their values that place such a premium on the issue of trolley switch bans that they reasonably don’t care about anything else, even if it means voting for President Brain Slug and its brain slug implanting agenda.
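If it helps, here is a minimal sketch (my own toy construction; the function name and arguments are hypothetical) of the dominance structure I mean, where settling H’ fixes the decision regardless of H:

```python
# Toy sketch (hypothetical names): once the dominant hypothesis H' is settled,
# the truth value of H cannot change the single-issue voter's decision.

def votes_for_brain_slug(h_will_implant_brain_slugs: bool,
                         h_prime_bans_trolley_switches: bool) -> bool:
    """The single-issue voter's rule: only H' (the trolley-switch ban) matters."""
    return h_prime_bans_trolley_switches

# Whatever we learn about H, the decision depends only on H', so for this
# voter H really does have no decision-relevant implications.
for h in (True, False):
    print(votes_for_brain_slug(h, h_prime_bans_trolley_switches=True))  # True, True
```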
To my reading, all of this seems to pretty well match (part of) the Buddhist notion of dependent origination, specifically the way senses beget sense contact (experience), which begets feeling, which begets craving (preferences), which begets clinging (beliefs/values), which begets being (formal ontology). There the focus is a bit different and is oriented around addressing a different question, but I think it’s tackling some of the same issues via different methods.
1) The bedrock of our values are probably the same for any human being, and any difference between conscious values is either due to having seen different data, but more likely due to different people situationally benefitting more under different moralities. For example a strong person will have “values” that are more accepting of competition, but that will change once they become weaker.
I continue to find minimization of confusion while maintaining homeostasis around biologically determined set points a reasonable explanation for the bedrock of our values. Hopefully these ideas will coalesce well enough in me soon to be able to write something more about this than that headline.
I agree there are other possible interpretations; mainly wanted to document for myself in case I wanted to reference it later, and it seems potentially relevant, especially if we wanted to go back and interview the voters or analyze the comments.
This is a pretty interesting idea. I can imagine this being part of a safety-in-depth approach: not a single method we would rely on but one of many fail-safes along with sandboxing and actually trying to directly address alignment.
Additional evidence of boo/yay voting culture: Ben Hoffman’s “Drowning children are rare” post (on EAF, on LW, my comment about votes)
To give an additional example of tight feedback loops being helpful, I’ve been taking Alexander lessons for nearly a year. Each lesson consists of 30 minutes of me doing movements (although sometimes the “movement” is holding a posture, like sitting, standing, standing on toes, or crouching) and 30 minutes of “table time”, i.e. I lie on a massage table while my teacher uses her hands to very subtly suggest changes to my posture. Although I could go on about how great this has been and how much value I get from it, what I mostly want to say here is that it depends very much on tight feedback loops to perform a kind of reinforcement learning. As I make a movement she uses her hands and some taught jargon (part of the technique involves associating jargon with postures and movements so you can easily call them up on command by saying or thinking the jargon) to adjust what I do, giving me rapid feedback on how I’m doing. The result was that within the first 10 hours of training I dramatically improved my posture and reduced posture- and movement-related pain.
For comparison, overlapping with learning the Alexander technique I’ve been more deeply practicing formal meditation, and learning formal meditation has very long feedback loops and requires months to make significant progress. Now, maybe the long feedback cycles are not why it takes months to make progress, and I can think of reasonable stories as to why that would be, but I can also imagine that finding ways to shorten the feedback cycles would have made progress much faster. For example, when I’ve done biofeedback stuff in the past it only took 4 or 5 hours of sessions before I could make myself fall asleep at will (sadly I’ve forgotten how to do this), and I think it’s quite likely that this was helped a lot by having a computer telling me when I got a little closer to what I needed to do to make that happen and when I got a little farther away, such that I didn’t have to spend as much time guessing and waiting for strong evidence that I was doing the right thing before I could reliably train that ability and then go on to the next step.
While these objects may be unidentified, the idea that they are the products of aliens, a simulation, AI, or something else seems unlikely given the low quality of the evidence. In all cases I’m aware of, evidence for something like this being the true origin of a UFO would have to overcome the more likely alternatives of:
secret, experimental, or stealth aircraft, probably military, with advanced capabilities undisclosed to the public;
observational errors and instrumentation glitches;
misremembering, embellishment, and outright lying.
For comparison, the literature on cryptids (animals claimed to be real but unobserved by science, like Bigfoot, the Loch Ness monster, and the chupacabra) is full of cases where the evidence looks pretty compelling...so long as we only look for evidence that confirms the hope that a cryptid exists. Perhaps sadly, there are no cryptid humanoids or sea monsters that we know of, and all evidence of them collected thus far is either best categorized as hoaxes, misidentifications, and hopeful misinterpretations, or else turned out to be evidence of real, undiscovered, and not fantastical animals.
The natural argument against this is of course that separation is an illusion. I don’t say that to sound mysterious; I mean it just in the simple sense that everything is tangled up together, dependent on each other for its existence, and it’s only in our models that clean separation can exist, and then only by ignoring some parts of reality in order to keep our models clean.
As a working programmer, I’m very familiar with the original context of the idea of separation of concerns, and I can also tell you that even there it never totally works. It’s a tool we use to help us poor humans, who can’t fathom the total, complete, awesome complexity of the world, get along well enough anyway to collect a paycheck. Or something like that.
Relatedly, every abstraction is leaky, and if you think it isn’t you just haven’t looked hard enough.
None of that is to say we shouldn’t respect the separation of concerns when it’s useful, only that we shouldn’t elevate it to more than it deserves, because the separation is a construction of our minds, not a natural feature of the world.