To believe that you’re a one-in-a-million case (e.g. in the first or last millionth of all humans), you need about 20 bits of information (because 2^20 ≈ 1,000,000).
So on the one hand, 20 bits can be hard to come by if the topic is one it’s hard to get reliable information about. But we regularly get more than 20 bits of information about all sorts of questions (reading this comment has probably given you more than 20 bits of information). So how hard this should “feel” depends heavily on how well we can translate our observational data into information about the future of humanity.
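As a quick sanity check on the arithmetic, here’s a minimal Python sketch (the Bayesian-odds framing at the end is my gloss, not something stated in the comment):

```python
import math

# Bits needed to single out a one-in-a-million case.
n = 1_000_000
bits = math.log2(n)          # ~19.93, i.e. "about 20 bits"

# Equivalently: a likelihood ratio of roughly 2**20 is what it takes
# to promote a 1-in-a-million prior to even odds.
prior_odds = 1 / (n - 1)
posterior_odds = prior_odds * 2**bits   # ~1.0, i.e. roughly 50/50

print(f"{bits:.2f} bits; posterior odds ≈ {posterior_odds:.3f}")
```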
Extra note: in the case that there are an infinite number of humans, this uniform prior actually breaks down (or else, naively, you’d think you have a 0.0% chance of being anyone at all), so there can be a finite contribution from the possibility that there are infinitely many people.
People are bad at interpreting the Doomsday Argument because they’re bad at treating evidence as Bayesian evidence rather than as a direct statement of the correct belief.
The Doomsday Argument is evidence that we should update on. But it is not a direct statement of the correct belief.
On a parallel earth, humanity is on the decline. Some disaster has struck, and the once-billions of proud humanity have been reduced to a few scattered thousands. Now the last exiles of civilization hide in sealed habitats that they no longer have the supply chains to repair, and they know that soon enough the end will come for them too. But on the other hand, the philosophers among them remark, at least there’s the Doomsday Argument, which says that on average we should expect to be in the middle of humanity. So if the DA is right, the current crisis is merely a bottleneck in the middle of humanity’s time, and everything will probably work itself out any day now. The last philosopher dies after breathing in contaminated air, with the last words “No! The position I occupy is… very unlikely!”
Your eyes and ears also provide you evidence about the expected span of humanity.
Fun podcast. The analogy to human planning horizons was a very thought-provoking one. Though obviously, there are forces that explain the way things are; competition between different interests is a strong selection pressure for short-termism.
Is SIDLE not also a perfectly fine word? I don’t know how this went through peer review.
Anyhow, good newsletter this week, thanks :)
I almost agree, but still ended up disagreeing with a lot of your bullet points. Since reading your list was useful, I figured it would be worthwhile to just make a parallel list. ✓ for agreement, × for disagreement (• for neutral).
✓ I think we’re confused about what we really mean when we talk about human values.
× But our real problem is on the meta-level: we want to understand value learning so that we can build an AI that learns human values even without starting with a precise model waiting to be filled in.
_ × We can trust AI to discover that structure for us even though we couldn’t verify the result, because the point isn’t getting the right answer, it’s having a trustworthy process.
_ × We can’t just write down the correct structure any more than we can just write down the correct content. We’re trying to translate a vague human concept into precise instructions for an AI.
✓ Agree with extensional definition of values, and relevance to decision-making.
• Research on the content of human values may be useful information about what humans consider to be human values. I think research on the structure of human values is in much the same boat—information, not the final say.
✓ Agree about Stuart’s work being where you’d go to write down a precise set of preferences based on human preferences, and that the problems you mention are problems.
✓ Agree with assumptions.
• I think the basic model leaves out the fact that we’re changing levels of description.
_ × Merely causing events (in the physical level of description) is not sufficient to say we’re acting (in the agent level of description). We need some notion of “could have done something else,” which is an abstraction about agents, not something fundamentally physical.
_ × Similar quibbles apply to the other parts—there is no physically special decision process, we can only find one by changing our level of description of the world to one where we posit such a structure.
_ × The point: Everything in the basic model is a statistical regularity we can observe over the behavior of a physical system. You need a somewhat more nuanced way to place preferences and meta-preferences.
_ • The simple patch is to just say that there’s some level of description where the decision-generation process lives, and preferences live at a higher level of abstraction than that. Therefore preferences are emergent phenomena from the level of description the decision-generation process is on.
_ _ × But I think if one applies this patch, then it’s a big mistake to use loaded words like “values” to describe the inputs (all inputs?) to the decision-generation process, which are, after all, at a level of description below the level where we can talk about preferences. I think this conflicts with the extensional definitions from earlier.
× If we recognize that we’re talking about different levels of description, then preferences are neither causally after nor causally before decisions-on-the-basic-model-level-of-abstraction. They’re regular patterns that we can use to model decisions at a slightly higher level of abstraction.
_ • How to describe self-aware agents at a low level of abstraction then? Well, time to put on our GEB hats. The low level of abstraction just has to include a computation of the model we would use on the higher level of abstraction.
✓ Despite all these disagreements, I think you’ve made a pretty good case that the human brain plausibly computes a single currency (valence) that it uses to rate both most decisions and most predictions.
_ × But I still don’t agree that this makes valence human values. I mean values in the sense of “the cluster we sometimes also point at with words like value, preference, affinity, taste, aesthetic, intention, and axiology.” So I don’t think we’re left with a neuroscience problem, I still think what we want the AI to learn is on that higher level of abstraction where preferences live.
It’s just a measure of how close the data is to the line—like the “inside view” uncertainty that the model has about the data. In fact, that’s more precisely what it is if this is the chi squared statistic (or square root thereof) that you minimized to fit the model. And it’s in nice convenient units that you can compare to other things.
It’s not quite right, because it uses an implicit prior about noise and models that doesn’t match your actual state of information. But it’s something that someone who’s currently reporting R^2 to us can do in 30 seconds in Excel.
This was not what I expected to learn today :) Alas, poor gonads, I hardly knew ye.
Well, I was skimming through Word and Object when I “became enlightened,” but it may have mostly been a catalyst. Still recommended though?
I don’t think I was very clear about what problem I was solving, and I don’t think you managed to read my mind, so let me try again.
The problem I was interested in was: how does reference work? How can I point at or verbally indicate some thingie, and actually be indicating the thingie in question? And could I program that into an AI?
In your post, you connect this to indexicals, which I’ve interpreted as a question like “how does reference work? How can I point at or verbally indicate some thingie, and actually be indicating that thingie, in a way that you could explain to a microscopic physics simulation?”
One of the key parts of the solution is that words don’t have inherent “aboutness” attached to them. Reference doesn’t make any sense if you just focus on the speaker and try to define the aboutness in their statements. It needs to be interpreted as communication, which uses some notion of a functional audience you’re constructing a message for.
So that question of “How do I verbally indicate the thing and really indicate it?” has to be left unanswered to the extent that we have false beliefs about our ability to “really indicate” things. Instead, I advocate breaking it down into questions about how you model other people and choose communicative acts.
So I am absolutely not saying we should replace “is x true?” with “is x a communicatively useful act?”. The closest thing I’m saying would be that we can cash out “what is the referent of sentence x?” into “what is the modeled audience getting pointed at by the act of saying sentence x?”.
I’m not sure how you’re interpreting physicalism here. But if we single out the notion that there should be some kind of “physics shorthand” for human concepts and references—like H2O is for water, or like the toy model of reference as passing numerical coordinates—then yeah, there is no physics shorthand. Where there is something like it, it is humans that have done the work to accommodate physics, not vice versa.
Yeah, I spent a lot of last year struggling with the reference thing. In the end I decided that reference was not fundamental even within the human-centered picture, and that reference was just a special case of communication (in the sense of Quine, Grice, et al.: I do a communicative act because I model you as modeling why I do it.)
Figuring this out made me a bit upset with academic philosophy, because I’d been looking through the recent literature fruitlessly before I found Quine basically solving the problem 50 years before. This is the opposite of the problem I usually pin on philosophy, that it’s too backward-looking. In this case, it’s more like the people talking about reference within the last 20 years are all self-selected for not caring about Quine much at all.
Whether or not you find this useful may depend on a certain mental maneuver of taking something you were asking a question about, and breaking it into pieces rather than answering the question. In this case, “How are the semantics of a sentence determined?” is a question, but rather than answering it I’m advocating getting rid of this high-level-of-abstraction word “semantics” by working in a more concrete level of description where there are humans with models of each other. And of course I’ve framed this in a very palatable way, but I think whether this maneuver feels good or not is a big dividing line. If you have the unshakeable feeling that I have missed something vital by not answering the original question, then you fall on the other side of the line—though perhaps one can still be lured over with practical applications.
It sure seems like if he really grokked the philosophical and technical challenge of getting a GAI agent to be net beneficial, he would write a different paper. That first challenge sort of overshadows the task of dividing up the post-singularity pie.
But I’m not sure whether the overshadowing is merely by being bigger (in which case this paper is still doing useful work), or if we should expect that solutions to the pie-dividing problems (e.g. weighing egalitarianism vs. utilitarianism) will necessarily fall out of the process that lets the AI learn how to behave well.
I’ll probably post a child comment after I actually read the article, but I want to note before I do that I think the power of ResNets is evidence against these claims. Having super-deep networks with residual connections promotes a picture that looks much more like a continuous “massaging” of the data than a human-friendly decision tree.
Picking a descriptive statistic for these sorts of problems is pretty tricky. But I think we can do better than R^2, even without going all Bayesian-parameter-estimation.
What I mostly care about is just the standard deviation (in Excel, STDEV.S()) of the difference between the data and the model. Then I want to know how this compares to other scales in the data (like the average number of new cases per day).
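Here’s a minimal Python sketch of that calculation; the arrays `cases` and `model_prediction` are made-up placeholder data, not anything from the thread:

```python
import numpy as np

# Hypothetical data: daily new cases and the fitted model's predictions.
cases = np.array([120, 135, 150, 160, 180, 210, 240], dtype=float)
model_prediction = np.array([118, 132, 148, 165, 184, 206, 230], dtype=float)

residuals = cases - model_prediction

# Sample standard deviation of the residuals (Excel's STDEV.S uses ddof=1).
residual_sd = np.std(residuals, ddof=1)

# Compare against a natural scale in the data, e.g. the mean daily case count.
mean_cases = cases.mean()
print(f"residual sd ≈ {residual_sd:.1f} cases/day, "
      f"vs. mean of {mean_cases:.0f} cases/day "
      f"({100 * residual_sd / mean_cases:.1f}%)")
```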
Right. Some intuition is necessary. But a lot of these choices are ad hoc, by which I mean they aren’t strongly constrained by the result you want from them.
For example, you have a linear penalty governed by this parameter lambda, but in principle it could have been any old function—the only strong constraint is that you want it to monotonically increase from a finite number to infinity. Now, maybe this is fine, or maybe not. But I basically don’t have much trust for meditation in this sort of case, and would rather see explicit constraints that rule out more of the available space.
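To illustrate how weakly that constraint pins down the choice, here is a hedged sketch with a few hypothetical alternatives of my own (none of them are from the post), all monotonically increasing from a finite value toward infinity under the same lambda:

```python
import math

lam = 0.5  # the penalty-scale parameter; the value here is arbitrary

# All of these are monotonically increasing on x >= 0, start at a finite
# value, and grow without bound, so all satisfy the stated constraint
# while behaving very differently in between.
def linear_penalty(x):
    return lam * x

def quadratic_penalty(x):
    return lam * x ** 2

def exponential_penalty(x):
    return lam * (math.exp(x) - 1.0)

def log_penalty(x):
    return lam * math.log1p(x)   # still unbounded, but grows very slowly

for f in (linear_penalty, quadratic_penalty, exponential_penalty, log_penalty):
    print(f.__name__, [round(f(x), 3) for x in (0.0, 1.0, 10.0)])
```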
My very general concern is that strategies that maximize RAUP might be very… let’s say creative, and your claims are mostly relying on intuitive arguments for why those strategies won’t be bad for humans.
I don’t really buy the claim that if you’ve been able to patch each specific problem, we’ll soon reach a version with no problems—the exact same inductive argument you mention suggests that there will just be a series of problems, and patches, and then more problems with the patched version. Again, I worry that patches are based a lot on intuition.
For example, in the latest version, because you’re essentially dividing out by the long-term reward of taking the best action now, if the best action now is really really good, then it becomes cheap to take moderately good actions that still increase future reward—which means the agent is incentivized to concentrate the power of actions into specific timesteps. For example, an agent might be able to set things up so that it can sacrifice its ability to achieve a total future reward of 10^10 to make it cheap to take an action that increases its future reward by 10^8. This might look like sacrificing the ability to colonize distant galaxies in order to gain total control over the Milky Way.
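As a toy back-of-the-envelope version of that worry (the numbers and the normalize-by-best-action scaling are just my reading of the comment above, not the actual penalty formula from the post):

```python
# Illustrative numbers only; the scaling below is my reading of
# "dividing out by the long-term reward of taking the best action now",
# not the actual penalty formula.

best_action_value = 1e10   # long-term reward of the best action available now
side_effect_gain = 1e8     # extra future reward gained by a "moderate" action

# If the penalty is normalized by the best action's value, a 10^8 gain in
# future reward looks tiny once a 10^10 option is on the table:
scaled_penalty = side_effect_gain / best_action_value
print(f"scaled penalty ≈ {scaled_penalty:.0%}")   # ≈ 1%
```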
For interesting stuff, two weeks to two months. Usually this is warranted, because ideas are cheap but filtering and thinking are hard. The ideal faster time mostly just means that ideally I’d be spending more hours per week on ideas, not that I’d be spending less time per idea.
After a bit more thought, I’ve learned that it’s hard to avoid ending back up with EU maximization—it basically happens as soon as you require that strategies be good not just on the true environment, but on some distribution of environments that reflect what we think we’re designing an agent for (or the agent’s initial state of knowledge about states of the world). And since this is such an effective tool at penalizing the “just pick the absolute best answer” strategy, it’s hard for me to avoid circling back to it.
Here’s one possible option, though: look for strategies that are too simple to encode the one best answer in the first place. If the absolute best policy has K-complexity of 10^3 (achievable in the real world by strategies being complicated, or in the multi-armed bandit case by just having 2^1000 possible actions) and your agent is only allowed to start with 10^2 symbols, this might make things interesting.
I like it! But you know, Northwest Passage is already written as a retrospective.
Three centuries thereafter, I take passage overland
In the footsteps of brave Kelso, where his “sea of flowers” began
Watching cities rise before me, then behind me sink again
This tardiest explorer, driving hard across the plain.

And through the night, behind the wheel, the mileage clicking west
I think upon Mackenzie, David Thompson and the rest
Who cracked the mountain ramparts and did show a path for me
To race the roaring Fraser to the sea.
Because the singer is modern, the chorus “Ah, for just one time / I would take the Northwest Passage” is about wishing to identify a lonely life with the grandeur of the past. A verse about the loss of the historical arctic would tie right back into this without needing to change the chorus a jot.