The reason I don’t personally find these kinds of representation super useful is because each of those boxes is a quite complicated function, and what’s in the boxes usually involves many more bits worth of information about an AI system than how the boxes are connected. And sometimes one makes different choices in how to chop an AI’s operation up into causally linked boxes, which can lead to an apples-and-oranges problem when comparing diagrams (for example, the diagrams you use for CIRL and IDI are very different choppings-up of the algorithms).
I actually have a draft sitting around of how one might represent value learning schemes with a hierarchical diagram of information flow. Eventually I decided that the idea made lots of sense for a few paradigm cases and was more trouble than it was worth for everything else. When you need to carefully refer to the text description to understand a diagram, that’s a sign that maybe you should use the text description.
This isn’t to say I think one should never see anything like this. Different ways of presenting the same information (like diagrams) can help drive home a particularly important point. But I am skeptical that there’s a one-size-fits-all solution, and instead think that diagram usage should be tailored to the particular point it’s intended to make.
I think I must be the odd one out here in terms of comfort using probabilities close to 1 and 0. Because 90% and 99% are not “near certainty” to me.
How sure are you that the English guy who you’ve been told helped invent calculus and did stuff with gravity and optics was called “Isaac Newton”? We’re talking about probabilities like 99.99999% here. (Conditioning on no dumb gotchas from human communication, e.g. me using a unicode character from a different language and claiming it’s no longer the same, which has suddenly become much more salient to you and me both. An “internal” probability, if you will.)
Maybe it would help to think of this as about 20 bits of information past 50%? Every bit of information you can specify about something means you are assigning a more extreme probability distribution about that thing. The probability of the answer being “Isaac Newton” has a very tiny prior for any given question, and only rises to 50% after lots of bits of information. And if you could get to 50%, it’s not strange that you could have quite a few more bits left over, before eventually running into the limits set by the reliability of your own brain.
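To make the bits-to-probability conversion concrete, here's a quick sketch (my own illustration; the function names are mine, not standard terminology):

```python
import math

def bits_past_even(p):
    """Log-odds in bits: how many bits of evidence past 50% a probability represents."""
    return math.log2(p / (1 - p))

def prob_from_bits(bits):
    """Inverse: the probability corresponding to `bits` of evidence past 50%."""
    odds = 2.0 ** bits
    return odds / (1 + odds)

print(round(prob_from_bits(20), 8))          # 20 bits past even is ~0.99999905
print(round(bits_past_even(0.9999999), 1))   # 99.99999% is ~23.3 bits
```

So "99.99999%" is roughly 23 bits of evidence past even odds, which is a lot, but not absurd for a fact you've encountered thousands of independent times.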
So when you say some plans require near certainty, I’m not sure if you mean what I mean but chose smaller probabilities, or if you mean some somewhat different point about social norms about when numbers are big/small enough that we are allowed to stop/start worrying about them. Or maybe you mean a third thing about legibility and communicability that is correlated with probability but not identical?
“At the most basic level of description, things—the quantum fields or branes or whatever—just do what they do. They don’t do it nondeterministically, but they also don’t do it deterministically”
How do you know? If that claim isn’t based on a model, what is it based on?
I’m happy to reply that the message of my comment as a whole applies to this part—my claim about “what things do at a basic level of description” is a meta-model claim about what you can say about things at different levels of description.
It’s human nature to interpret this as a claim that things have some essence of “just doing what they do” that is a competitor to the essences of determinism and nondeterminism, but there is no such essence for the same reasons I’m already talking about in the comment. Maybe I could have worded it more carefully to prevent this reading, but I figure that would sacrifice more clarity than it gained.
The point is not about some “basic nature of things,” the point is about some “basic level of description.” We might imagine someone saying “I know there are some deterministic models of atoms and some nondeterministic models, but are the atoms really deterministic or not?” Where this “really” seems to mean some atheoretic direct understanding of the nature of atoms. My point, in short, is that atheoretic understanding is fruitless (“It’s just one damn thing after another”) and the instinct that says it’s desirable is misleading.
“much better explanation of wetness would be in terms of surface tension and intermolecular forces and so on”
Why? Because they are real properties?
Because they’re part of a detailed model of the world that helps tell a “functional and causal story” about the phenomenon. If I was going to badmouth one set of essences just to prop up another, I would have said so :P
“You can explain away some properties in terms of others, but there is going to be some residue”
My point is that this residue is never going to be the “Real Properties,” they’re just going to be the same theory-laden properties as always.
What makes a theory of everything a theory of everything is not that it provides a final answer for which properties are the real properties that atoms have in some atheoretic direct way. It’s that it provides a useful framework in which we can understand all (literally all) sorts of stuff.
Now that I think about it, it’s a pretty big PR problem if I have to start every explanation of my value learning scheme with “humans don’t have actual preferences so the AI is just going to try to learn something adequate.” Maybe I should figure out a system of jargon such that I can say, in jargon, that the AI is learning peoples’ actual preferences, and it will correspond to what laypeople actually want from value learning.
I’m not sure whether such jargon would make actual technical thinking harder, though.
This is really similar to some stuff I’ve been thinking about, so I’ll be writing up a longer comment with more compare/contrast later.
But one thing really stood out to me—I think one can go farther in grappling with and taking advantage of “where UH lives.” UH doesn’t live inside the human, it lives in the AI’s model of the human. Humans aren’t idealized agents, they’re clusters of atoms, which means they don’t have preferences except after the sort of coarse-graining procedure you describe, and this coarse-graining procedure lives with a particular model of the human—it’s not inherent in the atoms.
This means that once you’ve specified a value learning procedure and human model, there is no residual “actual preferences” the AI can check itself against. The challenge was never to access our “actual preferences,” it was always to make a best effort to model humans as they want to be modeled. This is deeply counterintuitive (“What do you mean, the AI isn’t going to learn what humans’ actual preferences are?!”), but also liberating and motivating.
Interesting question! My answer is basically a long warning about essentialism: this question might seem like it’s stepping down from the realm of human models to the realm of actual things, to ask about the essence of those things. But I think better answers are going to come from stepping up from the realm of models to the realm of meta-models, to ask about the properties of models.
At the most basic level of description, things—the quantum fields or branes or whatever—just do what they do. They don’t do it nondeterministically, but they also don’t do it deterministically! Without recourse to human models, all us humans can say is that the things just do what they do—models are the things that make talk about categories possible in the first place.
Any answer of the sort “no, things can always be rendered into a deterministic form by treating ‘random’ results as fixed constants” or “yes, there are perfectly valid classes of models that include nondeterminism” is going to be an answer about models, within some meta-level framework. And that’s fine!
This can seem unsatisfying because it goes against our essentialist instinct—that the properties in our models should reflect the real properties that things have. If water is considered wet, it’s because water has the basic property or essence of wetness (so the instinct goes).
Note that this doesn’t explain any of the mechanics or physics of wetness. If you could look inside someone’s head as they were performing this essentialist maneuver, they would start with a model (“water is wet”), then they would notice their model (“I model water as wet”), then they would justify themselves to themselves, in a sort of reassuring pat on the back (“I model water as wet because water is really, specially wet”).
I think that this line of self-reassuring reasoning is flawed, and a much better explanation of wetness would be in terms of surface tension and intermolecular forces and so on—illuminating the functional and causal story behind our model of the world, rather than believing you’ve explained wetness in terms of “real wetness”. Also see the story about Bleggs and Rubes.
Long story short, any good explanation for why we should or shouldn’t have nondeterminism in a model is either going to be about how to choose good models, or it’s going to be a causal and functional story that doesn’t preserve nondeterminism (or determinism) as an essence.
I think there’s an interesting question about physics in whether or not (and how) we should include nondeterminism as an option in fundamental theories. But first I just wanted to warn that the question “models aside, are things really nondeterministic” is not going to have an interesting answer.
Basically, we have a mental model of logic the same way we have a mental model of geography. It’s useful to say that logical facts have referents for the same internal reason it’s useful to say that geographical facts have referents. But if you looked at a human from outside, the causal story behind logical facts vs. geographical facts would be different.
As a practicing quantum mechanic, I’d warn you against the claim that dialetheism is used in quantum computers. Qubits are sometimes described as taking “both states at the same time,” but that’s not precisely what’s going on, and people who actually work on quantum computers use a more precise understanding that doesn’t involve interpreting intermediate qubits as truth values.
There are also two people I wanted to see in your post: Russell and Gödel—mathematicians rather than philosophers. Russell’s type theory was one of the main attempts to eliminate paradoxes in mathematical logic. Gödel showed how that doesn’t quite work, but also showed how things become a lot clearer if you consider provability as well as truth value.
On Wednesdays at the Princeton Graduate College, various people would come in to give talks. The speakers were often interesting, and in the discussions after the talks we used to have a lot of fun. For instance, one guy in our school was very strongly anti-Catholic, so he passed out questions in advance for people to ask a religious speaker, and we gave the speaker a hard time.
Another time somebody gave a talk about poetry. He talked about the structure of the poem and the emotions that come with it; he divided everything up into certain kinds of classes. In the discussion that came afterwards, he said, “Isn’t that the same as in mathematics, Dr. Eisenhart?”
Dr. Eisenhart was the dean of the graduate school and a great professor of mathematics. He was also very clever. He said, “I’d like to know what Dick Feynman thinks about it in reference to theoretical physics.” He was always putting me on in this kind of situation.
I got up and said, “Yes, it’s very closely related. In theoretical physics, the analog of the word is the mathematical formula, the analog of the structure of the poem is the interrelationship of the theoretical bling-bling with the so-and-so”—and I went through the whole thing, making a perfect analogy. The speaker’s eyes were beaming with happiness.
Then I said, “It seems to me that no matter what you say about poetry, I could find a way of making up an analog with any subject, just as I did for theoretical physics. I don’t consider such analogs meaningful.”
In the great big dining hall with stained-glass windows, where we always ate, in our steadily deteriorating academic gowns, Dean Eisenhart would begin each dinner by saying grace in Latin. After dinner he would often get up and make some announcements. One night Dr. Eisenhart got up and said, “Two weeks from now, a professor of psychology is coming to give a talk about hypnosis. Now, this professor thought it would be much better if we had a real demonstration of hypnosis instead of just talking about it. Therefore he would like some people to volunteer to be hypnotized.”
I get all excited: There’s no question but that I’ve got to find out about hypnosis. This is going to be terrific!
Dean Eisenhart went on to say that it would be good if three or four people would volunteer so that the hypnotist could try them out first to see which ones would be able to be hypnotized, so he’d like to urge very much that we apply for this. (He’s wasting all this time, for God’s sake!)
Eisenhart was down at one end of the hall, and I was way down at the other end, in the back. There were hundreds of guys there. I knew that everybody was going to want to do this, and I was terrified that he wouldn’t see me because I was so far back. I just had to get in on this demonstration!
Finally Eisenhart said, “And so I would like to ask if there are going to be any volunteers …”
I raised my hand and shot out of my seat, screaming as loud as I could, to make sure that he would hear me: “MEEEEEEEEEEE!”
He heard me all right, because there wasn’t another soul. My voice reverberated throughout the hall—it was very embarrassing. Eisenhart’s immediate reaction was, “Yes, of course, I knew you would volunteer, Mr. Feynman, but I was wondering if there would be anybody else.”
Finally a few other guys volunteered, and a week before the demonstration the man came to practice on us, to see if any of us would be good for hypnosis. I knew about the phenomenon, but I didn’t know what it was like to be hypnotized.
He started to work on me and soon I got into a position where he said, “You can’t open your eyes.”
I said to myself, “I bet I could open my eyes, but I don’t want to disturb the situation: Let’s see how much further it goes.” It was an interesting situation: You’re only slightly fogged out, and although you’ve lost a little bit, you’re pretty sure you could open your eyes. But of course, you’re not opening your eyes, so in a sense you can’t do it.
He went through a lot of stuff and decided that I was pretty good.
When the real demonstration came he had us walk on stage, and he hypnotized us in front of the whole Princeton Graduate College. This time the effect was stronger; I guess I had learned how to become hypnotized. The hypnotist made various demonstrations, having me do things that I couldn’t normally do, and at the end he said that after I came out of hypnosis, instead of returning to my seat directly, which was the natural way to go, I would walk all the way around the room and go to my seat from the back.
All through the demonstration I was vaguely aware of what was going on, and cooperating with the things the hypnotist said, but this time I decided, “Damn it, enough is enough! I’m gonna go straight to my seat.”
When it was time to get up and go off the stage, I started to walk straight to my seat. But then an annoying feeling came over me: I felt so uncomfortable that I couldn’t continue. I walked all the way around the hall.
I was hypnotized in another situation some time later by a woman. While I was hypnotized she said, “I’m going to light a match, blow it out, and immediately touch the back of your hand with it. You will feel no pain.”
I thought, “Baloney!” She took a match, lit it, blew it out, and touched it to the back of my hand. It felt slightly warm. My eyes were closed throughout all of this, but I was thinking, “That’s easy. She lit one match, but touched a different match to my hand. There’s nothin’ to that; it’s a fake!”
When I came out of the hypnosis and looked at the back of my hand, I got the biggest surprise: There was a burn on the back of my hand. Soon a blister grew, and it never hurt at all, even when it broke.
So I found hypnosis to be a very interesting experience. All the time you’re saying to yourself, “I could do that, but I won’t”—which is just another way of saying that you can’t.
Surely You’re Joking, Mr. Feynman!
You can’t talk sensibly about what values are right, or what we ‘should’ build into intelligent agents.
I agree that in our usual use of the word, it doesn’t make sense to talk about what (terminal) values are right.
But you agree that (within a certain level of abstraction and implied context) you can talk as if you should take certain actions? Like “you should try this dessert” is a sensible English sentence. So what about actions that impact intelligent agents?
Like, suppose there was a pill you could take that would make you want to kill your family. Should you take it? No, probably not. But now we’ve just expressed a preference about the values of an intelligent agent (yourself).
Modifying yourself to want bad things is wrong in the same sense that the bad things are wrong in the first place: they are wrong with respect to your current values, which are a thing we model you as having within a certain level of abstraction.
We have to separate colleges from public K-12 education. Colleges are the place where you hear about increasing numbers of non-teaching staff. K-12 actually has fewer administrators per student than 20 years ago (in most places).
Minimum message length fitting uses an approximation of K-complexity and gets used sometimes when people want to fit weird curves in a sort of principled way. But “real” Solomonoff induction is about feeding literally all of your sensory data into the algorithm to get predictions for the future, not just fitting curves.
So I guess I’d say that it’s possible to approximate K-complexity and use that in your prior for curve fitting, and people sometimes do that. But that’s not necessarily going to be your best estimate, because your best estimate is going to take into account all of the data you’ve already seen, which becomes impossible very quickly (even if you just want a controlled approximation).
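As an illustration of the curve-fitting use (entirely my own toy sketch, not a real Solomonoff approximation): a minimum-message-length-style fit picks the polynomial degree whose total description length, model bits plus residual bits, is smallest. The bit-counting here is deliberately crude (a flat 16 bits per parameter, residuals scored up to an additive constant):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = 3 * x**2 - x + rng.normal(0, 0.05, size=x.size)  # true model: degree 2

def mml_score(degree, bits_per_param=16.0):
    """Two-part code length: bits to state the model + bits to state the residuals."""
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    sigma2 = max(np.mean(resid**2), 1e-12)
    data_bits = 0.5 * x.size * np.log2(sigma2)   # ~ Gaussian neg. log-likelihood in bits
    model_bits = (degree + 1) * bits_per_param   # crude cost of stating the parameters
    return data_bits + model_bits

best = min(range(1, 10), key=mml_score)
print(best)  # the complexity penalty stops higher degrees from overfitting
```

The "principled" part is just that extra parameters must pay for themselves in improved compression of the data, which is the same tradeoff K-complexity formalizes.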
Sure. The OP might more accurately have asked “How is the Solomonoff prior calculated?”
Yeah I think this is definitely a “stance” thing.
Take the use of natural selection and humans as examples of optimization and mesa-optimization—the entire concept of “natural selection” is a human-convenient way of describing a pattern in the universe. It’s approximately an optimizer, but in order to get rid of that “approximately” you have to reintroduce epicycles until your model is as complicated as a model of the world again. Humans aren’t optimizers either, that’s just a human-convenient way of describing humans.
More abstractly, the entire process of recognizing a mesa-optimizer—something that models the world and makes plans—is an act of stance-taking. Or Quinean radical translation or whatever. If a cat-recognizing neural net learns an attention mechanism that models the world of cats and makes plans, it’s not going to come with little labels on the neurons saying “these are my input-output interfaces, this is my model of the world, this is my planning algorithm.” It’s going to be some inscrutable little bit of linear algebra with suspiciously competent behavior.
Not only could this competent behavior be explained either by optimization or some variety of “rote behavior,” but the neurons don’t care about these boundaries and can occupy a continuum of possibilities between any two central examples. And worst of all, the same neurons might have multiple different useful ways of thinking about them, some of which are in terms of elements like “goals” and “search,” and others are in terms of the elements of rote behavior.
In light of this, the problem of mesa-optimizers is not “when will this bright line be crossed?” but “when will this simple model of the AI’s behavior be predictably useful?” Even though I think the first instinct is the opposite.
Nice post. I suspect you’ll still have to keep emphasizing that fuzziness can’t play the role of uncertainty in a human-modeling scheme (like CIRL), and is instead a way of resolving human behavior into a utility function framework. Assuming I read you correctly.
I think that there are some unspoken commitments that the framework of fuzziness makes for how to handle extrapolating irrational human behavior. If you represent fuzziness as a weighting over utility functions that gets aggregated linearly (i.e. into another utility function), this is useful for the AI making decisions but can’t be the same thing that you’re using to model human behavior, because humans are going to take actions that shouldn’t be modeled as utility maximization.
To bridge this gap from human behavior to utility function, what I’m interpreting you as implying is that you should represent human behavior in terms of a patchwork of utility functions. In the post you talk about frequencies in a simulation, where small perturbations might lead a human to care about the total or about the average. Rather than the AI creating a context-dependent model of the human, we’ve somehow taught it (this part might be non-obvious) that these small perturbations don’t matter, and should be “fuzzed over” to get a utility function that’s a weighted combination of the ones exhibited by the human.
But we could also imagine unrolling this as a frequency over time, where an irrational human sometimes takes the action that’s best for the total and other times takes the action that’s best for the average. Should a fuzzy-values AI represent this as the human acting according to different utility functions at different times, and then fuzzing over those utility functions to decide what is best?
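To make the distinction concrete, here's a toy sketch (entirely my own construction; the outcomes, utilities, and weights are made up): the AI's decision rule aggregates the candidate utility functions linearly, while the model of the human switches between them over time:

```python
import random

random.seed(0)

# Two outcomes scored by two candidate utilities: (total welfare, average welfare)
outcomes = {"A": (10.0, 2.0), "B": (6.0, 3.0)}
u_total = lambda o: outcomes[o][0]
u_avg = lambda o: outcomes[o][1]
weights = {"total": 0.5, "avg": 0.5}  # the AI's "fuzz" over utility functions

def fuzzy_choice():
    """AI decision rule: maximize the linear (weighted) combination of utilities."""
    score = lambda o: weights["total"] * u_total(o) + weights["avg"] * u_avg(o)
    return max(outcomes, key=score)

def human_action():
    """Model of the irrational human: acts on a different utility each time."""
    u = u_total if random.random() < weights["total"] else u_avg
    return max(outcomes, key=u)
```

The point of the sketch is that `fuzzy_choice` always picks the same thing, while `human_action` alternates, so the same weighted mixture can't simultaneously be the AI's decision rule and a faithful behavioral model of the human.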
An alternate framing could be about changing group boundaries rather than changing demographics in an isolated group.
There were surely people in 2010 who thought that the main risk from AI was it being used by bad people. The difference might not be that these people have popped into existence or only recently started talking—it’s that they’re inside the fence more than before.
And of course, reality is always complicated. One of the concerns in the “early LW” genre is value stability and self-trust under self-modification, which has nothing to do with sudden growth. And one of the “recent” genre concerns is arms races, which are predicated on people expecting sudden capability growth to give them a first mover advantage.
I would guess that people don’t actually compute the Nash equilibrium or expect other people to.
Instead, they use the same heuristic reasoning methods that they evolved to learn, and which have served them well in social situations their entire life, and expect other people to do the same.
I think we should expect these heuristics to be close to rational (not for the utilities of humans, but for the fitness of genes) in the ancestral environment. But there’s no particular reason to think they’re going to be rational by any standard in games chosen specifically because the Nash equilibrium is counterintuitive to humans.
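For concreteness, a minimal sketch (my own example) of the computation people presumably aren't running in their heads: finding the Nash equilibrium of a one-shot Prisoner's Dilemma by checking best responses:

```python
# Payoffs as (row player, column player); C = cooperate, D = defect.
payoff = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}
actions = ["C", "D"]

def is_nash(a_row, a_col):
    """A profile is a Nash equilibrium if no player gains by unilaterally deviating."""
    row_ok = all(payoff[(a_row, a_col)][0] >= payoff[(d, a_col)][0] for d in actions)
    col_ok = all(payoff[(a_row, a_col)][1] >= payoff[(a_row, d)][1] for d in actions)
    return row_ok and col_ok

equilibria = [(r, c) for r in actions for c in actions if is_nash(r, c)]
print(equilibria)  # mutual defection, even though mutual cooperation pays both more
```

Mutual defection is the unique equilibrium here despite being worse for both players than mutual cooperation, which is exactly the kind of game where evolved social heuristics and the Nash prediction come apart.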
If I may toot my own horn: https://www.lesswrong.com/posts/yex7E6oHXYL93Evq6/book-review-consciousness-explained
I’ll admit I’m not totally sure what Said Achmiz means by his comparison, though :)
Sure, but he also says “Since my expectations sometimes conflict with my subsequent experiences, I need different names for the thingies that determine my experimental predictions and the thingy that determines my experimental results. I call the former thingies ‘beliefs’, and the latter thingy ‘reality’.”
This key allows you to substitute in to his previous paragraph, to obtain statements in terms of predictions and experimental results that would be Sabine-approved.
If we think of the philosophical camps as realism, instrumentalism / pragmatism, and skepticism, the state of play seems to be less
“I am a realist, you are a skeptic, let’s argue,”
and more
“I’m the true pragmatist!” “No, I’m the true pragmatist!”
I have now skimmed the previous thread, where you also quoted what I just quoted, but said Eliezer was just assuming that there was some thingie out there being reality. The alternative being, presumably, that our observations are not determined by anything that acts like an object with properties, and are instead brute facts.
But the first sentence (“since my expectations sometimes conflict...”) is precisely about how he’s not assuming an external reality, but instead advancing it as a hypothesis in order to explain observations. Maybe he’s not doing it the way you’d like—and maybe I as a biased reader will interpret that statement as a metaphor for something I expect, whereas you’d do the same but get a different result.
This also works on yourself. If your best model of yourself is as an agent making choices based on its beliefs, then you will seem to have free will to yourself.