I think using ellipses only really gets you good mileage once you have the planets moving around the sun. If, like Aristotle, you have the planets moving around the earth, then epicycles are just a very general way of representing periodic motion phenomenologically.
Thanks for this! Definitely some themes that are in the zeitgeist right now for whatever reason.
One thing I’ll have to think about more is the idea of natural limits (e.g. the human stomach’s capacity for tasty food) as a critical part of “human values,” that keeps them from exhibiting abstractly bad properties like monomania. At first glance one might think of this as an argument for taking abstract properties (meta-values) seriously, or taking actual human behavior (which automatically includes physical constraints) seriously, but it might also be regarded as an example of where human values are indeterminate when we go outside the everyday regime. If someone wants to get surgery to make their stomach 1000x bigger (or whatever), and this changes the abstract properties of their behavior, maybe we shouldn’t forbid this a priori.
In other words, “real preferences” are a functional part of a larger model of humans that supports counterfactual reasoning, and if you want to infer the preferences, you should also make sure that your larger model is a good model of humans. (Where “good” doesn’t just mean highly predictive; it includes some other criteria that involve making talking about preferences a good idea, and maybe not deviating too far from our intuitive model.)
Comment status: long.
Before talking about your (quite fun) post, I first want to point out a failure mode exemplified by Scott’s “The View From Ground Level.” Here’s how he gets into trouble (or begins trolling): first he is confused about consciousness. Then he postulates a unified thing—“consciousness” proper—that he’s confused about. Finally he makes an argument that manipulates this thing as if it were a substance or essence. These sorts of arguments never work. Just because there’s a cloud doesn’t mean that there’s a thing inside the cloud precisely shaped like the area obscured by the cloud.
Okay, on to my reaction to this post.
When trying to ground weird questions about point-of-view and information, one useful question is “what would a Solomonoff inductor think?” The really short version of why we can take advice from a Solomonoff inductor is that there is no such thing as a uniform prior over everything—if you try to put a uniform prior over everything, you’re trying to assign each hypothesis a probability of 1/infinity, which is zero, which is not a good probability to give everything. (You can play tricks that effectively involve canceling out this infinite entropy with some source of infinite information, but let’s stick to the finite-information world.) To have a probability distribution over infinitely many hypotheses, you need to play favorites. And this sounds a lot like Solomonoff’s “hypotheses that are simple to encode for some universal Turing machine should be higher on the list.”
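To make “playing favorites” a little more concrete (this is the standard complexity-weighted prior, with the choice of universal machine swept under the rug): weight each hypothesis by the length of its shortest program,

$$P(h) \propto 2^{-\ell(h)}, \qquad \sum_h 2^{-\ell(h)} \le 1,$$

where $\ell(h)$ is the length in bits of the shortest prefix-free program for $h$. Kraft’s inequality guarantees the sum converges, so these weights really can be normalized into a prior over infinitely many hypotheses, which is exactly what a uniform prior couldn’t give you.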
So what would a Solomonoff inductor think about itself? Does it think it’s the “naive encoding,” straightforwardly controlling a body in some hypothesized “real world”? Or is it one of the infinitely many “latent encodings,” where the real world isn’t what it seems and the inductor’s perceptions are instead generated by some complicated mapping from the state of the world to the memories of the inductor?
The answer is that the Solomonoff inductor prefers the naive encoding. We’re pretty sure my memories are (relatively) simple to explain if you hypothesize my physical body. But if you hypothesize that my memories are encoded in the spray from a waterfall, the size of the Turing machine required to translate waterfall-spray into my memories gets really big. One of the features of Solomonoff inductors that’s vital to their nice properties is that hypotheses become more unlikely faster than they become more numerous. There are an infinite number of ways that my memories might be encoded in a waterfall, or in the left foot of George Clooney, or even in my own brain. But arranged in order of complexity of the encoding, these infinite possibilities get exponentially unlikely, so that their sum remains small.
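To put a rough number on “exponentially unlikely” (the 100 bits below is an arbitrary illustrative figure, not an estimate of any real encoding): a hypothesis whose shortest description needs $c$ extra bits beyond a naive hypothesis of length $K_0$ is down-weighted relative to it by

$$\frac{2^{-(K_0 + c)}}{2^{-K_0}} = 2^{-c},$$

so even a modest $c = 100$ extra bits of “extraction procedure” costs a factor of roughly $10^{-30}$, and the more baroque waterfall-encodings only fall off further from there.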
So the naive encoding comes out unscathed when it comes to myself. But what about other people? Here I agree the truth has to be unintuitive, but I’d be a bit more eliminativist than you. You say “all those experiences exist”; I’d say “in that sense, none of them exist.”
From the point of view of the Solomonoff inductor, there is just the real world hypothesized to explain our data. Other people are just things in the real world. We presume that they exist because that presumption has explanatory power.
You might say that the Solomonoff inductor is being hypocritical here. It assumes that my body has some special bridging law to some sort of immaterial soul, some Real Self that is doing the Solomonoff-inducting, but it doesn’t extend that assumption to other people. To be cosmopolitan, you’d say, we should speculate about the bridging laws that might connect experiences to our world like hairs on a supremely shaggy dog.
I’d say that maybe this is the point where the Solomonoff inductor and I part ways, because I don’t think I actually have an immaterial soul; it’s just a useful perspective to take sometimes. I’d like to think I’m actually doing some kind of naturalized induction that we don’t quite know how to formalize yet, one that allows for the fact that the thing doing the inducting might actually be part of the real world, not floating outside it, attached only by an umbilical cord.
I don’t just care about people because I think they have bridging laws that connect them to their Real Experiences; any hypotheses about Real Experiences in my description of the world are merely convenient fictions that could be disposed of if only I were Laplace’s demon.
I think that in the ultimate generalization of how we care about things, the one that works even when all the weirdnesses of the world are allowed, things that are fictional will not be made fundamental. Which is to say, the reason I don’t care about all the encodings of me that could be squeezed into every mundane object I encounter isn’t because they all cancel out by some phenomenal symmetry argument, it’s because I don’t care about those encodings at all. They are, in some deep sense, so weird I don’t care about them, and I think that such a gradient that fades off into indifference is a fundamental part of any realistic account of what physical systems we care about.
One further issue is that if the AI deduces this within one human-model (as in CIRL), it may follow this model off a metaphorical cliff when trying to maximize modeled reward.
Merely expanding the family of models isn’t enough, because the best-predicting model is something like a microscopic, non-intentional model of the human: a “nearest unblocked model” problem. The solution should be similar—get the AI to score models so that the sort of model we want it to use is scored highly (or perhaps something more complicated where human morality is undefined). This isn’t just a prior—we want predictive quality to be only one of several (as yet ill-defined) criteria.
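A minimal sketch of the shape I have in mind, where every function name and criterion is a hypothetical placeholder rather than a worked-out proposal:

```python
def score_human_model(model, data, extra_criteria, weights):
    """Score a candidate model of the human.

    Predictive quality is deliberately just one term; the other criteria
    (all placeholders here) are meant to favor models whose structure we
    actually want the AI to reason with, e.g. ones where talk of
    "preferences" points at something usable.
    """
    score = weights["predictive"] * model.log_likelihood(data)
    for name, criterion in extra_criteria.items():
        # e.g. "intentional_structure", "closeness_to_folk_psychology", ...
        score += weights[name] * criterion(model)
    return score


def choose_model(candidates, data, extra_criteria, weights):
    # Pick the best-scoring model, not merely the best-predicting one.
    return max(candidates,
               key=lambda m: score_human_model(m, data, extra_criteria, weights))
```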
Train postrationality by commenting on Tumblr. By figuring out how Donald Trump’s latest move was genius. By living a virtuous life. By defecting in a Prisoner’s Dilemma against yourself. By starting your own political campaign. By reading Kierkegaard. By regretting defecting against yourself in the Prisoner’s Dilemma and finding a higher power to punish you for it. By humming “The Ballad of Big Yud” to yourself in the shower. By becoming a Scientologist for 89 days and getting your money back with the 90-day money-back guarantee.
If GPT2 was from the mod team, 5/10, with mod tools we could have upped the absurdity game a lot. If it was an independent effort, 8/10, you got me :)
Sure. It describes how humans aren’t robust to distributional shift.
I hope so! IRL and CIRL are really nice frameworks for learning from general behavior, and as far as I can tell, learning from verbal behavior requires a simultaneous model of verbal and general behavior, with some extra parts that I don’t understand yet.
I mostly agree, though you can only really tell me we have the right answer once we can program it into a computer :) Human introspection is good at producing verbal behavior, but is less good at giving you a utility function on states of the universe. Part of the problem is that it’s not like we have “a part of ourselves that does introspection” like it’s some kind of orb inside our skulls—breaking human cognition into parts like that is yet another abstraction that has some free parameters to it.
Does it seem clear to you that if you model a human as a somewhat complicated thermostat (perhaps making decisions according to some kind of flowchart) then you aren’t going to predict that a human would write a post about humans being somewhat complicated thermostats?
Is my flowchart model complicated enough to emulate an RNN? Then I’m not sure.
Or one might imagine a model that has psychological parts, but distributes the function fulfilled by “wants” in an agent model among several different pieces, which might conflict or reinforce each other depending on context. This model could reproduce human verbal behavior about “wanting” with no actual component in the model that formalizes wanting.
If this kind of model works well, it’s a counterexample (less compute-intensive than a microphysical model) to the idea I think you’re gesturing towards, which is that the data really privileges models in which there’s an agent-like formalization of wanting.
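A toy version of such a model, entirely made up just to show the shape: several context-dependent “pulls” jointly determine behavior, and no component anywhere formalizes wanting.

```python
# Several psychological "parts" each push on behavior depending on context.
# Behavior falls out of how their pulls add up; nothing in the model is a
# utility function or an explicit "want".
PARTS = [
    lambda ctx: {"snack": 0.3, "work": 0.2},               # habit
    lambda ctx: {"work": 0.6} if ctx == "office" else {},  # social pressure
    lambda ctx: {"snack": 0.4, "nap": 0.3},                # comfort
]

def act(ctx):
    pulls = {}
    for part in PARTS:
        for action, strength in part(ctx).items():
            pulls[action] = pulls.get(action, 0.0) + strength
    # Whichever pulls reinforce each other most in this context win out.
    return max(pulls, key=pulls.get)

print(act("office"), act("home"))  # same parts, different behavior: work snack
```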
Person A isn’t getting it quite right :P Humans want things, in the usual sense that “humans want things” indicates a useful class of models I use to predict humans. But they don’t Really Want things, the sort of essential Wanting that requires a unique, privileged function from a physical state of the human to the things Wanted.
So here’s the dialogue, with A’s views more of a stand-in for my own:
A: Humans aren’t agents, by which I mean that humans don’t Really Want things. It would be bad to make an AI that assumes they do.
B: What do you mean by “bad”?
A: I mean that there wouldn’t be such a privileged Want for the AI to find in humans—humans want things, but can be modeled as wanting different things depending on the environment and level of detail of the model.
B: No, I mean how could you cash out “bad” if not in terms of what you Really Want?
A: Just in terms of what I regular, contingently want—how I’m modeling myself right now.
B: But isn’t that a privileged model that the AI could figure out and then use to locate your wants? And since these wants are so naturally privileged, wouldn’t that make them what you Really Want?
A: The AI could do something like that, but I don’t like to think of that as finding out what I Really Want. The result isn’t going to be truly unique because I use multiple models of myself, and they’re all vague and fallible. And maybe more importantly, programming an AI to understand me “on my own terms” faces a lot of difficult challenges that don’t make sense if you think the goal is just to translate what I Really Want into the AI’s internal ontology.
B: Like what?
A: You remember the Bay Area train analogy at the end of The Tails Coming Apart as Metaphor for Life? When the train lines diverge, thinking of the problem as “figure out what train we Really Wanted” doesn’t help, and might divert people from the possible solutions, which are going to be contingent and sometimes messy.
B: But eventually you actually do follow one of the train lines, or program it into the AI, which uniquely specifies that as what you Really Want! Problem solved.
A: “Whatever I do is what I wanted to do” doesn’t help you make choices, though.
Could you elaborate on what you mean by “if your model of humans is generative enough to generate itself, then it will assign agency to at least some humans?” I think the obvious extreme is a detailed microscopic model that reproduces human behavior without using the intentional stance—is this a model that doesn’t generate itself, or is this a model that assigns agency to some humans?
It seems to me that you’re relying on the verb “generate” here to involve some sort of human intentionality, maybe? But the argument of this post is that our intentionality is inexact and doesn’t suffice.
Suppose you are building an AI and want something from it. Then you are an agent with respect to that thing, since you want it.
There’s wanting, and then there’s Wanting. The AI’s model of me isn’t going to regenerate my Real Wanting, which requires the Essence of True Desire. It’s only going to regenerate the fact that I can be modeled as wanting the thing. But I can be modeled as wanting lots of things, is the entire point.
This has prompted me to get off my butt and start publishing the more useful bits of what I’ve been thinking about. Long story short, I disagree with you while still almost entirely agreeing with you.
This isn’t really the full explanation of why I think the AI can’t just be given a human model and told to fill it in, though. For starters, there’s also the issue about whether the human model should “live” in the AI’s native ontology, or whether it should live in its own separate, “fictional” ontology.
I’ve become more convinced of the latter—that if you tell the AI to figure out “human values” in a model that’s interacting with whatever its best-predicting ontology is, it will come up with values that include things as strange as “Charlie wants to emit CO2” (though not necessarily in the same direction). Instead, its model of my values might need to be described in a special ontology in which human-level concepts are simple but the AI’s overall predictions are worse, in order for a predictive human model to actually contain what I’d consider to be my values.
Sure. And my comment is more aimed at the audience than at Richard—I don’t know him, and I agree that reducing stress can help, and can help more the more you’re stressed. Maybe some parts of his story seem like they could also fit with a story of injury and healing (did you know that wrists feeling strange, swollen or painful at night or after other long periods of stillness can be because of reduced flow of lymph fluid through inflamed wrists?), but they could also fit with his story of stress. I think this is one of those posts that has novelty precisely because the common view is actually right most of the time, and my past self probably needed to take the common view into account more.
You say ” there would be an epidemic of wrist pain at typing-heavy workplaces” as if there isn’t a ton of wrist pain at typing-heavy workplaces. And, like, funny how stress is making your wrists hurt rather than your toes or elbows, right?
I think, as one grows old, one gets a better sense that the human body just breaks down sometimes, and doesn’t repair itself perfectly. Those horribly injured soldiers you bring up probably had aches and pains sometimes for the rest of their lives that they never really talked about, because other people wouldn’t understand. My mom has pain in her left foot sometimes from where she broke it 40 years ago. And eventually, our bodies will just accumulate injuries more and more until we die.
If you have pain that you think is due to wrist inflammation, check out the literature and take action to the degree you can. The mind can control pain quite well, and the human body is tough, but if you do manage to injure yourself you’ll regret it.
Definitely depends on the field. For experimental papers in the field I’m already in, it only takes like half an hour, and then following up on the references for things I need to know the context for takes an additional 0.5-2 hours. For theory papers 1-4 hours is more typical.
Sure. “If it’s smart, it won’t make simple mistakes.” But I’m also interested in the question of whether, given the first few in this sequence of approximate agents, one could do a good job at predicting the next one.
It seems like you could—like there is a simple rule governing these systems (“check whether there’s a human in the greenhouse”) that might involve difficult interaction with the world in practice but is much more straightforward when considered from the omniscient third-person view of imagination. And given that this rule is (arguendo) simple within a fairly natural (though not by any means unique) model of the world, and that it helps predict the sequence, one might be able to guess that this rule was likely just from looking at the sequence of systems.
(This also relies on the distinction between just trying to find likely or good-enough answers, and the AI doing search to find weird corner cases. The inferred next step in the sequence might be expected to give similar likely answers, with no similar guarantee for corner-case answers.)
Is this contra https://www.lesswrong.com/posts/aNaP8eCiKW7wZxpFt/philosophy-as-low-energy-approximation ?
To repeat my example from there: to understand superconductivity, it doesn’t help much to smash superconductors into their components, even though that approach helps a lot for understanding atoms. A non-philosophical example from your list where people went to the “extremist” view for a little too long might be mental health before the rise of positive psychology.
Minor nitpick: diamond is only metastable, especially at high temperatures. It will slowly turn to graphite. After sufficient space travel, all diamond parts will be graphite parts.