LW1.0 username Manfred. Day job is condensed matter physics, hobby is thinking I know how to assign anthropic probabilities.

# Charlie Steiner

# [Question] What to do with imitation humans, other than asking them what the right thing to do is?

it could be there are aspects of reality that are beyond the capacity of our brains.’ But that cannot be so. For if the ‘capacity’ in question is mere computational speed and amount of memory, then we can understand the aspects in question with the help of computers

I’m disagreeing with the notion, equivalent to taking Turing completeness as understanding-universality, that the human capacity for understanding is the capacity for universal computation.

Turing completeness misses some important qualitative properties of what it means for people to understand something. When I understand something I don’t merely compute it, I form opinions about it, I fit it into a schema for thinking about the world, I have a representation of it in some latent space that allows it to be transformed in appropriate ways, etc.

I could, given a notebook of infinite size, infinite time, and lots of drugs, probably compute the Ackermann function A(5,5). But this has little to do with my ability to understand the result in the sense of being able to tell a story about the result to myself. In fact, there are things I can understand without actually computing them, so long as I can form opinions about them, fit them into a picture of the world, represent them in a way that allows for transformations, etc.
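For concreteness, here is a minimal sketch of the function in question. It assumes A means the common two-argument Ackermann-Péter variant (the comment doesn’t say which one), and it uses an explicit stack instead of recursion so Python’s recursion limit isn’t the bottleneck. It is only practical for tiny arguments; A(5,5) is far out of reach for any real machine, which is rather the point.

```python
def ackermann(m: int, n: int) -> int:
    """Two-argument Ackermann-Peter function (assumed variant).

    A(0, n) = n + 1
    A(m, 0) = A(m - 1, 1)
    A(m, n) = A(m - 1, A(m, n - 1))
    """
    stack = [m]  # pending first arguments; n carries the running value
    while stack:
        m = stack.pop()
        if m == 0:
            n += 1
        elif n == 0:
            stack.append(m - 1)
            n = 1
        else:
            stack.append(m - 1)  # outer call, evaluated after the inner one
            stack.append(m)      # inner call A(m, n - 1)
            n -= 1
    return n


print(ackermann(2, 3))  # 9
print(ackermann(3, 3))  # 61
```

Being able to write this down in a dozen lines is a decent illustration of understanding the definition without being able to compute, or tell much of a story about, the value at (5,5).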

One thing I’d also ask about is: what about ecology / iterated games? I’m not very sure at all whether there are relevant iterated games here, so I’m curious what you think.

How about an ecology where there are both people and communities—the communities have different aggregation rules, and the people can join different communities. There’s some set of options that are chosen by the communities, but it’s the people who actually care about what option gets chosen and choose how to move between communities based on what happens with the options—the communities just choose their aggregation rule to get lots of people to join them.

How can we set up this game so that interesting behavior emerges? Well, people shouldn’t just seek out the community that most closely matches their own preferences, because then everyone would fracture into communities of size 1. Instead, there must be some benefit to being in a community. I have two ideas about this: one is that the people could care to some extent about what happens in all communities, so they will join a community if they think they can shift its preferences on the important things while conceding the unimportant things. Another is that there could be some crude advantage to being in a community that looks like a scaling term (monotonically increasing with community size) on how effective communities are at satisfying their members’ preferences.
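A toy version of the second idea might look like the sketch below. Everything specific in it is an assumption made for illustration: preferences are single points in [0, 1], the aggregation rules are just mean and median, the size benefit is the hypothetical `scale` function, and people move simultaneously each round. It also keeps each community’s rule fixed, so it leaves out the part where communities pick rules to attract members.

```python
import random
import statistics


def scale(size, half_point=3.0):
    """Hypothetical size benefit: increases monotonically with size, saturating at 1."""
    return size / (size + half_point)


def chosen_option(member_ideals, rule):
    """Aggregate members' ideal points into the community's chosen option."""
    if not member_ideals:
        return 0.5  # arbitrary default for an empty community
    if rule == "mean":
        return statistics.mean(member_ideals)
    return statistics.median(member_ideals)


def simulate(n_people=30, rules=("mean", "median", "mean"), rounds=20, seed=0):
    rng = random.Random(seed)
    ideals = [rng.random() for _ in range(n_people)]
    membership = [rng.randrange(len(rules)) for _ in range(n_people)]
    communities = range(len(rules))

    def current_options():
        return [
            chosen_option(
                [ideals[p] for p in range(n_people) if membership[p] == c], rules[c]
            )
            for c in communities
        ]

    for _ in range(rounds):
        options = current_options()
        sizes = [membership.count(c) for c in communities]

        def payoff(p, c):
            # Payoff = size benefit times closeness of the option to p's ideal point.
            size_if_joined = sizes[c] + (0 if membership[p] == c else 1)
            return scale(size_if_joined) * (1.0 - abs(ideals[p] - options[c]))

        # Everyone moves at once to the community with the best payoff.
        membership = [
            max(communities, key=lambda c, p=p: payoff(p, c)) for p in range(n_people)
        ]

    return [membership.count(c) for c in communities], current_options()


sizes, options = simulate()
print(sizes, options)
```

Lowering `half_point` weakens the pull toward large communities; without a size term at all, nothing in this setup stops people from splitting off toward whichever option sits nearest their own ideal point.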

I’m curious about the comparison to drinking isopropyl alcohol (rubbing alcohol) instead, which is gradually metabolized into acetone (the actual psychoactive ingredient) inside the body. If you drink the same amount then gradual seems safer, but I’m not sure if it actually has a bigger difference between active dose and LD50 (or active dose and severe gastrointestinal inflammation).

Right, it’s a little tricky to specify exactly what this “relationship” is. Is the notion that you should be able to compress the approximate model, given an oracle for the code of the best one (i.e. that they share pieces)? Because most Turing machines don’t compress well, and so it’s easy to find counterexamples (the most straightforward class is where the approximate model is already extremely simple).

Anyhow, like I said, hard to capture the spirit of the problem. But when I *do* try to formalize the problem, it tends to not have the property, which is definitely driving my intuition.

If by “account for that” you mean not be in direct conflict with earlier sense data, then sure. All tautologies about the data will continue to be true. Suppose some data can be predicted by classical mechanics with 75% accuracy. This is a tautology given the data itself, and no future theory will somehow make classical mechanics stop giving 75% accurate predictions for that past data.

Maybe that’s all you meant?

I’d sort of interpreted you as asking questions about properties of the *theory*. E.g. “this data is really well explained by the classical mechanics of point particles, therefore any future theory should have a particularly simple relationship to the point particle ontology.” It seems like there shouldn’t be a guaranteed relationship that’s much simpler than reconstructing the data and recomputing the inferred point particles.

I spent a little while trying to phrase this in terms of Turing machines but I don’t think I quite managed to capture the spirit.

The answer to the question you actually asked is no, there is no ironclad guarantee of properties continuing, nor any guarantee that there will be a simple mapping between theories. With some effort you can construct some perverse Turing machines with bad behavior.

But the answer to the more generalized question is yes, simple properties can be expected (in a probabilistic sense) to generalize even if the model is incomplete. This is basically Minimum Message Length prediction, which you can put on the theoretical basis of the Solomonoff prior (it’s somewhere in Li and Vitanyi, chapter 5?).

Looks like nobody showed up—must be because gathertown is actually sufficiently stable for use now.

Well, yes, it’s not a perfect summary. I have no idea why they’d say Popper was working on Bayesianism—unless maybe “the problem” in that clause was the problem of induction, and something got lost in an edit.

But sometimes nitpicks aren’t that important. Like, for example, it’s spelled Vitanyi. But this isn’t really a crushing refutation of your post (though it is a very convenient illustration). You shouldn’t sweat this too much, because their textbook really is worth reading about algorithmic information theory.

Actually, is it okay if I’m in charge of the Zoom call? I would like to set up one with different rooms and cohostify people, so it’s not everyone locked in together.

Could you defend worst-case reasoning a little more? Worst cases can be arbitrarily different from the average case—so maybe having worst-case guarantees can be reassuring, but actually choosing policies by explicit reference to the worst case seems suspicious. (In the human context, we might suppose that worst case, I have a stroke in the next few seconds and die. But I’m not in the business of picking policies by how they do in that case.)

You might say “we don’t have an average case,” but if there are possible hypotheses outside your considered space you don’t have the worst case either—the problem of estimating a property of a non-realizable hypothesis space is simplified, but not gone.

Anyhow, still looking forward to working my way through this series :)

Well, first off, Pearl would remind you that reduction doesn’t have to mean probability distributions. If Markov models are simple explanations of our observations, then what’s the problem with using them?

The surface-level answer to your question would be to talk about how to interconvert between causal graphs and probabilities, thereby identifying any function on causal graphs (like setting the value of a node without updating its parents) with an operator on probability distributions (given the graphical model). Note that in common syntax, “conditioning” on do()-ing something means applying the operator to the probability distribution. But you can google this or find it in Pearl’s book Causality.
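As a small worked illustration of that identification, here is a sketch on a made-up three-node graph Z → X, Z → Y, X → Y with binary variables. All the numbers are invented for the example. The `do_x` function is the operator on the factorized distribution: it replaces X’s mechanism with a point mass that ignores its parent Z, and leaves the other factors alone.

```python
from itertools import product

# Made-up conditional probability tables for the graph Z -> X, Z -> Y, X -> Y.
p_z = {0: 0.5, 1: 0.5}
p_x_given_z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.2, 1: 0.8}}  # p_x_given_z[z][x]
p_y1_given_xz = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.7}  # P(Y=1 | x, z)


def joint(x_mechanism):
    """Joint P(z, x, y) built from the graph's factorization."""
    table = {}
    for z, x, y in product((0, 1), repeat=3):
        p_y1 = p_y1_given_xz[(x, z)]
        p_y = p_y1 if y == 1 else 1.0 - p_y1
        table[(z, x, y)] = p_z[z] * x_mechanism[z][x] * p_y
    return table


def do_x(value):
    """do(X=value): set X without updating its parent Z (a point-mass mechanism)."""
    return {z: {x: 1.0 if x == value else 0.0 for x in (0, 1)} for z in (0, 1)}


def prob_y1_given_x1(table):
    """Ordinary conditioning on X=1 within whatever joint table is passed in."""
    num = sum(p for (z, x, y), p in table.items() if x == 1 and y == 1)
    den = sum(p for (z, x, y), p in table.items() if x == 1)
    return num / den


observational = prob_y1_given_x1(joint(p_x_given_z))  # P(Y=1 | X=1), about 0.62
interventional = prob_y1_given_x1(joint(do_x(1)))     # P(Y=1 | do(X=1)), about 0.5
print(observational, interventional)
```

The two numbers differ because observing X=1 is evidence about Z, while setting X=1 is not. In the interventional table, “conditioning on do(X=1)” is just ordinary conditioning after the operator has been applied.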

I’d just like you to think more about what you want from an “explanation.” What is it you want to know that would make things feel explained?

Yup. Humans have a sort of useful insanity, where they can expect things to be bad not based on explicitly evaluating the consequences, but off of a model or heuristic about what to expect from different strategies. And then we somehow only apply this reasoning selectively, where it seems appropriate according to even more heuristics.

I’d rather frame this as good news. The good news is that if you want to learn about Solomonoff induction, the entire first half-and-a-bit of the book is a really excellent resource. It’s like if someone directed you to a mountain of pennies. Yes, you aren’t going to be able to take this mountain of pennies home anytime soon, and that might feel awkward, but it’s not like you’d be materially better off if the mountain was smaller.

If you just want the one-sentence answer, it’s as above—“X or Y” is not a Turing machine. If you want to be able to look the whole edifice over on your own, though, it really will take 200+ pages of work (it took me about 3 months of reading on the train) - starting with prefix-free codes and Kolmogorov complexity, and moving on to sequence prediction and basic Solomonoff induction and the proofs of its nice properties. Then you can get more applied stuff like thinking about how to encode what you actually want to ask in terms of Solomonoff induction, minimum message length prediction and other bounds that hold even if you’re not a hypercomputer, and the universal prior and the proofs that it retains the nice properties of basic Solomonoff induction.

Details can be found in the excellent textbook by Li and Vitanyi.

In this context, “hypothesis” means a computer program that predicts your past experience and then goes on to make a specific prediction about the future.

“X or Y” is not such a computer program—it’s a logical abstraction about computer programs.

Now, one might take two programs that have the same output, and then construct another program that is sorta like “X or Y” that runs both X and Y and then reports only one of their outputs by some pseudo-random process. In which case it might be important to you to know about how you can construct Solomonoff induction using only the shortest program that produces each unique prediction.
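To make the distinction concrete, here is a deliberately tiny stand-in for the real thing. Actual Solomonoff induction runs over all programs for a universal machine and is uncomputable; this sketch assumes a hand-picked set of five “programs” with made-up lengths in bits, weights each by 2^-length, and keeps the ones that reproduce the data so far. Every name and length in it is hypothetical.

```python
from fractions import Fraction

# Hypothetical "programs": each maps the bits seen so far to a predicted next bit.
# The lengths (in bits) are invented stand-ins for program length.
HYPOTHESES = {
    "all_zeros": (3, lambda past: 0),
    "all_ones": (3, lambda past: 1),
    "alternate": (5, lambda past: len(past) % 2),
    "repeat_last": (6, lambda past: past[-1] if past else 0),
    "ones_after_3": (9, lambda past: 0 if len(past) < 3 else 1),
}


def consistent(predict, data):
    """A hypothesis survives only if it predicts every past bit correctly."""
    return all(predict(data[:i]) == data[i] for i in range(len(data)))


def posterior(data):
    """Normalized 2^-length weights over the hypotheses that survive the data."""
    weights = {
        name: Fraction(1, 2 ** length)
        for name, (length, predict) in HYPOTHESES.items()
        if consistent(predict, data)
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def prob_next_is_one(data):
    """Total posterior weight of the surviving hypotheses that predict a 1 next."""
    post = posterior(data)
    return sum(w for name, w in post.items() if HYPOTHESES[name][1](data) == 1)


data = [0, 0, 0]
print(sorted(posterior(data)))  # ['all_zeros', 'ones_after_3', 'repeat_last']
print(prob_next_is_one(data))   # 1/73
```

In this picture, “all_zeros or repeat_last” never appears as an entry in `HYPOTHESES`. It is a statement about a set of entries, and its probability (here 72/73) comes from adding up the weights of the programs it covers.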

All of these “video chat but in 2d space” websites have had serious problems for me. My preference would just be Zoom breakout rooms with thematic names, honestly. Not sure what the average experience has been.

I think it’s absolutely feasible, but my idea of what a solution looks like is probably in a minority (if I had to guess, maybe ~30%?).

All you have to do is understand what it is you mean by the AI fulfilling human values, in a way that can be implemented in the architecture and training procedure of a prosaic AI. Easy peasy, lemon squeezy.

The majority of other feasible-ers is mostly dominated by Paulians right now, who want to solve the problem without having to understand that complicated human values thing. Typically by trusting in humans and giving them big awesome planning powers, or using their oversight and feedback to choose good things.

That’s a good point. It’s still not clear to me that he’s talking about precisely the same thing in both quotes. The point also remains that if you’re not associating “understanding” with a class as broad as Turing-completeness, then you can construct things that humans can’t understand, e.g. by hiding them in complex patterns, or by using human blind spots.