What if human reasoning is anti-inductive?
Can you help me with a generalization of probabilistic reasoning? I need a generalization of probabilistic inference that covers “anti-inductive” reasoning.
This post discusses anti-inductive belief updating, perception, learning, and reasoning about hypotheses, plus evidence of “anti-induction” in psychology. I think anti-induction may help explain how humans reason.
The overall point/conclusion of the post is described in the “overall point of this post” section below.
To see the difference between inductive and anti-inductive probabilistic reasoning, let’s visit “weird worlds”.
In weird worlds, important properties are “shared” between objects. Let me give an example. Say you’re a tall person. In the normal world, your height doesn’t tell me much about my height. Our heights are unrelated.
However, in the weird world you being tall may imply that:
I’m short. Because you already “occupied” the property of being tall.
I’m tall too. Because “height” is shared between people. Your height is my height.
I want to give more examples to show the difference between reasoning methods in both worlds.
Imagine two worlds. In the weird world, the property of intelligence is “shared” among beings. In the normal world, it is not “shared”.
You have a question “Are there many beings smarter than humans?”.
Then you encounter some beings much smarter than humans.
In the normal world, you update towards answering “yes” using Bayes’ rule.
However, in the weird world something else may happen:
You update towards “no” because if those beings are so smart, then humans are smarter than they seem. (intelligence is shared among beings)
You update towards “no” because if those beings are so smart, it means there’s not a lot of intelligence left for other beings. (intelligence is limited)
You hypothesize two categories of beings, A (normal) and B (super-smart). For category A you don’t make any updates. For category B you make an update towards “yes”.
In the weird world, if you encounter “too much” positive evidence, the evidence may flip into negative evidence.
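This evidence flip can be sketched in a few lines of toy Python. Everything here is an illustrative assumption (the 0.9 flip threshold, the update sizes, the function names); it’s a sketch of the idea, not a model taken from anywhere.

```python
# A toy contrast between a normal-world (Bayesian) update and a
# weird-world update where "too much" positive evidence flips sign.
# The flip threshold and update sizes are invented for illustration.

def bayes_update(prior: float, likelihood_ratio: float) -> float:
    """Standard odds-form Bayes update: posterior odds = prior odds * LR."""
    odds = prior / (1 - prior)
    odds *= likelihood_ratio
    return odds / (1 + odds)

def weird_world_update(prior: float, evidence_strength: float,
                       flip_threshold: float = 0.9) -> float:
    """Toy anti-inductive update: moderate evidence raises the belief,
    but evidence past the threshold counts *against* it ("intelligence
    is limited, so there's less left for other beings")."""
    if evidence_strength < flip_threshold:
        return bayes_update(prior, 1 + evidence_strength)
    # Evidence is "too strong": invert the likelihood ratio.
    return bayes_update(prior, 1 / (1 + evidence_strength))

prior = 0.5
normal = bayes_update(prior, 1 + 0.95)   # always moves the belief up
weird = weird_world_update(prior, 0.95)  # strong evidence flips it down
```

The same observation moves the normal-world believer up and the weird-world believer down; the only difference is what the evidence is assumed to correlate with.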
Imagine two worlds. In the weird world, the wealth is “shared” among beautiful people. In the normal world, it is not “shared”.
You have a question “Is beauty related to big wealth?”. You don’t have any information about anyone yet.
Then you encounter someone very beautiful and very wealthy.
In the normal world, you weakly update towards answering “yes” using Bayes’ rule.
However, in the weird world something else may happen:
You update towards “no” because if this beautiful person is so wealthy, then there’s not a lot of wealth left for other beautiful people. (wealth is limited)
You strongly update towards “yes”, because if this person is so wealthy, then other beautiful people should be wealthy too. (wealth is shared)
You hypothesize two categories of beautiful people, A (normal) and B (super-wealthy). For category A you don’t make any updates. For category B you make an extremely strong update towards “yes”: because everyone in the category B is very wealthy “by definition”.
You update towards this person not being beautiful: because if they were, they would have too much wealth compared to other beautiful people.
Why do you call it a “generalization”?
You may be asking: “All of this could be explained by the classic Bayes’ rule, by classical probability. Why do you call it a generalization of probabilistic reasoning?”
Because I’m not talking about math, I’m talking about reasoning methods.
You may see the normal world as a limiting case of the weird world.
The weird world adds more correlations between facts and events. If you delete all those extra correlations, you get the normal world. The weird world is a “generalization” of the normal one.
So, the reasoning method used in the weird world is a generalization of the normal reasoning method. It’s not a statement about math. However, it should have some implications for math, because we’re dealing with objectively different world models with objectively different predictions. Of course, some of the math is going to be different.
Something that without context looks like “just a different model/set of assumptions”, with context may count as a conceptual generalization. (Judging the difference between ideas only by the difference in math descriptions may be misleading.)
But why do we need to think about the weird world? The weird world doesn’t exist, right?
I think the weird world may exist in the mechanism of our perception. Do you remember DeepDream images?
Imagine you’re given a DeepDream image. Everything in the image has been turned into dogs. You need to figure out what the original image was (at least very vaguely). Did the original image have dogs? How many? Where is the real dog in the image?
To answer such questions, you may adopt heuristics that sound like this:
Given that I see a big dog, I’m less likely to see very small dogs. (= they are more likely to be artifacts of the DeepDream process) Example.
Given that I see a dog on the ground, I’m less likely to see a dog in the air.
Given that I see a dog with a certain property, I’m less likely to see another dog with a completely different property.
Given that I see a bunch of dogs, I’m less likely to see even more dogs.
But such heuristics are identical to the rules of reasoning in the weird world. And such heuristics may be used in the normal perception too.
By the way, notice a cool thing: you can combine those heuristics to get even stronger heuristics. E.g. “if I see a big dog on the ground, I’m extremely unlikely to see a lot of very small dogs in the air”.
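The heuristics above, and the way they stack, can be sketched as multiplicative penalties on a candidate dog’s credibility. The feature names and penalty values below are invented for illustration:

```python
# Toy model of the DeepDream heuristics: each confirmed feature of a
# real dog discounts the credibility of a conflicting candidate feature.
# All penalty values are made-up assumptions.

PENALTIES = {
    ("big", "very_small"): 0.2,    # big dog seen -> tiny dogs are suspect
    ("on_ground", "in_air"): 0.1,  # grounded dog -> airborne dogs are suspect
}

def credibility(base: float, confirmed: set, candidate: set) -> float:
    """Multiply the base credibility by every penalty triggered by a
    conflict between confirmed features and the candidate's features."""
    p = base
    for (seen, conflicting), penalty in PENALTIES.items():
        if seen in confirmed and conflicting in candidate:
            p *= penalty
    return p

# A big dog on the ground is already confirmed as real.
confirmed = {"big", "on_ground"}
single = credibility(0.5, confirmed, {"very_small"})
combined = credibility(0.5, confirmed, {"very_small", "in_air"})
```

Because each triggered heuristic multiplies in another penalty, the combined heuristic (“a lot of very small dogs in the air”) is much stronger than either heuristic alone.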
Induction vs. anti-induction
So, the normal world reasoning is inductive and the weird world reasoning is “anti-inductive”. In the normal world, if you see something, you’re more likely to see it again. In the weird world, if you see something, you’re less likely to see it again.
But “anti-inductive” reasoning can make sense in the normal world too. For example, when you’re dealing with data transformed into overused and unreliable representations, like the DeepDream process, which represents everything as a dog, making that representation overused and unreliable. What are other examples of such representations?
Human language. Each vague concept is an overused and unreliable representation of the real world.
It would be crazy to combine unreliable data with induction: you’d get absolute confirmation bias. So when you’re thinking in terms of natural language or intuitive pattern-matching, maybe you should adopt anti-inductive reasoning: the more you see a pattern, the more strongly you have to doubt it.
So, I believe that most of human reasoning is actually based on the “weird world logic”. It’s just hard to notice.
The hidden assumption
I think this hidden assumption simplifies human perception:
Locally, a concept is unlikely to have more than 1⁄2 of all available properties.
For example, imagine you want to explain the difference between plants and animals to an alien. Those facts might help you:
Usually, animals and plants exist in different size categories and have different shapes.
Usually, animals and plants occupy different places in the scene.
Usually, animals and plants make different movements.
Usually, animals and plants have different colors and different lighting.
The rest are specific details. The hidden assumption does 80% of the work of distinguishing plants from animals. Also, check out the Bouba/kiki effect:
The effect has also been shown to emerge in other contexts, for example: when words are paired with evaluative meanings (with “bouba” words associated with positive concepts and “kiki” words associated with negative concepts); or when the words to be paired are existing first names; suggesting that some familiarity with the linguistic stimuli does not eliminate the effect. A study showed that individuals will pair names such as “Molly” with round silhouettes, and names such as “Kate” with sharp silhouettes. Moreover, individuals will associate different personality traits with either group of names (e.g., easygoingness with “round names”; determination with “sharp names”). This may hint at a role of abstract concepts in the effect.
This may be the hidden assumption at work. If we discuss two concepts, all possible properties split equally between those concepts. The concepts polarize.
(Can it be connected to Zipf’s law? What if we measure distributions of properties between concepts in different situations, contexts?)
By the way, I think a similar assumption simplifies human reasoning, argumentation.
Very strong tests
I think anti-inductive heuristics have an interesting property: you can combine them to get much stronger heuristics. At first this seems trivial, because normally, if you combine a couple of probabilistic tests, you do get a stronger test. This is nothing unusual.
But tests may affect each other (correlate), so a combined test can end up weak or extra strong. I want to argue that combined anti-inductive heuristics give you extra-strong tests.
Let’s give an example. First, let’s analyze some normal heuristics. Given that you’re in a city:
It’s unlikely to see a completely red house.
It’s unlikely to see a house floating in the air.
But is a completely red floating house more unlikely than a normal-looking floating house? Maybe it’s a little less likely. But maybe it’s actually more likely: why would a normal house float in the air? You don’t know whether the red color and the floating are correlated.
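The point about unknown correlation can be put in numbers. All the probabilities below are invented for illustration:

```python
# The red-house / floating-house example in numbers. Under independence
# the combined event is just the product of the two probabilities; under
# positive correlation it can be far more likely than that product.

p_red = 0.01      # assumed chance a city house is completely red
p_float = 0.001   # assumed chance a city house floats in the air

# If the two tests were independent, combining them would be very strong:
independent = p_red * p_float

# But suppose weird houses cluster: among floating houses, 30% are red.
p_red_given_float = 0.30
correlated = p_float * p_red_given_float  # 30x the independent estimate
```

So stacking two ordinary tests only gives you the product if they don’t interact; with correlated tests the combination can be much weaker than the product suggests.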
Now, let’s analyze some anti-inductive heuristics (see the “Deep Dream world” part of the post). Given that you see a big dog on the ground:
It’s unlikely to see a lot of very small dogs.
It’s unlikely to see a dog in the air.
Anti-inductive arguments give you exponentially stronger combinations. Why does this happen? I can’t give a formal explanation, but the informal explanation is this: in anti-inductive reasoning, your hypotheses are almost indistinguishable from your certain knowledge about the world. So your hypotheses are stronger.
I think humans have such strong beliefs because their reasoning is based on anti-inductive heuristics (albeit in a wrong way at times). I think a single human doesn’t have access to enough information to use classical induction. I think neural networks (e.g. AlphaZero) use classical induction because they can brute-force tons of data (e.g. play 44 million games of chess).
Important analogy: see the Viola–Jones object detection framework. It distinguishes a face from a non-face via a combination of the simplest tests. Anti-inductive reasoning is like this, but applied at a high level of reasoning.
Reasoning pt. 1
Imagine that opinions are made of “puzzle pieces”. Given certain puzzle pieces, some opinions are easier to construct and others are harder to construct. This complexity affects probability.
Moral and scientific progress is not simply about changing opinions. It’s about changing the underlying puzzle pieces and making some opinions a priori less likely.
Today nobody endorses slavery, and not simply because society changed its opinion. Today we think about people and experiences using more individualistic and discrete concepts. With those concepts, it’s impossible to imagine people as a faceless monolith in a system. The value of “equality” is a simple consequence of our concepts becoming finer. And the idea of trying to fulfill every smallest preference wouldn’t occur to us without the finer concepts.
Early theories of combustion are not simply “proved false” today: today they are impossible in the first place. Now we know that the true theory of combustion should (ultimately) lead to explaining not only combustion, but all properties of all matter using a single simple process (exchanges of matter, the motion of atoms). The true theory has completely different conceptual properties compared to the phlogiston theory, exists on a different level of reality. Lavoisier didn’t believe in phlogiston because he felt that everything should be described in terms of exchanges. Lomonosov didn’t believe in phlogiston because he already anticipated atoms. Those guys were on a different page.
Unlike in the Bayesian model, we don’t get slightly swayed by evidence; we brutally destroy previous theories and delete them out of existence.
The puzzle pieces are our core reasons for having a belief. They are our gut feelings. They are “anti-inductive heuristics”. They are more complicated than “priors”. They are less visible than values, principles and intuitions. They can control learning like hyperparameters do.
“How cognition could work”
The idea of puzzle pieces suggests this model of cognition:
You see a situation, a theory or an argument.
You try to find a combination of puzzle pieces that would describe it. The combination gives you a probability. (By the way, you can modify and create puzzle pieces while you think.)
To get a definite opinion, you try to find the most extreme combination, the most possible/impossible one.
In this model, belief updating is a creative process which may involve “reasoning backwards in time”, i.e. modifying the justifications of your past opinions.
The important (novel) part is that by stacking just a few puzzle pieces you can get extremely strong heuristics.
Above we analyzed this example:
Puzzle pieces: “an object type is unlikely to have very different values of a property”, “an object type is extremely unlikely to have multiple very different properties”, “dogs are unlikely to have very different sizes” and “dogs are unlikely to be both in the sky and on the ground”.
Conclusion: the situation in the picture is almost 100% impossible (rare), even if it’s a fake image.
Viola–Jones object detection framework does a very similar thing, but on the lowest level.
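The three steps above can be sketched as a search over combinations of weighted puzzle pieces. The pieces and weights below are hypothetical, chosen to mirror the dog example:

```python
# Minimal sketch of the "puzzle pieces" model of cognition: an opinion's
# probability comes from multiplying the weights of the pieces that
# assemble into it, and you search for the most extreme assembly.
from itertools import combinations
import math

# Each puzzle piece maps to a likelihood multiplier (< 1 disfavors).
# All pieces and weights are invented for illustration.
PIECES = {
    "dogs unlikely to have very different sizes": 0.1,
    "object type very unlikely to have multiple different properties": 0.05,
    "dogs unlikely to be both in the sky and on the ground": 0.2,
}

def most_extreme(pieces: dict, base: float = 0.5) -> float:
    """Try every combination of pieces and return the probability
    farthest from the 0.5 base rate (the most possible/impossible one)."""
    best = base
    for r in range(1, len(pieces) + 1):
        for combo in combinations(pieces.values(), r):
            p = base * math.prod(combo)
            if abs(p - 0.5) > abs(best - 0.5):
                best = p
    return best

extreme = most_extreme(PIECES)  # stacking all three pieces: 0.5 * 0.1 * 0.05 * 0.2
```

Stacking just three pieces drives the probability three orders of magnitude below the base rate, which is the “extremely strong heuristics from a few pieces” effect the post describes.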
Reasoning pt. 2
Spectrums of hypotheses
Anti-inductive reasoning is also trivially true in the normal world if you treat hypotheses as “spectrums”. You can treat a hypothesis as a “spectrum” of different versions of itself. Or you can split reality into multiple levels and see if a hypothesis is true or false on each level.
Not a single hypothesis can be true on all levels of reality. And not a single hypothesis can be false on all levels of reality.
So, you can’t apply induction to versions of an idea. If “version 1”, “version 2”, “version 3” are all likely to be true, it doesn’t mean that “version 1 + N” is likely to be true too. On the contrary, anti-induction may apply here: the more versions of an idea are true, the less likely the next version is to be true. And the same should work backwards, which is funny: the more versions of an idea are false, the more likely the next version is to be true.
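A minimal sketch of the contrast between the inductive and the anti-inductive rule for versions of an idea. The base rate and step size are arbitrary assumptions:

```python
# Toy "spectrum of versions" predictor. Under ordinary induction, a run
# of true versions raises the probability that the next version is true;
# under the post's anti-inductive rule it lowers it.

def inductive_next(history: list, base: float = 0.5, step: float = 0.1) -> float:
    """Each true version raises the estimate, each false one lowers it."""
    p = base + step * (sum(history) - (len(history) - sum(history)))
    return min(max(p, 0.0), 1.0)

def anti_inductive_next(history: list, base: float = 0.5, step: float = 0.1) -> float:
    """Flipped sign: each true version makes the next one *less* likely,
    and each false version makes it *more* likely."""
    p = base - step * (sum(history) - (len(history) - sum(history)))
    return min(max(p, 0.0), 1.0)

history = [True, True, True]        # versions 1-3 all held up
inductive = inductive_next(history)
anti = anti_inductive_next(history)
```

The same history of three confirmed versions yields opposite predictions for version 4, depending on which rule governs the spectrum.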
I want to give a particular example of this. Here is a description of a 19th-century vitalist’s views:
He describes in detail the circulatory, lymphatic, respiratory, digestive, endocrine, nervous, and sensory systems in a wide variety of animals but explains that the presence of a soul makes each organism an indivisible whole. He claimed that the behaviour of light and sound waves showed that living organisms possessed a life-energy for which physical laws could never fully account.
There was an idea that living beings are something really special. Beyond physical laws. This idea was disproved in causal terms: there is no non-physical stuff that causes living beings. However, in non-causal terms this idea isn’t so bad. Let’s investigate how it holds up today.
“Science is too weak to describe life.”
Partially disproved: Science described living beings.
Partially proved: Science of the time lacked a lot of the conceptual breakthroughs needed to describe living beings. Life wasn’t explained by a clever application of already existing knowledge and concepts; it required coming up with new stuff. Ironically, life did turn out to be literally “beyond” the (classical) laws of physics, in the Quantum Realm.
Conclusion: Science turned out to be able to describe life, but the description turned out to be way more complex than it could have been.
Imagine that 19th-century people make bets on the question “Does Science solve life or not?” and attach their reasons for believing one way or the other. For example, “Science doesn’t solve life because (X)”. Then we ignore people’s beliefs and evaluate only their reasons. Most people with the simplest reasons for believing “Science solves life” would lose. Most people with the simplest reasons for believing “Science doesn’t solve life” would win. And many people who won by believing “Science solves life” would still be waiting for many of the actual explanations.
“Life can’t be created from non-living matter.”
Partially proved: creation of life requires complicated large molecules. It’s hard to find life popping into existence out of nowhere (abiogenesis). Scientists are still figuring it out.
Conclusion: creating life from non-living matter turned out to be possible, but way more complex than it could have been. Many people in counterfactual worlds who believed in “life from non-living matter” turned out to be wrong.
“Life has a special life-force.”
Partially disproved: living beings operate under the same laws as non-living matter.
Partially proved: there are a lot of distinctions between living beings and non-living matter, just not on the level of fundamental laws of physics. For example, (some) living beings can do cognition. And cognition turned out to be more complicated than it could have been: behaviorism lost to the cognitive revolution. Moreover, the fundamental element of life (the “hereditary molecule”) turned out to be more low-level than it could have been. Life is closer to the fundamental level of nature than it could have been.
Conclusion: the concept of life-force is wrong on one level of reality, but true on other levels. Check out What Is Life? by Erwin Schrödinger for a perspective.
Each proved idea is based on N disproved versions of itself. And each disproved idea creates N truer versions of itself. Ideas are like infinite rivers that keep splitting but never die off.
Does the analysis above matter today? Yes, if you believe something like “intelligence is going to be explained by scaling up today’s machine learning methods, because reductionism always bamboozled idealistic ideas with the simplest explanations”. Because then you sound like a wrong person in the 19th century: like someone who wanted to explain life by making a golem, but had to deal with DNA and quantum physics.
Causal rationality vs. Descriptive rationality
If you treat hypotheses as atomic statements about the world, you live in the inductive world. You focus on causal explanations, causal models. You ask “WHY does this happen?”. You want to describe a specific reality in terms of outcomes. To find what’s true “right now”. This is “causal rationality”.
If you treat hypotheses as spectrums, you live in the anti-inductive world. You focus on acausal models, on patterns and analogies. You ask “HOW does this happen?”. You want to describe all possible (and impossible) realities in terms of each other. To find what’s true “in the future”. This is “descriptive rationality”.
I think those approaches correspond to two types of rationality. Those types should be connected. But if you ignore one of them, you get irrationality.
So, could you help me describe anti-inductive reasoning? For example, maybe we could add something to the math of one-shot learning based on “anti-inductive heuristics”.
Note: the distinction between the two types of rationality may remind you of the distinction between different types of decision theories: causal decision theory vs. logical decision theory (see LessWrong decision theories, such as “functional decision theory”). Here’s an article for everyone about logical decision theory on Arbital.
The overall point of this post:
Induction and “anti-induction” are two complementary ways to look at the world. Induction (the usual probabilistic inference) is for low-level hypotheses we check by brute force. Anti-induction is for high-level conceptual thinking.
Artificial neural networks use induction (hypotheses about the world supported by thousands of examples). Humans use “anti-induction” (hypotheses based on anecdotes).
The two types of reasoning correspond to two types of rationality. We should unify those types.
I know that the name “anti-induction” is controversial, but I believe it makes sense in context. “Anti-induction” is the name of the overall approach (of seeking such patterns in the world, describing the world in such way), not a description of a particular example. The name is not meant to be interpreted too literally.
Some psychological effects suggest that humans have a strange relationship with induction.
The first strange thing is that people may not use inductive inference for many beliefs at all. At least such inference feels meaningless. Do you think older people are more confident that the sun will rise tomorrow because they have more experience? I think people believe in the next sunrise because:
The sunrise is needed for a lot of other things to happen.
It’s just that type of event.
Anti-induction and psychology
(1) Repugnant conclusion. Shows how transitivity may fail in ethics. No transitivity = no induction.
(2) Family resemblance leads to intransitive similarity relationships. If “A is similar to B” and “B is similar to C”, it doesn’t follow that “C is similar to A”.
(3) Our preferences may be quasitransitive. This is related to the Sorites paradox. See also “Puzzle of the Self-Torturer” (1993) by Warren Quinn.
(4) Raven paradox. A classic example in which observing a black raven is evidence against “all ravens are black”:
Suppose that we know we are in one or other of two worlds, and the hypothesis, H, under consideration is that all the ravens in our world are black. We know in advance that in one world there are a hundred black ravens, no non-black ravens, and a million other birds; and that in the other world there are a thousand black ravens, one white raven, and a million other birds. A bird is selected equiprobably at random from all the birds in our world. It turns out to be a black raven. This is strong evidence … that we are in the second world, wherein not all ravens are black.
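The numbers in the quoted example can be checked directly with Bayes’ rule (assuming, as the setup implies, an equal prior on the two worlds):

```python
# Verifying the quoted raven example. Bird counts come from the quote;
# the equal prior over the two worlds is implied by the setup.
from fractions import Fraction

# World 1: 100 black ravens, 0 non-black ravens, 1,000,000 other birds.
# World 2: 1,000 black ravens, 1 white raven, 1,000,000 other birds.
p_black_raven_w1 = Fraction(100, 1_000_100)
p_black_raven_w2 = Fraction(1_000, 1_001_001)

prior = Fraction(1, 2)
posterior_w2 = (prior * p_black_raven_w2) / (
    prior * p_black_raven_w1 + prior * p_black_raven_w2
)
# posterior_w2 comes out to roughly 0.91: drawing a black raven is
# strong evidence for the world where NOT all ravens are black.
```

So the seemingly confirming observation (a black raven) pushes the probability of “all ravens are black” down to about 9%, exactly the anti-inductive flavor the list is collecting.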
(5) There’s a common-sense (folk) notion that you can take true things “too far” by being overly consistent and strict. “The ends don’t justify the means” is a specific instance of this general principle. “The more times you use an argument, the less likely it is to be true”: I believe something like this can be true.
(6) Paradox of voting. Here induction agrees with human intuition, but it creates a paradox for rational choice theory. The Arbital article discusses this.
(7) Doomsday argument, an unintuitive argument because “anyone could make the same argument”. Here people do use induction, but it conflicts with probability.
Markets are Anti-Inductive
“Markets are Anti-Inductive” by Eliezer Yudkowsky
I don’t know how markets work, but this sounds cool:
Whatever is believed in stops being real.
I suspect there’s a Pons Asinorum of probability between the bettor who thinks that you make money on horse races by betting on the horse you think will win, and the bettor who realizes that you can only make money on horse races if you find horses whose odds seem poorly calibrated relative to superior probabilistic guesses.
If reasoning in general worked like this in the real world, it would mean that obviously good and obviously bad opinions fail, but the smartest hipster opinions win.
Wouldn’t it be fun if it turned out that reasoning in general works like this? And how does all of this connect to “prediction markets”, are they anti-inductive too?
Optimism, unexpected hanging
(Those are small notes, you may skip them.)
(1) Maybe “optimism” and “pessimism” (and emotions) have anti-inductive properties.
You can’t be pessimistic about everything. You would be a sad rock who doesn’t care about anything. But if you’re pessimistic about 99% of the things, it makes your optimism about 1% of the things stronger. This conservation of optimism leads to anti-induction.
For example, Arthur Schopenhauer was pessimistic about almost everything. But that made his optimism about art more surprising and stronger than other people’s. Not every optimist is going to value art when life is meaningless.
(2) Unexpected hanging paradox. A case where induction fails in a paradoxical way, even though you can construct the same paradox with only a single day.
P.S.: just in case, I’ll mention my situation.