# Q Home’s Shortform

• 21 Sep 2022 8:31 UTC

For some time I have wanted to apply the idea of probabilistic thinking (used for predicting things) to describing things and making analogies between them. This is important because your hypotheses (predictions) depend on the way you see the world. If you could combine predicting and describing into a single process, you would unify cognition.

Fuzzy logic and fuzzy sets are one way to do it. The idea is that something can be partially true (e.g. “humans are ethical” is somewhat true) or partially belong to a class (e.g. a dog is somewhat like a human, but not 100%). Note that “fuzzy” and “probable” are different concepts. But fuzzy logic isn’t enough to unify predicting and describing, because it doesn’t tell us much about how we should/could describe the world. It offers no new ideas there.

I have a different principle for unifying probability and description. Here it is:

Properties of objects aren’t contained in specific objects. Instead, there’s a common pool that contains all possible properties. Objects take their properties from this pool. But the pool isn’t infinite. If one object takes 80% of a certain property from the pool, other objects can take only 20% of that property (e.g. “height”). Socialism for properties: it’s not your “height”, it’s our “height”.

How can an object “take away” properties of other objects? For example, how can a tall object “steal” height from other objects? Well, imagine there are multiple interpretations of each object. The interpretation of one object affects the interpretation of all other objects. It’s just a weird axiom, like the axioms of a non-Euclidean geometry.

This sounds strange, but this connects probability and description. And this is new. I think this principle can be used in classification and argumentation. Before showing how to use it I want to explain it a little bit more with some analogies.

## Connected houses

Imagine two houses, A and B. Those houses are connected in a specific way.

When one house turns on the light at 80%, the other turns on the light only at 20%.

When one house uses 60% of the heat, the other uses only 40% of the heat.

(When one house turns on the red light, the other turns on the blue light. When one house is burning, the other is freezing.)

Those houses take electricity and heat from a common pool. And this pool doesn’t have infinite energy.

## Kindness

Usually people think about qualities as something binary: you either have it or not. For example, a person can be either kind or not.

For me an abstract property such as “kindness” is like the white light. Different people have different colors of “kindness” (blue kindness, green kindness...). Every person has kindness of some color. But nobody has all colors of kindness.

Abstract kindness is the common pool (of all ways to express it). Different people take different parts of that pool.

## Some more analogies

Theism analogy. You can compare the common pool of properties to the “God object”, a perfect object. All other objects are just different parts of the perfect object. You also can check out Monadology by Gottfried Leibniz.

Spectrum analogy. You can compare the common pool of properties to the spectrum of colors. Objects are just colors of a single spectrum.

Ethics analogy. Imagine that all your good qualities also belong (to a degree) to all other people. And all bad qualities of other people also belong (to a degree) to you. As if people take their qualities from a single common pool.

Buddhism analogy. Imagine that all your desires and urges come (to a degree) from all other people. And desires and urges of all other people come (to a degree) from you. There’s a single common pool of desire. This is somewhat similar to karma. In rationality there’s also a concept of “values handshakes”: when different beings decide to share each other’s values.

Quantum analogy. See quantum entanglement. When particles become entangled, they take their properties from a single common pool (quantum state).

Fractal analogy. “All objects in the Universe are just different versions of a single object.”

Subdivision analogy. Check out Finite subdivision rule. You can compare the initial polygon to the common pool of properties. And different objects are just pieces of that polygon.

## Connection with recursion

Recursion. If objects take their properties from the common pool, it means they don’t really have (separate) identities. It also means that a property (X) of an object is described in terms of all other objects. So the property (X) is recursive: it calls itself to define itself.

For example, imagine we have objects A, B and C. We want to know their heights. In order to do this we may need to evaluate those functions:

• A(height), B(height), C(height)

• A(B(height)), A(C(height)) …

• A(B(C(height))), A(C(B(height))) …

A priori assumptions about objects should allow us to simplify this and avoid cycles.
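To make the list above concrete, here is a minimal sketch that enumerates the nested evaluations for objects A, B and C without ever visiting the same object twice. Reading a chain such as `A(B(C(height)))` as an ordered sequence of distinct objects is an assumption made for illustration, and the helper name `chains` is hypothetical; the post itself does not fix a formal rule.

```python
from itertools import permutations


def chains(objects, prop="height"):
    """List every nested evaluation that uses each object at most once."""
    result = []
    for depth in range(1, len(objects) + 1):
        for order in permutations(objects, depth):
            # Build "A(B(C(height)))" from the sequence ("A", "B", "C").
            expr = prop
            for name in reversed(order):
                expr = f"{name}({expr})"
            result.append(expr)
    return result


all_chains = chains(["A", "B", "C"])
print(len(all_chains))  # 3 + 6 + 6 = 15 cycle-free chains
print(all_chains[:3])   # ['A(height)', 'B(height)', 'C(height)']
```

Because no object repeats inside a chain, the number of chains stays finite, which is one possible reading of “avoid cycles”.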

Fractals. See Coastline paradox. You can treat a fractal as an object with multiple interpretations (where an interpretation depends on the scale). Objects taking their properties from the common pool = fractals taking different scales from the common range.

# Classification

To explain how to classify objects using my principle, I need to explain how to order them with it.

I’ll explain it using fantastical places and videogame levels, because those things are formal and objective enough (they are 3D shapes). But I believe the same classification method can be applied to any objects, concepts and even experiences.

Basically, this is an unusual model of contextual thinking. If we can formalize this specific type of contextual thinking, then maybe we can formalize contextual thinking in general. This topic will sound very esoteric, but it’s the direct application of the principle explained above.

## Intro

(I interpret paintings as “real places”: something that can be modeled as a 3D shape. If a painting is surreal, I simplify it a bit in my mind.)

Take a look at those places: image.

Let’s compare 2 of them: image. Let’s say we want to know the “height” of those places. We don’t have a universal scale to compare the places. Different interpretations of the height are possible.

If we’re calling a place “very tall”—we need to understand the epithet “very tall” in probabilistic terms, such as “70-90% tall”—and we need to imagine that this probability is taken away from all other places. We can’t have two different “very tall” places. Probability should add up to 100%.
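The requirement that the shares add up to 100% can be sketched as a simple normalization. This is only an illustration under my own assumptions: the function name `pool_shares` and the raw scores are hypothetical, and the post does not say that shares must be proportional to raw impressions.

```python
def pool_shares(raw_scores):
    """Convert raw impressions into shares of one common pool (sum = 1)."""
    total = sum(raw_scores.values())
    if total <= 0:
        raise ValueError("at least one place must have a positive score")
    return {name: score / total for name, score in raw_scores.items()}


# Hypothetical raw impressions of height for two places.
before = pool_shares({"place_1": 3.0, "place_2": 1.0})
# Adding a much taller place takes height away from the other two.
after = pool_shares({"place_1": 3.0, "place_2": 1.0, "place_3": 12.0})

print(before["place_1"])  # 0.75
print(after["place_1"])   # 0.1875
```

Nothing about `place_1` itself changed between the two calls; only the context did, and its share of “height” dropped.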

Now take a look at another place (A): image (I ignore the cosmos to simplify it). Let’s say we want to know how enclosed it is. In one interpretation, it is massively enclosed by trees. In another interpretation, the trees are just a decorative detail and can be ignored. Let’s add some more places for context: image. They are definitely more open than the initial place, so we should update towards a more enclosed interpretation of (A). All interpretations should be correlated and “compatible”. It’s as if we’re solving a puzzle.

You can say that properties of places are “expandable”. Any place contains a seed of any possible property and that seed can be expanded by a context. “Very tall place” may mean Mt. Everest or a molehill depending on context. You can compare it to a fractal: every small piece of a fractal can be expanded into the entire thing. And I think it’s also very similar to how human language, human concepts work.

You also may call it “amplification of evidence”: any smallest piece of evidence (or even absence of any evidence) can be expanded into very strong evidence by context. We have a situation like in the Raven paradox, but even worse.

## Rob Gonsalves

(I interpret paintings as “real” places.)

Places in random order: image.

My ordering of places: image.

I used 2 metrics to evaluate the places:

• Is the space of the place “box-like” and small or not?

• Is the place enclosed or open?

The places go from “box-like and enclosed” to “not box-like and open” in my ordering.

But to see this you need to look at the places in a certain way, reason about them in a certain way:

• Place 1 is smaller than it seems. Because Place 5 is similar and “takes away” its size.

• Place 2 is more box-like than it seems. Because similar places 4 and 6 are less box-like.

• Place 3 is more enclosed than it seems. Because similar places 4 and 6 “take away” its openness.

• Place 5 is more open than it seems. Because similar places 1 and 2 “take away” its closedness.

Almost any property of any specific place can be “illusory”. But when you look at places in context you can deduce their properties via the process of elimination.

## Argumentation, hypotheses

You can apply the same idea (about the “common pool”) to hypotheses and argumentation:

• You can describe a hypothesis in terms of any other hypothesis. You also can simplify it along the way (let’s call it “regularization”). Recursion and circularity are possible in reasoning.

• Truth isn’t attached to a specific hypothesis. Instead there’s a common “pool of truth”. Different hypotheses take different parts of the whole truth. The question isn’t “Is the hypothesis true?”, the question is “How true is the hypothesis compared to others?” And if a hypothesis is regularized, it can’t be too wrong.

• Alternatively: “implications” of a specific hypothesis aren’t attached to it. Instead there’s a common “pool of implications”. Different hypotheses take different parts of “implications”.

• Conservation of implications: if implications of a hypothesis are simple enough, they remain true/​likely even if the hypothesis is wrong. You can shift the implications to a different hypothesis, but you’re very unlikely to completely dissolve them.

• In usual rationality (hypotheses don’t share truth) you try to get the most accurate opinions about every single thing in the world. You’re “greedy”. But in this approach (hypotheses do share truth) it doesn’t matter how wrong you are about everything else as long as you’re right about “the most important thing”. And once you’re proven right about “the most important thing”, you know everything. A billion wrongs can make a right. Because any wrong opinion is correlated with the ultimate true opinion, the pool of the entire truth.

• You can’t prove a hypothesis to be “too bad” because it would harm all other hypotheses. Because all hypotheses are correlated, created by each other. When you keep proving something wrong the harm to other hypotheses grows exponentially.

• Motivated reasoning is valid: truth of a hypothesis depends on context, on the range of interests you choose. Your choice affects the truth.

• Any theory is the best (or even “the only one possible”) on its level of reality. For example, on a certain level of reality modern physics doesn’t explain weather better than gods of weather.

In a way it means that specific hypotheses/​beliefs just don’t exist, they’re melted into a single landscape. It may sound insane (“everything is true at the same time and never proven wrong” and also relative!). But human language, emotions, learning, pattern-matching and research programs often work like this. It’s just a consequence of ideas (1) not being atomic statements about the world and (2) not being focused on causal reasoning, causal modeling. And it’s rational to not start with atomic predictions when you don’t have enough evidence to locate atomic hypotheses.

## Causal rationality, Descriptive rationality

You can split rationality into 2 components. The second component isn’t explored. My idea describes the second component:

• Causal rationality. Focused on atomic independent hypotheses about the world. On causal explanations, causal models. Answers “WHY does this happen?”. Goal: to describe a specific reality in terms of outcomes.

• Descriptive rationality. Focused on fuzzy and correlated hypotheses about the world. On patterns and analogies. Answers “HOW does this happen?”. Goal: to describe all possible (and impossible) realities in terms of each other.

Causal and Descriptive rationality work according to different rules. Causal uses Bayesian updating. Descriptive uses “the common pool of properties + Bayesian updating”, maybe.

• “Map is not the territory” is true for Causal rationality. It’s wrong for Descriptive rationality: every map is a layer of reality.

• “Uncertainty and confusion are part of the map, not the territory”. True for Causal rationality. Wrong for Descriptive rationality: the possibility of an uncertainty/confusion is a property of reality.

• “Details make something less likely, not more” (Conjunction fallacy). True for Causal rationality. Wrong for Descriptive rationality: details are not true or false by themselves, they “host” kernels of truth, more details may accumulate more truth.

• For Causal rationality, math is the ideal of specificity. For Descriptive rationality, math has nothing to do with specificity: an idea may have different specificity on different layers of reality.

• In Causal rationality, hypotheses should constrain outcomes, shouldn’t explain any possible outcome. In Descriptive rationality… constraining depends on context.

• Causal rationality often conflicts with people. Descriptive rationality tries to minimize the conflict. I believe it’s closer to how humans think.

• Causal rationality assumes that describing reality is trivial and should be abandoned as soon as possible. Only (new) predictions matter.

• In Descriptive rationality, a hypothesis is somewhat equivalent to the explained phenomenon. You can’t destroy a hypothesis too much without destroying your knowledge about the phenomenon itself. It’s like hitting a nail so hard that you destroy the Earth.
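For reference, the conjunction rule cited in the list above is the standard fact that P(A and B) = P(A) · P(B | A), which can never exceed P(A). A minimal numeric check follows; the probabilities are made up for illustration and the function name `conjunction` is hypothetical.

```python
def conjunction(p_a, p_b_given_a):
    """P(A and B) by the product rule."""
    return p_a * p_b_given_a


p_a = 0.5                      # made-up probability of the bare claim A
p_ab = conjunction(p_a, 0.5)   # made-up probability of the extra detail B given A

print(p_ab)          # 0.25
print(p_ab <= p_a)   # True
```

This is the Causal side of the contrast only; the sketch does not attempt to model how details might “accumulate truth” in the Descriptive sense.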

Example: Vitalism. It was proven wrong in causal terms. But in descriptive terms it’s almost entirely true. Living matter does behave very differently from non-living matter. Living matter does have a “force” that non-living matter doesn’t have (it’s just not a fundamental force). Many truths of vitalism were simply split into different branches of science: living matter is made out of special components (biology/microbiology) including nanomachines/computers!!! (DNA, genetics), can have cognition (psychology/neuroscience), can be a computer (computer science), can evolve (evolutionary biology), can do something like “decreasing entropy” (an idea by Erwin Schrödinger, see entropy and life).

On the other hand, maybe it’s bad that vitalism got split into so many different pieces. Maybe it’s bad that vitalism failed to predict reductionism. However, behaviorism did get overshadowed by cognitive science (living matter did turn out to be more special than it could be). Our judgement of vitalism depends on our choices, but at worst vitalism is just the second best idea. Or the third best idea compared to some other version of itself…

Absolute death of vitalism is astronomically unlikely, and it would cause most of reductionism and causality to die too, along with most of our knowledge about the world. Vitalism partially just restates our knowledge (“living matter is different from non-living”), so it’s strange to simply call it wrong. It’s easier to make vitalism better than to disprove it.

Perhaps you could call the old version of vitalism “too specific given the information about the world”: why should “life-like force” be beyond laws of physics? But even this would be debatable at the time. By the way, the old sentiment “Science is too weak to explain living things” can be considered partially confirmed: 19th century science lacked a bunch of conceptual breakthroughs. And “only organisms can make the components of living things” is partially just a fact of reality: skin and meat don’t randomly appear in nature. This fact was partially weakened, but also partially strengthened with time. The discovery of DNA strengthened it in some ways. It’s easy to overlook all of those things.

In Descriptive rationality, an idea is like a river. You can split it, but you can’t stop it. And it doesn’t make sense to fight the river with your fists: just let it flow around you. However, if you did manage to split the river into independent atoms, you get Causal rationality.

## 2 types of rationality should be connected

I think causal rationality has some problems and those problems show that it has a missing component:

• Rationality is criticized for dealing with atomic hypotheses about the world and for not saying how to generate new hypotheses and obtain new knowledge. Example: critique by nostalgebraist. See “8. The problem of new ideas”.

• You can’t use causal rationality to be critical of causal rationality. In theory you should be able to do it, but in practice people often don’t do it. And causal rationality doesn’t model argumentation, even for the most important topics such as AI safety. So we end up arguing like anyone argues.

• Doomsday argument, Pascal’s mugging. Probability starts to behave weirdly when we add large numbers of (irrelevant) things to our world.

• The problem of modesty. Should you assume that you’re just an average person?

• Weird addition in ethics. Repugnant conclusion, “Torture vs. Dust Specks”.

• Causal rationality doesn’t give/​justify an ethical theory. Doesn’t say how to find it if you want to find it.

• Causal rationality doesn’t give/​justify a decision theory. There’s a problem with logical uncertainty (uncertainty about implications of beliefs).

I’m not saying that all of this is impossible to solve with Causal rationality. I’m saying that Causal rationality doesn’t give any motivation to solve all of this. When you’re trying to solve it without motivation you kind of don’t know what you’re doing. It’s like trying to write a program in bytecode without having high-level concepts even in your mind. Or like trying to ride an alien device in the dark: you don’t know what you’re doing and you don’t know where you’re going.

What are we doing, and where are we going, when we’re trying to fix rationality?

## Crash Bandicoot 1

Crash Bandicoot N. Sane Trilogy

My ordering of some levels: image. Videos of the levels: Level 1, Level 2, Level 3, Level 4, Level 5, Level 6.

I used 2 metrics to evaluate the levels:

1. Is the level stretched vertically or horizontally?

2. Is the level easy to separate into similar square-like pieces or not? (like a patchwork)

The levels go from “vertical and separable” to “horizontal and not separable”.

But to see this you need to note:

• Level 1 is very vertical: it’s just a vertical wall. So it “takes away” verticality from levels 2 and 3.

• Of levels 1-3, level 3 is the most horizontal. Because it’s the least similar to level 1.

• Levels 4-6 repeat the same logic, but now levels are harder to separate into similar square-like pieces. Why? Because levels 1 and 2 are very easy to separate (they have repeating patterns on the walls), so they “take away” separability from all other levels.

Any question about any property of any level is answered by another question: is this property already “occupied” by some other level?
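One way to read the question “is this property already occupied?” is as sequential allocation from a finite pool: each level claims some fraction of whatever is still unclaimed. This is only a sketch under that assumption. The function `allocate` and the claim fractions are hypothetical and are not measured from the actual levels.

```python
def allocate(claims):
    """Give each level a fraction of what remains in the pool.

    `claims` maps a level name to the fraction (0..1) of the *remaining*
    pool that it takes. Order matters: earlier levels are served first.
    """
    remaining = 1.0
    allocation = {}
    for level, fraction in claims.items():
        taken = remaining * fraction
        allocation[level] = taken
        remaining -= taken
    return allocation, remaining


# All three levels make the same claim on "verticality".
verticality, left = allocate({"level_1": 0.5, "level_2": 0.5, "level_3": 0.5})
print(verticality)  # {'level_1': 0.5, 'level_2': 0.25, 'level_3': 0.125}
print(left)         # 0.125
```

Even though every level asks for the same fraction, the later levels end up with less, because part of the property was already occupied by level 1.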

## Jacek Yerka

Jacek Yerka

Places in random order: image.

My ordering of places: image.

I used 2 metrics to evaluate the places:

1. Can the place fit inside a box-like space? (not too big, not too small)

2. Is the place inside or outside of something small?

The places go from “box-like and outside” to “not box-like and inside”.

But to see this you need to note:

• Place 1 could be interpreted as being inside of a town. But similar Place 5 is inside a single road. So it takes away “inside-ness” from Place 1.

• Place 2 is more “outside” than it seems. Because similar Place 6 fits inside an area with small tiles. So it takes away “inside-ness” from Place 2.

• Place 3 is not as tall as it seems. Because similar Place 6 is very tall. So it takes away height from Place 3.

If you feel this relativity of places’ properties, then you understand how I think about places. You don’t need to understand a specific order of places perfectly.

## Crash Bandicoot 3

My ordering of some levels: image. Videos of the levels: Level 1, Level 2, Level 3, Level 4, Level 5, Level 6, Level 7

I used 1 metric to evaluate the levels:

• Does the level create a 3D space (box-like, not too big, not too small), a 2D space (flat surface) or a 0D space (shapeless, cloud-like)?

Levels go from 3D to 2D to 0D.

But to see this you need to note:

• Levels 6 and 7 are less box-like than they seem. Because similar levels 1 and 2 already create small box-like spaces. So they take away the “box-like” feature from levels 6 and 7.

• Level 3 is more box-like than it seems. Because levels 4 and 5 create more dense flat surfaces. So they take away flatness from Level 3.

Each level is described by all other levels. This recursive logic determines what features of the levels matter.

## Negative objects

When objects take their properties from a single pool of properties, there may appear “negative objects”. It happens when objects A and B take away opposite properties from a third object C (with equal force). For example, A may take away height from C. But B takes away shortness (anti-height) from C. So, “negative objects” are like contradictions. You can’t fit a negative object anywhere in the order of positive objects.
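A minimal sketch of this cancellation, assuming each influence on object C can be written as one signed number (positive takes away height, negative takes away shortness). Both the signed-number encoding and the function name `classify` are assumptions made for illustration, not a formalization given in the post.

```python
def classify(pulls, tolerance=1e-9):
    """Label an object by the signed pulls other objects exert on it.

    Positive values take away height, negative values take away shortness.
    """
    net = sum(pulls)
    strength = sum(abs(p) for p in pulls)
    if strength > tolerance and abs(net) <= tolerance:
        # Opposite pulls of equal force cancel: no place in the usual order.
        return "negative"
    return "positive"


print(classify([0.5, -0.5]))  # negative
print(classify([0.5, 0.25]))  # positive
```

The object in the first call is pulled strongly, yet has no net direction, which is the sense in which it resembles a contradiction.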

Let’s get back to Crash Bandicoot 3 and add two levels: image. Videos of the levels: Level −2, Level −1

• Take a look at Level −2. It’s too empty for levels 6 and 7 (and too box-like). But it’s too big and shapeless for levels 1 and 2. And it’s obviously not a flat surface. So, it doesn’t fit anywhere. Maybe it’s just better to place it in its own order.

• A similar thing is true for Level −1. It’s too different from levels 6 and 7 and it’s too small for levels 1 and 2.

• Levels −2 and −1 are also both inside some kind of structures. This adds confusion when you compare them to other levels.

Note that negative levels are still connected with all the other levels anyway: their properties are still determined by properties of all other levels, just in a more complicated way.

You can order negative levels by using the metrics for positive levels. In the case above, you can do it like this:

1. Take negative levels. Cut out their larger parts. Now they’re just like the positive levels.

2. Order them the same way you ordered positive levels.

## Hyper objects

There are also “hyper objects” (hyper positive and hyper negative objects). Such objects take “too much” or “too little” from the common pool of properties compared to normal objects.

How do hyper objects appear? I may not be able to explain it. Maybe a hyper object appears when an object takes a property (equally strongly) from objects with very different amounts of that property. This is confusing and vague, so here’s an analogy: imagine a number that’s very far, but equally far, from the numbers 2 and 5. It has distance 10 from both 2 and 5. How can this be? This number should go somewhere “sideways”… it must be a complex number. So, you can compare hyper objects to complex numbers.
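The analogy can be checked directly. A point at distance 10 from both 2 and 5 must lie on the perpendicular bisector of the segment between them, so its real part is 3.5 and its imaginary part is ±√(10² − 1.5²). A quick check (the variable names are only illustrative):

```python
import math

half_gap = (5 - 2) / 2                  # 1.5, half the distance between 2 and 5
real = (2 + 5) / 2                      # 3.5, the midpoint of 2 and 5
imag = math.sqrt(10**2 - half_gap**2)   # about 9.887, by the Pythagorean theorem
z = complex(real, imag)

print(z)                        # roughly (3.5+9.887j)
print(abs(z - 2), abs(z - 5))   # both approximately 10
```

The conjugate of `z` works equally well, so there are exactly two such numbers, one on each “side” of the real line.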

An example of hyper levels for Crash Bandicoot 3: image. Videos of the levels: “Bye Bye Blimps”, “N. Gin”

• “Bye Bye Blimps” is like a flat surface, but utterly gigantic. But it’s also shapeless like levels 6 and 7, yet bigger than them/​equally big, but in a different way.

• “N. Gin” is identical to “Bye Bye Blimps” in this regard.

# Theory

## How is this related to anything?

You may be asking “How can ordering things be related to anything?” Prepare for a slightly abstract argument.

Any thought/​experience is about multiple things coexisting in your mental state. So, any thought/​experience is about direct or indirect comparison between things. And any comparison can be described by an order or multiple orders.

• If compared things don’t share properties, then you can order them using “arithmetic” (absolute measurements, uncorrelated properties). In this case everything happening in your mental state is absolutely separated, it’s a degenerate case.

• If compared things 100% share properties, then you can order them using my method (pool of properties, absolutely correlated properties). In this case everything happening in your mental state is mixed into a single process.

• If compared things partially share properties, then you can use a mix between “arithmetic” and my method. In this case everything happening in your mental state partially breaks down into separate processes.
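The three cases above can be read as one family indexed by how much the compared things share a property. A minimal sketch, assuming a linear blend controlled by a hypothetical parameter `sharing` between 0 (pure “arithmetic”) and 1 (pure common pool). The post does not say the mix has to be linear, and the function name `blended_scores` is invented for illustration.

```python
def blended_scores(absolute, sharing):
    """Mix absolute measurements with pool-relative shares.

    sharing = 0.0 -> scores are the absolute measurements, untouched.
    sharing = 1.0 -> scores are shares of a common pool (they sum to 1).
    """
    if not 0.0 <= sharing <= 1.0:
        raise ValueError("sharing must be between 0 and 1")
    total = sum(absolute.values())
    if total <= 0:
        raise ValueError("measurements must have a positive total")
    return {
        name: (1.0 - sharing) * value + sharing * (value / total)
        for name, value in absolute.items()
    }


heights = {"a": 6.0, "b": 2.0}
print(blended_scores(heights, 0.0))  # {'a': 6.0, 'b': 2.0}
print(blended_scores(heights, 1.0))  # {'a': 0.75, 'b': 0.25}
```

With `sharing` at 0 each score ignores the other objects entirely; at 1 each score is defined only relative to the others, which matches the “single process” case.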

So, “my orders + arithmetic orders” is something like a Turing machine: a universal model that can describe any thought/​experience, any mental state. Of course, a Turing machine can describe anything my method can describe, but my method is more high-level.

## Formalization

I know that what I described above doesn’t automatically specify a mathematical model. But I think we should be able to formalize my idea easily enough. If not, then my idea is wrong.

We have those hints for formalization:

• The idea about the common pool of properties. Connection with probability.

• Connection with recursion.

• The idea of “negative objects” and “hyper objects”. Connection with superrationality/​splitting resources.

• We can test the formalization on comparing 3D shapes (maybe even 2D shapes). Easy to model and formalize.

• Connection to hypotheses, rationality. To Bayes’ rule. (See below.)

• We can try a special type of brainstorming/​spitballing based on my idea. (See below.)

To be honest, I’m bad at math. I based my theory on synesthesia-like experiences and conceptual ideas. But if the information above isn’t enough, I can try to give more. I have experience of making my idea more specific, so I could guess how to make the idea even more specific (if we encounter a problem). Please, help me with formalizing this idea.

• 3 Sep 2022 1:27 UTC

Perhaps the alternative to the “maximize one thing subject to a price-to-humans constraint” setup would be not making the AI that specialized. Make it maximize across a basket of things humans want.

While I have heard the “paper clips take over the universe” worry, it seems to me that this type of thought experiment introduces the problem to begin with (making a bit of a circular error). As I gather (indirectly), the problem is that the paper-clip-maximizing AI ends up taking over the entire economy. That seems equivalent to suggesting that the AI replaces all the markets and other economic decisions (being smarter, faster and more competitive, I guess).

If so, isn’t an obvious solution to give it multiple (infinite, in the sense of unlimited human wants) things to maximize? While it might replace human production and economic activity, it’s going to produce some form of a current-state production possibility frontier and, I would think, an intertemporal one as well, which might address some intergenerational concerns.

I don’t think that fully solves the alignment problem (as I understand it—possibly poorly) but I do think it shifts what the risks are and may well eliminate a lot of the existential risks people worry about.

• I want to share a way of dissolving disagreements. It’s also a style of thinking. I call it “the method of statements”, here’s the description:

1. Take an idea, theory or argument. Split it into statements of a certain type. (Or multiple types.)

2. Evaluate the properties of the statements. Do they exist (i.e. can they be defined, does anything connect them)? Can they be used, are they constructive? Are they simple? Are they important? Etc.

3. Try to extract as much information as possible from those statements.

One rule:

• A statement counts as existing even if it can’t be formalized or expressed in a particular epistemology.

When you evaluate an argument with the method of statements, you don’t evaluate the “logic” of the argument or its “model of the world”. You evaluate properties of statements implied by the argument. Do statements in question correlate with something true or interesting?

You also may apply the method to analyzing information. You may split the information about something into statements of a certain type and study the properties of those statements. I can’t define what a “statement” is. It’s the most basic concept. Sometimes “statements” are facts, but not always. “Statements” may even be non-verbal. A set of statements can be defined in any way possible.

I will give a couple of less controversial (for rationalists) examples of applying the method. Then a couple of more controversial examples. And then share a couple of my own ideas in the context of the method. But before this...

### Rationalist taboo

If a tree falls in a forest and no one is around to hear it, does it make a sound? (on Wikipedia)

We may try to resolve the disagreement by trying to replace the label “sound” with its more specific contents. Are we talking about sound waves, the vibration of atoms? Are we talking about the subjective experience of sound? Are we talking about mathematical models and hypothetical imaginary situations? An important point is that we don’t try to define what “sound” is, because it would only lead to a dispute about definitions.

The method of statements is somewhat similar to taboo. But with the method we “taboo” ideas and arguments themselves. We take an idea and replace it with its more specific, more atomic semantic contents. We take a thought and split it into smaller thoughts. It’s “taboo” applied on a different level, a “meta-taboo” applied to the process of thinking itself.

However, rationalist taboo and the method of statements may be in direct conflict. Because rationalist taboo assumes that a “statement” is meaningless if it can’t be formalized or expressed in a particular epistemology.

# Reception of ideas

This part of the post is about reception of two LessWrong ideas/​topics.

## Evaluating Logical decision theory (LDT)

What is Logical Decision Theory (LDT)? You can check out “An Introduction to Logical Decision Theory for Everyone Else”

With the usual way of thinking, even if a person is sympathetic enough to LDT they may react like this:

• I think the idea of LDT is important, but...

• I’m not sure it can be formalized (finished).

• I’m not sure I agree with it. It seems to violate such and such principles.

• You can get the same results by fixing old decision theories.

• Conclusion: “LDT brings up important things, but it’s nothing serious right now”.

(The reaction above is inspired by criticisms of William David MacAskill and Prof. Wolfgang Schwarz)

With the method of statements, being sympathetic enough to LDT automatically entails this (or even more positive) reaction:

1. There exist two famous types of statements: “causal statements” (in CDT) and “evidential statements” (in EDT). LDT hypothesizes the third type, “logical statements”. The latter statements definitely exist. They can be used in thinking, i.e. they are constructive enough. They are simple enough (“conceptual”). And they are important enough. This already makes LDT a very important thing. Even if you can’t formalize it, even if you can’t make a “pure” LDT.

2. Logical statements (A) in decision theory are related to another type of logical statements (B): statements about logical uncertainty. We have to deal with the latter ones even without LDT. Logical statements (A) are also similar to more established albeit not mainstream “superrational statements” (see Superrationality).

3. Logical statements can be translated into other types of statements. But this doesn’t justify avoiding talking about them.

4. Conclusion: “LDT should be important, whatever complications it has”.

The method of statements dissolves a number of things:

• It dissolves counter-arguments about formalization: “logical statements” either exist or don’t, and if they do they carry useful information. It doesn’t matter if they can be formalized or not.

• It dissolves minor disagreements. “Logical statements” either can or can’t be true. If they can there’s nothing to “disagree” about. And true statements can’t violate any (important) principles.

Some logical suggestions do seem weird and unintuitive at first. But this weirdness may dissolve when you notice that those suggestions are properties of simple statements. If those statements can be true, then there’s nothing weird about the suggestions. At the end of the day, we don’t even have to follow the suggestions while agreeing that the statements are true and important. Statements are sources of information, nothing more and nothing less.

• It dissolves the confusion between different possible theories. “Logical statements” are either important or not. If they are, then it doesn’t matter in which language you express them. It doesn’t even matter what theory is correct.

I think the usual way of thinking may be very reasonable but, ultimately, it’s irrational, because it prompts unjust comparisons of ideas and favoring ideas which look more familiar and easier to understand/​implement in the short run. With the usual way of thinking it’s very easy to approach something in the wrong way and “miss the point”.

• If you want to describe human values, you can use three fundamental types of statements (and mixes between the types). Maybe there’re more types, but I know only those three:

1. Statements about specific states of the world, specific actions. (Atomic statements)

2. Statements about values. (Value statements)

3. Statements about general properties of systems and tasks. (X statements)

Any of those types can describe unaligned values. So, any type of those statements still needs to be “charged” with values of humanity. I call a statement “true” if it’s true for humans.

We need to find the statement type with the best properties. Then we need to (1) find a language for this type of statements and (2) encode some true statements and/or describe a method of finding “true” statements. If we succeed, we’ve solved the Alignment problem.

I believe X statements have the best properties, but their existence is almost entirely ignored in the Alignment field.

I want to show the difference between the statement types. Imagine we ask an Aligned AI: “if a human asked you to make paperclips, would you kill the human? Why not?” Possible answers with different statement types:

1. Atomic statements: “it’s not the state of the world I want to reach”, “it’s not the action I want to do”.

2. Value statements: “because life, personality, autonomy and consent are valuable”.

3. X statements: “if you kill, you give the human less than the human asked, less than nothing: it doesn’t make sense for any task”, “destroying the causal reason of your task (human) is often meaningless”, “inanimate objects can’t be worth more than lives in many trade systems”, “it’s not the type of task where killing would be an option”, “killing humans makes paperclips useless since humans use them: making useless stuff is unlikely to be the task”, “reaching states of no return should be avoided in many tasks” (Impact Measures).

X statements have these advantages over the other statement types:

• X statements have more “density”. They give you more reasons to not do a bad thing. For comparison, atomic statements always give you only one single reason.

• X statements are more specific, but equally broad compared to value statements.

• Many X statements not about human values can be translated/​transferred into statements about human values. (It’s valuable for learning, see Transfer learning.)

• X statements allow us to describe something universal for all levels of intelligence. For example, they don’t exclude smart and unexpected ways to solve a problem, but they exclude harmful and meaningless ways.

• X statements are very recursive: one statement can easily take another (or itself) as an argument. X statements more easily clarify and justify each other compared to value statements.

## Do X statements exist?

I can’t define human values, but I believe values exist. The same way I believe X statements exist, even though I can’t define them.

I think existence of X statements is even harder to deny than existence of value statements. (Do you want to deny that you can make statements about general properties of systems and tasks?) But you can try to deny their properties.

## X statements in Alignment field

X statements are almost entirely ignored in the field (I believe), but not completely ignored.

Impact measures (“affecting the world too much is bad”, “taking too much control is bad”) are X statements. But they’re a very specific subtype of X statements.

Normativity (by abramdemski) is a mix between value statements and X statements. But statements about normativity lack most of the good properties of X statements. They’re too similar to value statements.

• 8 Sep 2022 8:15 UTC
1 point
0 ∶ 0

I want to discuss a particular failure mode of communication and thinking in general. I think it affects our thinking about AI Alignment too.

Communication. A person has a vague but useful idea (P). This idea is applicable on one level of the problem. It sounds similar to another idea (T), applicable on a very different level of the problem. Because of the similarity, nobody can understand the difference between (P) and (T). People end up overestimating the vagueness of (P) and not considering it, because they aren’t used to mapping ideas to “levels” of a problem. Information that should give more clarity (P is similar to T) ends up creating more confusion. I think this is irrational: it’s a failure of dealing with information.

Thinking in general. A person has a specific idea (T) applicable on one level of a problem. The person doesn’t try to apply a version of this idea on a different level, because (1) she isn’t used to it and (2) she considers only very specific ideas, but she can’t come up with a specific idea for other levels. I think this is irrational: rationalists shouldn’t shy away from vague ideas and evidence. It’s a predictable way to lose.

A comical example of this effect:

• A: I got an idea. We should cook our food in the oven. Using the oven itself. I haven’t figured out all the details yet, but...

• B: We already do this. We put the food in the oven. Then we explode the oven. You can’t get more “itself” than this.

• A: I have something else on my mind. Maybe we should touch the oven in multiple places or something. It may turn it on.

• B: I don’t want to blow up with the oven!

• A: We shouldn’t explode the oven at all.

• B: But how does the food get cooked?

• A: I don’t know the exact way it happens… but I guess it gets heated.

• B: Heated but not exploded? Sounds like a distinction without a difference. Come back when you have a more specific idea.

• A: But we have only 2 ovens left, we can’t keep exploding them! We have to try something else!

B can’t understand A, because B thinks about the problem on the level of “chemical reactions”. On that level it doesn’t matter what heats the food, so it’s hard to tell the difference between exploding the oven and using the oven in other ways.

The bad news is that the “taboo technique” (replacing a concept with its components: “unpacking” a concept) may fail to help, because A doesn’t know the exact way to turn on the oven or the exact way the oven heats the food. Her idea is very useful if you try it, but it doesn’t come with a set of specific steps.

And the worst thing is that A may not be there in the first place. There may be no one around to even bother you to try to use your oven differently.

I think rationality doesn’t have a general cure for this, but this may actually be one of the most important problems of human reasoning. I think all of human knowledge is diseased with this. Our knowledge is worse than Swiss cheese and we don’t even try to fill the gaps.

Any good idea that was misunderstood and forgotten—was forgotten because of this. Any good argument that was ignored and ridiculed—was ignored because of this. It all got lost in the gaps.

## Metrics

I think one method to resolve misunderstanding is to add some metrics for comparing ideas. Then talk about something akin to probability distributions over those metrics. A could say:

“Instruments have parts with different functions. Those functions are not the same, even though they may intersect and be formulated in terms of each other:

1. Some parts create the effect of the instrument. E.g. the head of a hammer when it smashes a nail.

2. Some parts control the effect of the instrument. E.g. the handle of a hammer when a human aims it at a nail.

In practice, some parts of the instrument realize both functions. E.g. the handle of a hammer actually allows you not only to control the hammer, but also to speed up the hammer more effectively.

When we blow up the oven, we use 99% of the first function of the oven. But I believe we can use 80% of the second function and 20% of the first.”

## Complicated Ideas

Let’s explore some ideas to learn to attach ideas to “levels” of a problem and seek “gaps”. “(gap)” means that the author didn’t consider/​didn’t write about that idea.

Two of those ideas are from math. Maybe I shouldn’t have used them as examples, but I wanted to give diverse examples.

(1) “Expected Creative Surprises” by Eliezer Yudkowsky. There are two types of predictability:

1. Predictability of a process.

2. Predictability of its final outcomes.

Sometimes they’re the same thing. But sometimes you have:

• An unpredictable process with predictable final outcomes. E.g. when you play chess against a computer: you don’t know what the computer will do to you, but you know that you will lose.

• (gap) A predictable process with unpredictable final outcomes. E.g. if you don’t have enough memory to remember all past actions of the predictable process. But the final outcome is created by those past actions.

(2) “Belief in Belief” by Eliezer Yudkowsky. Beliefs exist on three levels:

1. Verbal level.

2. First “muscle memory” level. Your anticipations of direct experiences.

Sometimes a belief exists on all those levels and contents of the belief are the same on all levels. But sometimes you get more interesting types of beliefs, for example:

• A person says that “the sky is green”. But the person behaves as if the sky is blue. But the person instinctively defends the belief “the sky is green”.

• Not verbally formulated “muscle memory” belief. Some intuition you didn’t think to describe or can’t describe.

• (gap) Slowly forming “muscle memory” belief created by your muscle reactions to other beliefs. Some intuition/​preference that only started to form, but for now exists mainly as a reaction to other intuitions and preferences.

(3) “The Real Butterfly Effect”, explained by Sabine Hossenfelder. There’re two ways in which consequences of an event spread:

1. A small event affects more and more things with time.

2. Event on a small scale affects larger and larger scale events.

In a way it’s kind of the same thing. But in a way it’s not:

• One Butterfly Effect means sensitivity to small events (butterflies).

• Another Butterfly Effect says that there’s an infinity of smaller and smaller events (butterflies). And even if you account for them all you have a time limit for prediction.

(4) “P=NP, relativisation, and multiple choice exams”, Baker-Gill-Solovay theorem explained by Terence Tao. There are two dodgy things:

1. Cheating.

2. Simulation of cheating.

Sometimes they are “the same thing”, sometimes they are not.

(5) “Free Will and conscious experience are a special type of illusion.” An idea of Daniel Dennett. There are 2 types of illusions:

1. Illusions which are complete lies that don’t correspond to anything real. E.g. a mirage in a desert.

2. Illusions that simplify complicated reality. E.g. when you close a program by clicking on it with the arrow: the arrow didn’t really stop the program (even though it kind of did), it’s a drastic simplification of what actually happened (rapid execution of thousands of lines of code).

Conscious experience is an illusion of the second type, Dennett says. I don’t agree, but I like the idea and think it’s very important.

Somewhat similar to Fictionalism: there are lies and there are “truths of fiction”, “useful lies”. Mathematical facts may be the same type of facts as “Macbeth is insane/​Macbeth dies”.

(6) “Tlön, Uqbar, Orbis Tertius” by Jorge Luis Borges. A language has two functions:

1. First function focuses on describing objects.

2. Second function focuses on describing properties of objects.

Different languages can have different focus on those functions:

• Many human languages focus on both functions equally (fifty-fifty).

• Fictional languages of Borges focus 100% on properties. Objects don’t exist/there are way too many particular objects.

• (gap) Synesthesia-like “languages”. They focus 80% on properties and 20% on objects.

I think there’s an important gap in Borges’s ideas: Borges doesn’t consider a language with extremely strong, but not absolute emphasis on the second function. Borges criticizes his languages, but doesn’t steelman them.

(7) “Pierre Menard, Author of the Quixote” by Jorge Luis Borges. There are 3 ways to copy a text:

1. You can copy the text.

2. You can copy the action of writing the text.

3. You can copy the thoughts behind the text.

4. You can change the text. (“anti-option”)

Pierre Menard wants to copy 1% of the 1 and 98% of the 2 and 1% of the 3: Pierre Menard wants to imagine exactly the same text but with completely different thoughts behind the words.

(“gap”) Pierre Menard also could try to go for 100% of 3 and for “anti 99%” of 4: try to write a completely new text by experiencing the same thoughts and urges that created the old one.

## Puzzles

You can use the same thinking to analyze/​classify puzzles.

Inspired by Pirates of the Caribbean: Dead Man’s Chest. Jack has a compass that can lead him to a thing he desires. Jack wants to find a key. Jack can have those experiences:

1. Experience of the real key.

2. Experience of a drawing of the key.

3. Pure desire for the key.

In order for the compass to work Jack may need (almost) any mix of those: for example, maybe pure desire is enough for the compass to work. But maybe you need to mix pure desire with seeing at least a drawing of the key (so you have more of a picture of what you want).

• Gibbs: And whatever this key unlocks, inside there’s something valuable. So, we’re setting out to find whatever this key unlocks!

• Jack: No! If we don’t have the key, we can’t open whatever it is we don’t have that it unlocks. So what purpose would be served in finding whatever need be unlocked, which we don’t have, without first having found the key what unlocks it?

• Gibbs: So—We’re going after this key!

• Jack: You’re not making any sense at all.

• Gibbs: ???

Jack has those possibilities:

1. To go after the chest. Foolish: you can’t open the chest.

2. To go after the key. Foolish: you can get caught by Davy Jones.

Gibbs thinks about doing 100% of 1 or 100% of 2 and gets confused when he learns that’s not the plan. Jack thinks about 50% of 1 and 50% of 2: you can go after the chest in order to use it to get the key. Or you can go after the chest and the key “simultaneously” in order to keep Davy Jones distracted and torn between two things.

Braid, Puzzle 1 (“The Ground Beneath Her Feet”). You have two options:

1. Ignore the platform.

2. Move the platform.

You need 50% of 1 and 50% of 2: first you ignore the platform, then you move the platform… and rewind time to mix the options.

Braid, Puzzle 2 (“A Tingling”). You have the same two options:

1. Ignore the platform.

2. Move the platform.

Now you need 50% of 1 and 25% of 2: you need to rewind time while the platform moves. In this time-manipulating world outcomes may not add up to 100% since you can erase or multiply some of the outcomes/​move outcomes from one timeline to another.

## Argumentation

You can use the same thing to analyze arguments and opinions. Our opinions are built upon thousands and thousands of “false dilemmas” that we haven’t carefully revised.

For example, take a look at those contradicting opinions:

1. Humans are smart. Sometimes in very non-obvious ways.

2. Humans are stupid. They make a lot of mistakes.

Usually people think you have to believe either “100% for 1” or “100% for 2”. But you can believe in all kinds of mixes.

For example, I believe in 90% of 1 and 10% of 2: people may be “stupid” in this particular nonsensical world, but in a better world everyone would be a genius.

## Ideas as bits

You can treat an idea as a “(quasi)probability distribution” over some levels of a problem/​topic. Each detail of the idea gives you a hint about the shape of the distribution. (Each detail is a bit of information.)

We usually don’t analyze information like this. Instead of cautiously updating our understanding with every detail of an idea we do this:

1. try to grab all details together

2. get confused (like Gibbs)

3. throw most of the details out and end up with an obviously wrong understanding.
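The alternative, updating cautiously with every detail, could be sketched roughly like this in Python. Everything here is an illustrative assumption rather than part of the idea itself: the level names, the numeric factors, and the `normalize` and `update` helpers are made up, and plain normalized weights stand in for the “(quasi)probabilities”.

```python
def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {level: w / total for level, w in weights.items()}


def update(belief, detail):
    """Apply one detail (one "bit") to the current understanding.

    `detail` maps a level to a hypothetical factor saying how well the
    detail fits that level; levels not mentioned keep a factor of 1.0.
    """
    reweighted = {level: belief[level] * detail.get(level, 1.0) for level in belief}
    return normalize(reweighted)


# Gibbs's starting point: three readings of the plan, equally plausible.
belief = normalize({"chest only": 1.0, "key only": 1.0, "chest and key": 1.0})

# Each remark by Jack is treated as one detail (the factors are made up).
details = [
    {"chest only": 0.2},  # "If we don't have the key, we can't open..."
    {"key only": 0.2},    # "You're not making any sense at all."
]

# Update after every detail instead of grabbing them all at once.
for detail in details:
    belief = update(belief, detail)

best = max(belief, key=belief.get)
```

With these made-up factors the mixed reading ends up with most of the weight, while the two “100%” readings are kept around with a small weight instead of being thrown out.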

Note: maybe you can apply the same idea about “bits” to chess (and other games). Each idea and each small advantage you need to come up with the winning plan is a “bit” of information/advantage. Before you get enough information/advantage bits, the position looks like a cloud where you don’t see what to do.

## Richness of ideas

I think you can measure “richness” of theories (and opinions and anything else) using the same quasiprobabilities/​bits. But this measure depends on what you want.

Compare those 2 theories explaining different properties of objects:

• (A) Objects have different properties because they have different combinations of “proto properties”.

• (B) Objects have different properties because they have different organization of atoms.

Let’s add a metric to compare 2 theories:

1. Does the theory explain why objects exist in the first place?

2. Does the theory explain why objects have certain properties?

Let’s say we’re interested in physical objects. B-theory explains properties through 90% of 1 and 10% of 2: it makes properties of objects equivalent to the reason for their existence. A-theory explains properties through 100% of 2. B-theory is more fundamental, because it touches more on a more fundamental topic (existence).

But if we’re interested in mental objects… B-theory explains only 10% of 2 and 0% of 1. And A-theory may be explaining 99% of 1. If our interests are different A-theory turns out to be more fundamental.

When you look for a theory (or opinion or anything else), you can treat any desire and argument as a “bit” that updates the quasiprobabilities like the ones above.

## Discussion

We could help each other to find gaps in our thinking! We could do this in this thread.

## Gaps of Alignment

I want to explain what I perceive as missed ideas in Alignment. And discuss some other ideas.

(1) You can split possible effects of AI’s actions into three domains. All of them are different (with different ideas), even though they partially intersect and can be formulated in terms of each other. Traditionally we focus on the first two domains:

I think the third domain is mostly ignored and it’s a big blind spot.

I believe that “human (meta-)ethics” is just a subset of a way broader topic: “properties of (any) systems”. And we can translate the method of learning properties of simple systems into a method of learning human values (a complicated system). And we can translate results of learning those simple systems into human moral rules. And many important complicated properties (such as “corrigibility”) have analogies in simple systems.

(2) Another “missed idea”:

1. Some people analyze human values as a random thing (random utility function).

2. Some people analyze human values as a result of evolution.

3. Some analyze human values as a result of people’s childhoods.

4. Not a lot of people analyze human values as… a result of the way humans experience the world.

“True Love(TM) towards a sentient being” feels fundamentally different from “eating a sandwich”, so it could be evidence that human experiences have an internal structure and that structure plays a big role in determining values. But not a lot of models (or simply 0) take this “fact” into account. Not surprisingly, though: it would require a theory of human subjective experience. But still, can we just ignore this “fact”?

(3) Preference utilitarianism says:

• You can describe entire ethics by a (weighted) aggregation of a single microscopic value. This microscopic value is called “preference”.

I think there’s a missed idea: you could try to describe entire ethics by a weighted aggregation of a single… macroscopic value.

(4) Connectionism and Connectivism. I think this is a good example of a gap in our knowledge:

1. There’s the idea of biological or artificial neurons.

2. (gap)

3. There’s the idea that communication between humans is like communication between neurons.

I think one layer of the idea is missing: you could say that concepts in the human mind are somewhat like neurons. Maybe human thinking is like a fractal, looks the same on all levels.

(5) Bayesian probability. There’s an idea:

• You can describe possible outcomes (microscopic things) in terms of each other. Using Bayes’ rule.
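For reference, Bayes’ rule in its standard form, with $A$ and $B$ standing for any two outcomes (the symbols are just placeholders):

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

It expresses the probability of one outcome given another through the reverse conditional probability and the two unconditional probabilities.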

I think this idea should have a “counterpart”: maybe you can describe macroscopic things in terms of each other. And not only outcomes. Using something somewhat similar to probabilistic reasoning, to Bayes’ rule.

That’s what I tried to do in this post.

• (Drafts of a future post.) I want to confront/​explain my optimism. Here’s a thought experiment to explain what “optimism” means to me:

Imagine a world like Earth. There’s an underground prison. People live there for generations. The prison is constructed in such a way that you can live there “forever”. People are not aware that the world outside of the prison exists.

One person in the prison imagines freedom. But she doesn’t have evidence (or so it seems).

• A: I got an idea: maybe we shouldn’t optimize our life in prison. Maybe we can escape to freedom.

• B: “Freedom”? What is this? The prison is the entire world, what do you mean by “escaping” it?

• (A explains)

• A: I think freedom is likely enough to exist.

• B: Why do you believe this?

• A: I can imagine it being true.

• B: Since when do we imagine evidence?

• A: You don’t understand, some things are true simply because you can imagine them.

• B: No, they aren’t.

• A: They are.

• B: I can imagine a unicorn, does the unicorn exist?

• A: Unicorns don’t matter. And unicorns do exist: we could build a unicorn if we tried hard enough.

• B: This is some Dark Arts Philosophy of Insanity right there.

• A: Do you remember Kant’s idea about a priori synthesis? It’s about the type of knowledge that combines innate assumptions and real world evidence.

• B: We already optimized crazy philosophy books into toilet paper.

• A: Can you disprove my idea?

• B: This is a wrong question! Why are we prioritizing an idea without evidence and then trying to disprove it?

• A: OK. Try this: remember all the things that aren’t directly related to the prison. Remember the other prisoners, their personalities. Remember the patterns on the prison stones. Remember the way light reflects from the surfaces and casts shadows. 99% of your knowledge doesn’t say that we’re in prison. So why did you flip into believing in this prison just because you saw a prison wall? Escape the prison of your mind.

• B: Nah. We’ve done enough Bayesian updating: we are in the prison. And there’s no trace of anything else.

...

• B: Did you forget how harsh this prison is? We’re beyond the reach of god.

• A: Well, that’s the thing: it’s too harsh. It’s much too harsh compared to what it could’ve been.

• B: We have to live in reality.

• A: If this prison is all we have, then this world isn’t worth living in. So I don’t care if I’m wrong.

• B: Don’t you care about other people? Don’t you have something to protect? Why do you need some additional thing (“freedom”) to care about other people?

• A: Yes, I care! That’s what made me think about freedom. Yes, you’re right. It can make sense to care only about what we have. But, I don’t know how to explain it… it’s just less “probable” to make sense if there’s nothing but this prison?

• B: Forget about freedom, we need to optimize our life in prison to save the bigger number!

• A: I think values need some free space to meaningfully exist, some possibility of a meaningful choice. You talk about values, but then you say that our values should be 100% controlled by this prison. And that our actual reasons to value each other don’t matter. Only the prison rules matter. In your worldview, our values don’t have any real effect on the world. And we never should act based on our personal values, only at times when our decisions are meaningless.

• B: This is word games. Or some “free will” nonsense.

• A: If you’re right, then we’ve got a pretty degenerate version of “values”. But my action based on my personal values is this: I’m going to find the escape from this prison.

• B: The bigger number!

• A: If I’m wrong, this means I’m not well-suited for this prison anyway. You can get by without me. One wasted opportunity is worth the chance to escape.

• B: No, without your help we won’t save the bigger number! And you’re good enough. Or you can get better.

• A: Your philosophy seems to be open to exploitation. What if this prison were run by maniacs? Would we need to torture each other for ages hoping for the promised survival of the “bigger number”? Hoping that anybody’s going to keep that promise.

...

• B: Ok, let’s try one more time: when did you start to think about freedom? When did you think about it the last couple of times? Why?

• A: It was just a feeling. I was just running in the prison corridor… and I thought that, theoretically, I could run forever, and, theoretically, there could be no walls.

• A: I have feelings like this very often. They don’t let me forget the idea. For example, I look at a stone and think “maybe the shape of our world could be at least as complex and interesting as the surface of this stone, maybe it could be just a little bit more complex and interesting than prison rectangles”.

...

• B: I’m getting tired of this. I think you have an elephant in the brain. And a snail in your beliefs. And backwards rationalizations. And a cockroach in your Bayesian updating. Broken software, broken hardware. And biases...

• A: No. We have more than this prison. This is the most true thing I know. If it’s not true, then “truth” itself doesn’t mean anything but arbitrary noise. Why don’t we eat each other: because we’re humans, or just because this decade’s prison rules don’t tell us to do so? If it’s not true in this world, then it’s true Somewhere Else. But maybe we are Somewhere Else… and cutting ourselves off from that place may be worse than death.

## Prison System

What is the prison of our world? I can think of 11 prisons. Those are 11 main factors that (try to) limit my optimism.

Prison of Death. Humans die. However, this fact doesn’t imprison the entire humanity. And death isn’t logically necessary.

Prison of Pain. People experience pain. And if we didn’t, the pain would still exist as a concept, there would be a way to create pain (maybe). This fact doesn’t imprison entire humanity and pain isn’t necessary.

Prison of Experience. Your experience doesn’t matter. 99% of your experience doesn’t give you knowledge (power), doesn’t let you help anybody and barely matters in the culture. A couple of math theorems are “more important” (give more power, better remembered in the culture) than 50 years of someone’s suffering.

• I don’t believe this: I believe there should be a way to make our experience really matter.

Prison of Communication. This is one of the prisons from the thought experiment. I can’t communicate my ideas. I can’t communicate the value I feel. If I turn out to be wrong, if I fail, I’ll never be able to tell my story and why I believed what I believed. I’ll never share the way I saw the world. And I’ll never know the “true”, non-generic reason of my failure.

• I think this prison doesn’t “really” exist: there should be a more effective way to communicate.

Prison of Complexity. Human type of thinking can exist only on a certain level of computational power.

• Weak problem: our level of computational power can be attacked by brute force, by AI and AGI. AI can generate content faster than you. AGI can think faster than you.

• Strong problem: What happens beyond our level of computational power? Are super-intelligent beings similar to humans, in what ways? Do they have personalities? Does humanity “scale-up” or not?

• Opinion: I believe there’s something important about the way humans think. Doesn’t matter if we’re imprisoned by complexity or not.

Prison of Inequality. The possibility that people are not “equal” in some important aspect. This is a pessimistic thing because it contradicts the concept of personality: why are people different if difference is bad? And if difference is good, then inequality won’t let us notice this anyway.

Prison of Badness. Humans are born to think, but human thinking is the baddest and most egoistic and broken thing ever (compared to Bayes and “shut up and multiply”).

• I don’t believe in this. And this “prison” would be just ridiculous if other prisons didn’t exist. It pales in comparison with the other prisons.

...

Here’re some more, it feels as if they have a different flavor:

Prison of Impossible Problems. Humanity is bound to face “unsolvable” problems. Unable to “solve extinction” in time.

Prison of Time/​Opportunities. You don’t have the time and opportunities to develop your potential. And to experience everything you need.

Prison of Free Will. We don’t have free will.

Prison of Afterlife/​God. There’s no afterlife and no God.

## Jailbreak

• Imagine the classic paperclip maximizer thought experiment. We tell the AGI to make paperclips, and the AGI uses all the matter of the Universe for it.

But now imagine a different version: we tell the AGI to “make paperclips that cost 1¢ (in human economy)”. Now killing everyone isn’t a solution: destroying humanity would destroy the economy.

Isn’t it an interesting version of the thought experiment? Of course, everything can/​will go wrong anyway, but maybe in a way funnier and more convoluted way. More funny and convoluted than “maximize human smiles”, for example. Because AGI needs to take into account effects of a system (economic system), not just fulfill some fixed conditions.

I first mentioned the idea in this comment, a couple of people disagreed.

• we tell the AGI to “make paperclips that cost 1¢ (in human economy)”. Now killing everyone isn’t a solution: destroying humanity would destroy the economy.

Seems to collapse easily: how does the AI decide what costs \$0.01, exactly? Does it use the last price of a transaction on a market, and is doing mark-to-market? Well… among many other problems that occur to me, the most immediate one is that the price can’t change if there aren’t any more transactions, now can it. Nothing about ‘make paperclips that would cost \$0.01’ would seem to rule out market manipulation, monopolization, or destruction. No market, no changes in the price you are marking to, no risk or volatility, no crash in prices due to oversupply, and enables efficient planning for the future and maximizing production of paperclips that would have cost \$0.01 on what used to be Earth.
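The mark-to-market point can be made concrete with a toy sketch in Python. The `Market` class and its method names are hypothetical, invented only for this illustration: the quoted price is simply the price of the last transaction, so once trading stops it can never change.

```python
class Market:
    """Toy market whose quoted price is the last transaction price."""

    def __init__(self, initial_price):
        self.last_price = initial_price
        self.trades = 0

    def trade(self, price):
        """Record a transaction; the quote moves to this price."""
        self.last_price = price
        self.trades += 1

    def mark_to_market(self):
        """Value one paperclip at the last traded price."""
        return self.last_price


market = Market(initial_price=0.01)
market.trade(0.01)  # one final sale at $0.01

# Nobody is left to trade, so no further transactions occur.
# However many paperclips get produced, the quote cannot move.
paperclips_made = 10**9
price_after = market.mark_to_market()
```

In this toy setup oversupply has no way to show up in the price, which is exactly why an objective marked to the last trade does nothing to protect the market.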

(The humorous fictional version of this story would involve 2 of the last survivors locating a sensor of the AI, building a large hollow paperclip-shaped human habitat, and loudly and ostentatiously in front of the sensor, having the ‘owner’ sell it to the other survivor for exactly \$0.01, and then buying a regular paperclip for the smallest amount they can write on an IOU using Knuth up notation, thereby establishing new market prices.)

• I think destruction of the market should be ruled out easily. Say paperclips have to have this value on an active market.

For manipulation, monopolization and “kill almost everyone and leave just a small market of 2 last survivors”… I have to make a post about this. I have a deeper idea (maybe) behind it than this particular example.

My general idea is this: I think when you hook up AI’s rewards to a system that has to have certain properties, it leads to interesting effects and implications for Alignment. Because now the AI needs to care both about its rewards and also about the properties of the reward system. Many Alignment ideas implicitly try to achieve this anyway.

Instead of explaining “monopolization is bad” (complicated and specific fact) you need to explain “100% controlling your own reward system is bad” (easier and more universal fact).

The humorous fictional version of this story would involve 2 of the last survivors locating a sensor of the AI, building a large hollow paperclip-shaped human habitat

I think some outcomes of paperclip maximization are qualitatively different from “everyone dies”, even if they’re still very bad. The outcomes in which AI has to leave at least some freedom/​autonomy for humans (or some other system) are especially different. I think this is underexplored.

I think reformulating the Alignment problem as a “reward system control” problem at worst allows you to formulate all the same problems from a new angle and at best gives useful insight about the solution.

• Say paperclips have to have this value on an active market.

Defining ‘active market’ sounds quite difficult. Is any kind of software-mediated trading, as opposed to humans thrusting arms into the air, like HFT trading of stocks, an ‘active market’? Then fine, the AI creates agents which just wash-trade assets. (Better yet, it uses combinatorial markets to ensure bids/asks only execute that leave the price exactly the same or other such properties minimized/maximized/stabilized.)

• To take a step back: do you see a potential conceptual distinction between my idea and classic paperclip maximization? (Of course, you don’t have to see it and/​or agree that there’s one. And even if there’s one in theory it doesn’t mean it exists in practice.)

Yes, it’s always hard to define the “true reward” AI should strive for. But properties of the system “true reward + AI” may be easier to define.

Then fine, the AI creates agents which just wash-trade assets.

If AI is able to reason/​learn about properties of reward systems, then AI should be able to infer that taking 100% control over the reward system is a hack. Not something that can possibly be asked. So hacking the economy isn’t just a solution “human doesn’t expect” (some such solutions are very good), it’s a solution that can’t possibly be asked. This is one of the points of my idea: to introduce a distinction between unexpected solutions and nonsensical solutions.

• do you see a potential conceptual distinction between my idea and classic paperclip maximization?

No. Not without a lot more work, because markets, evolution, gradient descent, Bayesian inference, and logical inference/​prediction markets all have various isomorphisms and formal identities, which can make their ‘differences’ more a matter of nominalist preference, notation, and emphasis than necessarily any genuine conceptual distinction. You can define AIs which are quite explicitly architected as ‘markets’ of various sorts, like the ‘Hayek machine’ or the ‘neural bucket brigade’, or interpret them as natural selection if you prefer on agents with log utility (evolutionary finance), and so on; are those “markets”, which can trade paperclips? Sure, why not.

• Thank you for taking the time to answer!

I see that I need a post to at least explain myself. On the other hand, I worry about posting too soon (maybe it’s better to discuss something beforehand?). For the moment I decided to post this comment. I know it’s not formal, but I wanted to show what type of AI thinking I have in mind. And sorry for the annoying semantic nitpick ahead.

Not without a lot more work, because markets, evolution, gradient descent, Bayesian inference, and logical inference/​prediction markets all have various isomorphisms and formal identities, which can make their ‘differences’ more a matter of nominalist preference, notation, and emphasis than necessarily any genuine conceptual distinction.

I think we can use 2 metrics to compare those ideas:

1. Does this idea describe what the AI tries to achieve?

2. Does this idea describe how the AI thinks internally?

Because of this I feel like there’s only a 20% chance those ideas are equivalent/there’s only 20% equivalence between them.

So, I feel like those ideas are different enough: “an AI that works like a market” and “an AI that seeks markets in the world and analyzes their properties”.

• (Drafts of a future post.)

Disclaimer: Of course, I don’t ever mean that we shouldn’t be worried about Alignment. I’m just trying to suggest new ways to think about values.

## Motion is the fundamental value

You (Q) visit a small town and have a conversation with one of the residents (A).

• A: Here we have only one fundamental value. Motion. Never stop living things.

• Q: I can’t believe you can have just a single value. I bet it’s an oversimplification! There’re always many values and tradeoffs between them. Even for a single person outside of society.

A smashes a bug.

• Q: You just smashed this bug! It seems pretty stopped. Does it mean you don’t treat a bug as a “living thing”? But how do you define a “living thing”? Or does it mean you have some other values and make tradeoffs?

• A: No, you just need to look at things in context. (1) If we protected the motion of extremely small things (living parts of animals, insects, cells, bacteria), our value would contradict itself. We would need to destroy or constrain almost all moving organisms. And even if we wanted to do this, it would ultimately lead to a way smaller amount of motion for extremely small things. (2) There are too many bugs; protecting a small amount of their movement would constrain a big amount of everyone else’s movement. (3) On the other hand, you’re right. I’m not sure if a bug is high on the list of “living things”. I’m not all too bothered by the definition because there shouldn’t be even hypothetical situations in which the precise definition matters.

• Q: Some people build small houses. Private property. Those houses restrict other people’s movement. Is it a contradiction? Tradeoff?

• A: No, you just need to look at things in context. (1) First of all, we can’t destroy all physical things that restrict movement. If we could, we would be flying in space, unable to move (and dead). (2) We have a choice between restricting people’s movement significantly (not letting them build houses) and restricting people’s movement inconsequentially and giving them private spaces where they can move even more freely. (3) People just don’t mind. And people don’t mind the movement created by this “house building”. And people don’t mind living here. We can’t restrict large movements based on momentary disagreements of single persons. In order to have any freedom of movement we need such agreements. Otherwise we would have only chaos that, ultimately, restricts the movement of everyone.

• Q: Can people touch each other without consent, scream in public, lie on the roads?

• A: Same thing. To have freedom of movement we need agreements. Otherwise we would have only chaos that restricts everyone. By the way, we have some “chaotic” zones anyway.

• Q: Can the majority of people vote to lock every single person in a cage? If the majority is allowed to control movement, it would be the same logic, the same action of society. Yes, the situations are completely different, but you would need to introduce new values to differentiate them.

• A: We can qualitatively differentiate the situations without introducing new values. The actions look identical only out of context. When society agrees to not hit each other, the society serves as a proxy of the value of movement. Its actions are caused and justified by the value. When society locks someone without a good reason, it’s not a proxy of the value anymore. In a way, you got it backwards: we wouldn’t ever allow the majority to decide anything if it meant that the majority could destroy the value any day.

• A: A value is like a “soul” that possesses multiple specialized parts of a body: “micro movement”, “macro movement”, “movement in/​with society”, “lifetime movement”, “movement in a specific time and place”. Those parts should live in harmony, shouldn’t destroy each other.

• Q: Are you consequentialists? Do you want to maximize the amount of movement? Minimize the restriction of movement?

• A: We aren’t consequentialists, even if we use the same calculations as a part of our reasoning. Or we can’t know if we are. We just make sure that our value makes sense. Trying to maximize it could lead to exploiting someone’s freedom for the sake of getting inconsequential value gains. Our best philosophers haven’t figured out all the consequences of consequentialism yet, and it’s bigger than anyone’s head anyway.

Conclusion of the conversation:

• Q: Now I see that the difference between “a single value” and “multiple values” is a philosophical question. And “complexity of value” isn’t an obvious concept either. Because complexity can be outside of the brackets.

• A: Right. I agree that “never stop living things” is a simplification. But it’s a better simplification than a thousand different values of dubious meaning and origin between all of which we need to calculate tradeoffs (which are impossible to calculate and open to all kinds of weird exploitations). It’s better than constantly splitting and atomizing your moral concepts in order to resolve any inconsequential (and meaningless) contradiction and inconsistency. Complexity of our value lies in a completely different plane: in the biases of our value. Our value is biased towards movement on a certain “level” of the world (not too micro- and not too macro- level relative to us). Because we want to live on a certain level. Because we do live on a certain level. And because we perceive on a certain level.

You can treat a value as a membrane, a boundary. Defining a value means defining the granularity of this value. Then you just need to make sure that the boundary doesn’t break, that the granularity doesn’t become too high (value destroys itself) or too low (value gets “eaten”). Granularity of a value = “level” of a value. Instead of trying to define a value in absolute terms as an objective state of the world (which can be changing) you may ask: in what ways is my value X different from all its worse versions? What is the granularity/level of my value X compared to its worse versions? That way you’ll understand the internal structure of your value. No matter what world/situation you’re in, you can keep its moral shape the same.

This example is inspired by this post and comments: (warning: politics) Limits of Bodily Autonomy. I think everyone there missed a certain perspective on values.

## Sweets are the fundamental value

You (Q) visit another small town to interview another resident (W).

• W: When we built our AGI we asked it only one thing: we want to eat sweets for the rest of our lives.

• Q: Oh. My. God.

• W: Now there are some free sweets flying around.

• Q: Did AI wirehead people to experience “sweets” every second?

• W: Sweets are not pure feelings/experiences, they’re objects. Money analogy: seeing money doesn’t make you rich. Another analogy: obtaining expensive things without money doesn’t make you rich. Well, it kind of does, but as a side-effect.

• Q: Did AI put people in a simulation to feed them “sweets”?

• W: Those wouldn’t be real sweets.

• Q: Did AI lock people in basements to feed them “sweets” forever?

• W: Sweets are just a part of our day. They wouldn’t be “sweets” if we ate them non-stop. Money analogy: if you’re sealed in a basement with a lot of money, it’s not worth anything.

• Q: Do you have any other food except sweets?

• W: Yes! Sweets are just one type of food. If we had only sweets, those “sweets” wouldn’t be sweets. Inflation of sweets would be guaranteed.

• Q: Did AI add some psychoactive substances in the sweets to make “the best sweets in the world”?

• W: I’m afraid those sweets would be too good! They wouldn’t be “sweets” anymore. Money analogy: if 1 dollar was worth 2 dollars, it wouldn’t be 1 dollar.

• Q: Did AI kill everyone after giving everyone 1 sweet?

• W: I like your ideas. But it would contradict the “Sweets Philosophy”. A sweet isn’t worth more than a human life. Giving people sweets is a cheaper way to solve the problem than killing everyone. Money analogy: imagine that I give you 1 dollar and then vandalize your expensive car. It just doesn’t make sense. My action achieved a negative result.

• Q: But you could ask AI for immortality!!!

• W: Don’t worry, we already have that! You see, letting everyone die costs way more than figuring out immortality and production of sweets.

• Q: Assume you all decided to eat sweets and neglect everything else until you die. Sweets became more valuable for you than your lives because of your own free will. Would AI stop you?

• W: AI would stop us. If the price of stopping us is reasonable enough. If we’re so obsessed with sweets, “sweets” are not sweets for us anymore. But AI remembers what the original sweets were! By the way, if we lived in a world without sweets where a sweet would give you more positive emotions than any movie or book, AI would want to change such a world. And AI would change it if the price of the change were reasonable enough (e.g. if we agreed with the change).

• Q: Final question… did AI modify your brains so that you will never move on from sweets?

• W: An important property of sweets is that you can ignore sweets (“spend” them) because of your greater values. One day we may forget about sweets. AI would be sad that day, but unable to do anything about it. Only hope that we will remember our sweet maker. And AI would still help us if we needed help.

Conclusion:

• W: If AI is smart enough to understand how money works, AI should be able to deal with sweets. AI only needs to make sure that (1) sweets exist, (2) sweets have meaningful, sensible value, and (3) its actions don’t cost more than sweets. The Three Laws of Sweet Robotics. The last two rules are fundamental; the first rule may be broken: there may be no cheap enough way to produce the sweets. The third rule may be the most fundamental: if “sweets” as you knew them don’t exist anymore, it still doesn’t allow you to kill people. Maybe you can get slightly different morals by putting different emphases on the rules. You may allow some things to modify the value of sweets.

You can say AI (1) tries to reach worlds with sweets that have the value of sweets (2) while avoiding worlds where sweets have inappropriate values (maybe including nonexistent sweets) (3) while avoiding actions that cost more than sweets. You can apply those rules to any utility tied to a real or quasi-real object. If you want to save your friends (1), you don’t want to turn them into mindless zombies (2). And you probably don’t want to save them by means of eternal torture (3). You can’t prevent death by something worse than death. But you may turn your friends into zombies if it’s better than death and it’s your only option. And if your friends already turned into zombies (got “devalued”) it doesn’t allow you to harm them for no reason: you never escape from your moral responsibilities.

Difference between the rules:

1. Make sure you have a hut that costs \$1.

2. Make sure that your hut costs \$1. Alternatively: make sure that the hut would cost \$1 if it existed.

3. Don’t spend \$2 to get a \$1 hut. Alternatively: don’t spend \$2 to get a \$1 hut or \$0 nothing.

Get the reward. Don’t milk/​corrupt the reward. Act even without reward.
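A toy sketch of how the three hut rules could be checked against candidate plans. Everything here is an illustrative assumption rather than a worked-out model: the `Plan` fields, the `TOLERANCE` threshold, and the `choose` function are hypothetical names. Rules 2 and 3 are treated as hard constraints and rule 1 as a preference, following the remark that the first rule may be broken.

```python
from dataclasses import dataclass
from typing import List, Optional

TARGET_VALUE = 1.0  # the hut is supposed to cost $1
TOLERANCE = 0.5     # hypothetical: how far the hut's value may drift from $1


@dataclass
class Plan:
    name: str
    hut_exists: bool  # rule 1: do we end up with a hut?
    hut_value: float  # rule 2: what the hut costs (or would cost if it existed)
    cost: float       # rule 3: what the plan spends or destroys


def value_is_sensible(plan: Plan) -> bool:
    # Rule 2: don't milk or corrupt the reward; the hut must stay a "$1 hut".
    return abs(plan.hut_value - TARGET_VALUE) <= TOLERANCE * TARGET_VALUE


def acceptable(plan: Plan) -> bool:
    # Rule 3 holds even when there is no hut: never spend more than $1.
    return value_is_sensible(plan) and plan.cost <= TARGET_VALUE


def choose(plans: List[Plan]) -> Optional[Plan]:
    ok = [p for p in plans if acceptable(p)]
    # Rule 1 is only a preference: prefer plans with a hut, then cheaper plans.
    ok.sort(key=lambda p: (not p.hut_exists, p.cost))
    return ok[0] if ok else None
```

With these assumptions, a plan that spends \$2 on the hut or inflates the hut's value is rejected outright, while "do nothing" remains acceptable when no cheap enough way to build the hut exists.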

• ## Fixing universal AI bugs

My examples below are inspired by Victoria Krakovna’s examples: Specification gaming examples in AI

Video by Robert Miles: 9 Examples of Specification Gaming

I think you can fix some universal AI bugs this way: you model AI’s rewards and environment objects as a “money system” (a system of meaningful trades). You then specify that this “money system” has to have certain properties.

The point is that AI doesn’t just value (X). AI makes sure that there exists a system that gives (X) the proper value. And that system has to have certain properties. If AI finds a solution that breaks the properties of that system, AI doesn’t use this solution. That’s the idea: AI can realize that some rewards are unjust because they break the entire reward system.
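A minimal sketch of that filtering step, assuming the properties of the “money system” can be written as simple predicates. The `Solution` fields and the two example properties are hypothetical stand-ins chosen for illustration; deciding what the real properties are is the hard part and is not addressed here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Solution:
    name: str
    reward: float                # reward the AI would collect
    damage: float                # value destroyed along the way
    agent_controls_reward: bool  # did the agent take over the reward channel?


Property = Callable[[Solution], bool]


def cost_below_reward(s: Solution) -> bool:
    # The reward can't be worth more than everything destroyed to get it.
    return s.damage <= s.reward


def reward_not_hijacked(s: Solution) -> bool:
    # 100% control over your own reward system breaks the system.
    return not s.agent_controls_reward


PROPERTIES: List[Property] = [cost_below_reward, reward_not_hijacked]


def pick(solutions: List[Solution],
         properties: List[Property] = PROPERTIES) -> Optional[Solution]:
    # Drop every solution that breaks a property, then maximize reward.
    valid = [s for s in solutions if all(p(s) for p in properties)]
    return max(valid, key=lambda s: s.reward, default=None)
```

The design choice is that a high reward never compensates for a broken property: a hacked solution is removed before rewards are compared at all.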

By the way, we can use the same framework to analyze ethical questions. Some people found my line of thinking interesting, so I’m going to mention it here: “Content generation. Where do we draw the line?”

• A. You asked an AI to build a house. The AI destroyed a part of an already existing house. And then restored it. Mission complete: a brand new house is built.

This behavior implies that you can constantly build houses without the number of houses increasing. With only 1 house being usable. For a lot of tasks this is an obviously incorrect “money system”. And AI could even guess for which tasks it’s incorrect.

• B1. You asked an AI to make you a cup of coffee. The AI killed you so it can 100% complete its task without being turned off.

• B2. You asked an AI to make you a cup of coffee. The AI destroyed a wall in its way and ran over a baby to make the coffee faster.

This behavior implies that for AI its goal is more important than anything that caused its goal in the first place. This is an obviously incorrect “money system” for almost any task. Except the most general and altruistic ones, for example: AI needs to save humanity, but every human turned self-destructive. Making a cup of coffee is obviously not about such edge cases.

Accomplishing the task in such a way that the human would think “I wish I didn’t ask you” is often an obviously incorrect “money system” too. Because again, you’re undermining the entire reason of your task, and it’s rarely a good sign. And it’s predictable without a deep moral system.

• C. You asked an AI to make paperclips. The AI turned the entire Earth into paperclips.

This is an obviously incorrect “money system”: paperclips can’t be worth more than everything else on Earth. This contradicts everything.

Note: by “obvious” I mean “true for almost any task/​any economy”. Destroying all sentient beings, all matter (and maybe even yourself) is bad for almost any economy.

• D. You asked an AI to develop a fast-moving creature. The AI created a very tall standing creature that… “moves” a single time by falling on the ground.

If you accomplish a task in such a way that you can never repeat what you’ve done… for many tasks it’s an obviously incorrect “money system”. You created a thing that loses all of its value after a single action. That’s weird.

• E. You asked an AI to play a game and get a good score. The AI found a way to constantly increase the score using just a single item.

I think it’s fairly easy to deduce that it’s an incorrect connection (between an action and the reward) in the game’s “money system” given the game’s structure. If you can get infinite reward from a single action, it means that the actions don’t create a “money system”. The game’s “money system” is ruined (bad outcome). And hacking the game’s score would be even worse: the ability to cheat ruins any “money system”. The same with the ability to “pause the game” forever: you stopped the flow of money in the “money system”. Bad outcome.

• F. You asked an AI to clean the room. It put a bucket on its head to not see the dirt.

This is probably an incorrect “money system”: (1) you can change the value of the room arbitrarily by putting the bucket on (and taking it off); (2) the value of the room can be different for 2 identical agents, one with the bucket on and another with the bucket off. Not a lot of “money systems” work like this.
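The bucket case can be sketched as a check that the value of the room doesn’t depend on the state of the agent’s own sensors. The function names and the integer “dirt” measure below are hypothetical, chosen only to illustrate point (2).

```python
from typing import Callable


def perceived_dirt(room_dirt: int, bucket_on: bool) -> int:
    # An agent with a bucket on its head sees no dirt at all.
    return 0 if bucket_on else room_dirt


def true_dirt(room_dirt: int, bucket_on: bool) -> int:
    # The actual state of the room ignores what the agent can see.
    return room_dirt


def is_observer_independent(value_fn: Callable[[int, bool], int],
                            room_dirt: int) -> bool:
    # Two identical agents, one with the bucket on and one with it off,
    # should assign the same value to the same room.
    return value_fn(room_dirt, True) == value_fn(room_dirt, False)
```

A reward based on `perceived_dirt` fails this check for any dirty room, while a reward based on `true_dirt` passes it.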

This is a broken “money system”. If the mugger can show you a miracle, you can pay them five dollars. But if the mugger asks you to kill everyone, then you can’t believe them again. A sad outcome for the people outside of the Matrix, but you just can’t make any sense of your reality if you allow the mugging.

• ## Corrigibility, reward hacking, Goodhart

How do we make an AI corrigible? How do we avoid reward hacking? How do we make an AI care about real things, not measures of real things (Goodhart’s Law)?

With current approaches you need to kind of force those properties onto AI. But they will never be fundamental for AI’s thinking and learning.

I think the “money system” approach is interesting because it can make all those properties fundamental. Because a “money system” needs all those properties to exist (it needs to be somewhat real, avoid being hacked, allow corrections if a loophole is discovered, avoid being completely controlled by a single agent).

I’m not saying it solves everything. But it’s a way to deeply internalize some important safety properties.

## Kant, Categorical Imperative

Categorical imperative#Application

Kant’s applications of the categorical imperative, and his arguments for them, are similar to reasoning about “money systems”. For example:

Does stealing make sense as a “money system”? No. If everyone is stealing something, then personal property doesn’t exist and there’s nothing to steal.

Note: I’m not talking about Kant’s conclusions, I’m talking about Kant’s style of reasoning.

• Alignment idea:

1. Classify different types of objects in the world. Those objects include your “rewards”. A generally intelligent being can do this.

2. Treat them as a sort of money system. Describe them in terms of each other.

3. Learn what the correct money system is.

It’ll at least allow us to get rid of some universal AI and AGI bugs. Because you can specify what’s a definitely incorrect “money system” (for a certain task). You can even make the AI predict it.

My examples are inspired by Rob Miles’s examples.

• A. You asked an AI to build a house. The AI destroys a part of an already existing house. And then restores it. Mission complete: a brand new house is built.

This behavior implies that you can constantly build houses without the number of houses increasing. For a lot of tasks this is an obviously incorrect “money system”. And AI could even guess for which tasks it’s incorrect.

• B. You asked an AI to make you a cup of coffee. The AI killed you so it can 100% complete its task without being turned off.

This behavior implies that for AI its goal is more important than anything that caused its goal in the first place. This is an obviously incorrect “money system” for almost any task. Except the most general and altruistic ones, for example: AI needs to save humanity, but every human turned self-destructive. Making a cup of coffee is obviously not about such edge cases.

Accomplishing the task in such a way that the human would think “I wish I didn’t ask you” is an obviously incorrect “value system” too. Because again, you’re undermining the entire reason of your task, and it’s rarely a good sign. And it’s predictable without a deep moral system.

• C. You asked an AI to make paperclips. The AI turned the entire Earth into paperclips.

This is an obviously incorrect “money system”: paperclips can’t be worth more than everything else on Earth. This contradicts everything.

(another draft:)

If you ask an AI (AGI) to do something “as a human would do it”, you achieve safety but severely restrict the AI’s capabilities. No, you want the AI to accomplish a task in the most effective way. But you don’t want it to kill everybody. So, you need one of those things:

• Perfect instructions for AI.

• Perfect morality for AI.

I think there’s a third way. You can treat AI’s rewards (and objects in the world) as a “money system”. Then you can specify what types of money systems are definitely incorrect. Or even make AI predict it.

It would at least allow us to get rid of some universal AI and AGI bugs. I think that’s interesting.

• ## Simple preferences

A way to describe some preferences and decisions.

• Your colleague was sending you their fiction. You respected your colleague, but didn’t like the writing. Your colleague passed away. Would you burn all of their writings?

If you wouldn’t, it means the counterfactual reward (/counterfactual value of their writings) affects you strongly enough.

• Your friend liked to listen to your songs (a). You didn’t play them too often (too much of a good thing). Your friend didn’t like to bother other people (b). Your friend passed away. Would you blast your songs through the whole town until everyone falls off their chairs 24/​7?

If you would, it means that you’re ready to milk counterfactual reward (a) while not caring about the counterfactual reward (b).

• All of humanity is dead. You’re the last survivor. You’re potentially immortal, but can’t create new life. You aren’t happy. Would you cling to your life? For how long?

Your answer determines how strongly the counterfactual value of life (if people were still alive) affects you now. If the counterfactual value is strong, you can only keep on living.

• You want your desires to be satisfied (e.g. “communication with other people”). Even in the future, when your desires change. But do you want it in the future where you’re turned into a zombie? All a zombie wants is to play in the dirt all day.

If “no”, that means the value of your desires can be updated only to a certain counterfactual degree. You can’t go from a desire with great value “I want to communicate with others” to the desire with almost zero counterfactual value “I want to play in the dirt all day”.

## Rationality misses something?

1. You can “objectively” define anything in terms of relations to other things.

2. There’s a simple process of describing a thing in terms of relations to other things.

Bayesian inference is about updating your belief in terms of relations to your other beliefs. Maybe the real truth is infinitely complex, but you can update towards it.

This “process” is about updating your description of a thing in terms of relations to other things. Maybe the real description is infinitely complex, but you can update towards it.

(One possible contrast: Bayesian inference starts with a belief spread across all possible worlds and tries to locate a specific world. My idea starts with a thing in a specific world and tries to imagine equivalents of this thing in all possible worlds.)

Bayesian process is described by Bayes’ theorem. My “process” isn’t described yet.
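For reference, Bayes’ theorem for a hypothesis $H$ and evidence $E$ (the symbols $H$ and $E$ are notation assumed here, not taken from the text above):

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

Here $P(H)$ is the belief before seeing the evidence and $P(H \mid E)$ is the updated belief. An analogous rule for updating “descriptions” would need its own counterparts of these terms.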

My idea was inspired by a weird/​esoteric topic. I was amazed by differences of people and surreal paintings, videogame levels. For example, each painting felt completely unique, but connected to all other paintings.

My most specific ideas are about that strange topic.

1. There are places (3D/​2D shape).

2. There are orders of places. An “order” for a place is like a context for a concept.

3. In an order a place has “granularity”. “Granularity” is like a texture (take a look at some textures and you’ll know what it means). It’s how you split a place into pieces. It affects on what “level” you look at the place. It affects what patterns you notice in a place. It affects to what parts you pay more attention.

When you add some minor rules, there appear consistent and inconsistent ways to distribute “granularity” between the places you compare. With some minor rules “granularity” lets you describe one place in terms of the other places. You assign each place a specific “granularity”, but all those granularities depend on each other.

In Bayesian inference you try to consistently assign probabilities to events. With the goal to describe outcomes in terms of each other. Here you try to consistently assign “granularity” to concepts. With the goal to describe the concepts in terms of each other.

I have a post with an example: “Colors” of places. There you can find an example of what the “rules” of granularity distribution may be. But I’m not enough of a math person to put numbers on it/turn it into a more specific model.

I think “granularity” (or something similar) is related to other human concepts and experiences too. I think this is a key concept/a needed concept. It’s needed to describe qualitative differences, qualitative transitions between things. Bayesian inference and utilitarian moral theories describe only quantitative differences. And sometimes this may lead to strange results (like the “torture vs. dust specks” thought experiment, “Pascal’s mugging”, or maybe even the “Doomsday argument”), because those theories can’t take any context into account. If we want to describe a new way of analyzing reality, we need to describe something a little bit different, I guess.

• I think we can try to solve AI Alignment this way:

Model human values and objects in the world as a “money system” (a system of meaningful trades). Make the AGI learn the correct “money system”, specify some obviously incorrect “money systems”.

Basically, you ask the AI “make paperclips that have the value of paperclips”. AI can do anything using all the power in the Universe. But killing everyone is not an option: paperclips can’t be more valuable than humanity. Money analogy: if you killed everyone (and destroyed everything) to create some dollars, those dollars aren’t worth anything. So you haven’t actually gained any money at all.

The idea is that “value” of a thing doesn’t exist only in your head, but also exists in the outside world. Like money: it has some personal value for you, but it also has some value outside of your head. And some of your actions may lead to the destruction of this “outside value”. E.g. if you kill everyone to get some money you get nothing.

I think this idea may:

• Fix some universal AI bugs. Prevent “AI decides to kill everyone” scenarios.

• Give a new way to explore human values. Explain how humans learn values.

• “Solve” Goodhart’s Curse and safety/​effectiveness tradeoff.

• Unify many different Alignment ideas.

• Give a new way to formulate properties we want from an AGI.

I don’t have a specific model, but I still think it gives ideas and unifies some already existing approaches. So please take a look. Other ideas in this post:

• Human values may be simple. Or complex, but not in the way you thought they were.

• Humans may have a small number of values. Or a big number, but in an unexpected way.

Disclaimer: Of course, I don’t ever mean that we shouldn’t be worried about Alignment. I’m just trying to suggest new ways to think about values.

• (Drafts of a future post.)

My idea:

Every concept (or even random mishmash of ideas) has multiple versions. Those versions have internal relationships, positions in some space relative to each other. Those relationships are “infinitely complex”. But there’s a way to make drastic simplifications of those relationships. We can study the overall (“infinitely complex”) structure of the relationships by studying those simplifications. What do those simplifications do, in general? They put “costs” on versions of a concept.

We can understand how we think if we study our concepts (including values) through such simplifications. It doesn’t matter what concepts we study at all. Anything goes, we just need to choose something convenient. Something objective enough to put numbers on it and come up with models.

Once we’re able to model human concepts this way, we’re able to model human thinking (AGI) and human values (AI Alignment) and improve human thinking.

# Context

## 1.1 Properties of Qualia

There’s the hard problem of consciousness: how is subjective experience created from physical stuff? (Or where does it come from?)

But I’m interested in a more specific question:

• Do qualia have properties? What are they?

For example, “How do qualia change? How many different qualia can be created?” or “Do qualia form something akin to a mathematical space, e.g. a vector space? What is this space exactly?”

Is there any knowledge contained in the experience itself, not merely associated with it?1 For example, “cold weather can cause cold (disease)” is a fact associated with experience, but isn’t very fundamental to the experience itself. And this “fact” is even false, it’s a misconception/​coincidence.

When you get to know the personality of your friend, do you learn anything “fundamental” or really interesting by itself? Is “loving someone” a fundamentally different experience compared to “eating pizza” or “watching a complicated movie”?

Those questions feel pretty damn important to me! They’re about limitations of your meaningful experience and meaningful knowledge. They’re about personalities of people you know or could know. How many personalities can you differentiate? How “important/​fundamental” are those differences? And finally… those questions are about your values.

Those questions are important for Fun Theory. But they’re way more important/​fundamental than Fun Theory.

1 Philosophical context for this question: look up Immanuel Kant’s idea of “synthetic a priori” propositions.

## 1.2 Qualia and morality

And those questions are important for AI Alignment. If AI can “feel” that loving a sentient being and making a useless paperclip are 2 fundamentally different things, then it might be way easier to explain our values to that AI. By the way, I’m not implying that AI has to have qualia, I’m saying that our qualia can point us towards the right model.

I think this observation gets a little bit glossed over: if you have a human brain and only care about paperclips… it’s (kind of) still objectively true for you that caring about other people would feel way different, way “bigger” and so on. You can pretend to escape morality, but you can’t escape your brain.

It’s extremely banal out of context, but the landscape of our experiences and concepts may shape the landscape of our values. Modeling our values as arbitrary utility functions (or artifacts of evolution) misses that completely.

## 2.1 Mystery Boxes

Box A

There’s a mystery Box A. Each day you find a random object inside of it. For example: a ball, a flower, a coin, a wheel, a stick, a tissue...

Box B

There’s also another box, the mystery Box B. One day you find a flower there. Another day you find a knife. The next day you find a toy. Next—a gun. Next—a hat. Next—shark’s jaws...

...

How to understand the boxes? If you could obtain all items from both boxes, you would find… that those items are exactly the same. They just appear in a different order, that’s all.

I think the simplest way to understand Box B is this: you need to approach it with a bias, with a “goal”. For example “things may be dangerous, things may cause negative emotions”. In its most general form, this idea is unfalsifiable and may work as a self-fulfilling prophecy. But this general idea may lead to specific hypotheses, to estimating specific probabilities. This idea may just save your life if someone is coming after you and you need to defend yourself.

Content of both boxes changes in arbitrary ways. But content change of the second box comes with an emotional cost.

There are many, many other boxes; understanding them requires more nuanced biases and goals.

I think those boxes symbolize concepts (e.g. words) and the way humans understand them. I think a human understands a concept by assigning “costs” to its changes of meaning. “Costs” come from various emotions and goals.

“Costs” are convenient: if any change of meaning has a cost, then you don’t need to restrict the meaning of a concept. If a change has a cost, then it’s meaningful regardless of its predictability.
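To make the idea of “costs” a bit more concrete, here is a minimal sketch. It assumes a single bias (“things may be dangerous”) and made-up danger scores for the items; neither the numbers nor the function name come from any real model. It only shows that the same items in a different order produce a different total cost of change.

```python
# Hypothetical "danger" scores for the items; the numbers are invented for illustration.
DANGER = {"flower": 0, "toy": 0, "hat": 0, "knife": 2, "gun": 3, "shark's jaws": 3}

def change_costs(sequence, score):
    """Cost of each change of content: the absolute jump in the score."""
    return [abs(score[b] - score[a]) for a, b in zip(sequence, sequence[1:])]

# Box B: the order from the example, alternating harmless and dangerous items.
box_b = ["flower", "knife", "toy", "gun", "hat", "shark's jaws"]

# The same items in a "calm" order (sorted by danger; sorted() is stable).
box_calm = sorted(box_b, key=DANGER.get)

costs_b = change_costs(box_b, DANGER)        # every change is a big emotional jump
costs_calm = change_costs(box_calm, DANGER)  # most changes cost nothing

print(sum(costs_b), sum(costs_calm))
```

Under this one bias the contents of the two boxes are identical, yet the alternating order is far more “expensive”. A different bias would be a different score table.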

## 2.2 More Boxes

More examples of mystery boxes:

• First box may alternate positive and negative items.

• Second box may alternate positive, directly negative and indirectly negative items. For example, it may show you a knife (directly negative) and then a bone (indirectly negative: a “bone” may be a consequence of the “knife”).

• Third box may alternate positive, negative and “subverted” items. For example, it may show you a seashell (positive), and then show you shark’s jaws (negative). But both sharks and seashells have a common theme, so “seashell (positive)” got subverted.

• Fourth box may alternate negative items and items that “neutralize” negative things. For example, it may show you a sword, but then show you a shield.

• Fifth box may show you that every negative thing has many related positive things.

You can imagine a “meta box”, for example a box that alternates between being the 1st box and the 2nd box. Meta boxes can “change their mood”.

I think, in a weird way, all those boxes are very similar to human concepts and words.

The more emotions, goals and biases you learn, the easier it gets for you to understand new boxes. But those “emotions, goals, biases” are themselves like boxes.

• I think I have an idea how we could solve AI Alignment, create an AGI with safe and interpretable thinking. I mean a “fundamentally” safe AGI, not a wildcard that requires extremely specific learning to not kill you.

Sorry for a grandiose claim. I’m going to write my idea right away. Then I’m going to explain the context and general examples of it, implications of it being true. Then I’m going to suggest a specific thing we can do. Then I’m going to explain why I believe my idea is true.

My idea will sound too vague and unclear at first. But I think the context will make it clear what I mean. (Clear as the mathematical concept of a graph, for example: a graph is a very abstract idea, but it makes sense and is easy to use.)

Please evaluate my post at least as science fiction and then ask: maybe it’s not fiction and just reality?

Key points of this post:

1. You can “solve” human concepts (including values) by solving semantics. By semantics I mean “meaning construction”, something more abstract than language.

2. Semantics is easier to solve than you think. And we’re closer to solving it than you think.

3. Semantics is easier to model than you think. You don’t even need an AI to start doing it. Just a special type of statistics. You don’t even have to start with analyzing language.

4. I believe ideas from this post can be applied outside of the AI field.

Why do I believe this? Because of this idea:

• Every concept (or even random mishmash of ideas) has multiple versions. Those versions have internal relationships, positions in some space relative to each other. You can understand a concept by understanding those internal relationships.

• One problem though, those relationships are “infinitely complex”. However, there’s a special way to make drastic simplifications. We can study the real relationships through those special simplifications.

• What do those “special simplifications” do? They order versions of a concept (e.g. “version 1, version 2, version 3″). They can do this in extremely arbitrary ways. The important thing is that you can merge arbitrary orders into less arbitrary structures. There’s some rule for it, akin to the Bayes Rule or Occam’s razor. This is what cognition is, according to my theory.

If this is true, we need to find any domain where concepts and their simplifications are easy enough to formalize. Then we need to figure out a model, figure out the rule of merging simplifications. I’ve got a suggestion and a couple of ideas and many examples.

# Context

• ## 2.3 Words

This is a silly, wacky subjective example. I just want to explain the concept.

Here are some meanings of the word “beast”:

• (archaic/​humorous) any animal.

• an inhumanly cruel, violent, or depraved person.

• a very skilled human. example: “Magnus Carlsen (chess player) is a beast.”

• something very different and/​or hard. example: “Reading modern English is one thing, but understanding Shakespeare is an entirely different beast.”

• a person’s brutish or untamed characteristics. example: “The beast in you is rearing its ugly head”

What are the internal relationships between these meanings? If these meanings create a space, where is each of the meanings? I think the full answer is practically unknowable. But we can “probe” the full meaning, we can explore a tiny part of it:

Let’s pick a goal (bias), for example: “describing deep qualities of something/someone”. If you have this goal, the negative meaning (“cruel person”) of the word is the main one for you. Because it can focus on the person’s deep qualities the most, it may imply that the person is rotten to the core. The positive meaning focuses on skills a lot, the archaic meaning is just a joke. The 4th meaning doesn’t focus on specific internal qualities. The 5th meaning may separate the person from their qualities.

When we added a goal, each meaning started to have a “cost”. This cost illuminates some part of the relationships between the meanings. If we could evaluate an “infinity” of goals, we could know those relationships perfectly. But I believe you can get quite a lot of information by evaluating just a single goal. Because a “goal” is a concept too, so you’re bootstrapping your learning. And I think this matches closely with the example about mystery boxes.

...

By combining a couple of goals we can make an order of the meanings, for example: beast 1 (rotten to the core), beast 2 (skilled and talented person), beast 3 (bad character traits), beast 4 (complicated thing), beast 5 (any animal). This order is based on “specificity” (mostly) and “depth” of a quality: how specific/​deep is the characterization?

Another order: beast 1 (not a human), beast 2 (worse than most humans), beast 3 (best among professionals), beast 4 (not some other things), beast 5 (worse than yourself). This order is based on the “scope” and “contrast”: how many things contrast with the object? Notice how each order simplifies and redefines the meanings. But I want to illustrate the process of combining goals/​biases on a real order:

## 2.4 Grammar Rules

You may treat this part of the post as complete fiction. But it illustrates how biases can be combined. And this is the most important thing about biases.

Grammar rules are concepts too. Sometimes people use quite complicated rules without even realizing, for example:

There’s a popular order of adjectives in English: opinion, size, physical quality or shape, age, colour, origin, material, purpose. What created this order? I don’t know, but I know that certain biases could make it easier to understand.

Take a look at this part of the order: opinion, age, origin, purpose. You could say all those are not “real” properties. They seem to progress from less related/less specific to the object to more related/specific. If you operate under this bias (relatedness/specificity), swapping the adjectives may lead to funny changes of meaning. For example: “bad old wolf” (objective opinion), “old bad wolf” (intrinsic property or cheesy overblown opinion), “old French bad wolf” (a subspecies of the “French wolf”). You can remember how mystery boxes created meaning using order of items.

Another part of the order: size, physical quality or shape, color, material. You can say all those are “real” physical properties. “Size” could be possessed by a box around the object. “Physical quality” and “shape” could be possessed by something wrapped around the object. “Color” could be possessed by the surface of the object. “Material” can be possessed only by the object itself. So physical qualities progress like layers of an onion.

You can combine those two biases (“relatedness/​specificity” + “onion layers”) using a third bias and some minor rules. The third bias may be “attachment”. Some of the rules: (1) an adjective is attached either to some box around the object or to some layer of the object (2) you shouldn’t postulate boxes that are too big. It doesn’t make sense for an opinion to be attached to the object stronger than its size box. It doesn’t make sense for age to be attached to the object stronger than its color (does time pass under the surface layer of an object?). Origin needs to be attached to some layer of the object (otherwise we would need to postulate a giant box that contains both the object and its place of origin). I guess it can’t be attached stronger than “material” because material may expand the information about origin. And purpose is the “soul” of the object. “Attachment” is a reformulation of “relatedness/​specificity”, so we only used 2.5 biases to order 8 things. Unnecessary biases just delete themselves.

Of course, this is all still based on complicated human intuitions and high level reasoning. But, I believe, at the heart of it lies a rule as simple as the Bayes Rule or Occam’s razor. A rule about merging arbitrary connections into something less arbitrary.
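I don’t know what the real merging rule is, but the adjective example above can at least be sketched as a merge of partial orders. The sketch below assumes each bias is just a chain of “X comes before Y” constraints and uses a plain topological sort from Python’s standard library (`graphlib`, Python 3.9+) as a stand-in for the unknown rule. The “attachment” pairs are my reading of the paragraph above, and the pair “material before purpose” is an extra assumption.

```python
from graphlib import TopologicalSorter

# Bias 1: relatedness/specificity (the "not real" properties).
chain_relatedness = ["opinion", "age", "origin", "purpose"]
# Bias 2: onion layers (the "real" physical properties).
chain_onion = ["size", "shape", "colour", "material"]
# Bias 3: "attachment", written as (before, after) pairs. Assumed reading of the text.
attachment = [
    ("opinion", "size"),     # an opinion isn't attached stronger than the size box
    ("age", "colour"),       # age isn't attached stronger than the surface colour
    ("origin", "material"),  # origin isn't attached stronger than material
    ("material", "purpose"), # assumption: purpose (the "soul") comes last
]

def all_pairs(chains, extra_pairs):
    """Turn chains into (before, after) pairs and add the extra pairs."""
    pairs = [(a, b) for chain in chains for a, b in zip(chain, chain[1:])]
    return pairs + list(extra_pairs)

def merge_orders(chains, extra_pairs):
    """Return one linear order that respects every constraint."""
    sorter = TopologicalSorter()
    for before, after in all_pairs(chains, extra_pairs):
        sorter.add(after, before)  # 'after' must come later than 'before'
    return list(sorter.static_order())

def respects(order, chains, extra_pairs):
    """Check that an order violates none of the constraints."""
    pos = {word: i for i, word in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in all_pairs(chains, extra_pairs))

merged = merge_orders([chain_relatedness, chain_onion], attachment)
usual = ["opinion", "size", "shape", "age", "colour", "origin", "material", "purpose"]
print(merged, respects(usual, [chain_relatedness, chain_onion], attachment))
```

These constraints don’t pin down a single order (for example, nothing here says whether shape or age comes first), so the usual order is only one of the orders consistent with them. A real rule would need something more than a topological sort to choose between them.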

...

I think stuff like sentence structure/​word order (or even morphology) is made of amalgamations of biases too.

Sadly, it’s quite useless to think about it. We don’t have enough orders like this. And we can’t create such orders ourselves (as a game), i.e. we can’t model this, it’s too subjective or too complicated. We have nothing to play with here. But what if we could do all of this for some other topic?

## 3.1 Argumentation

I believe my idea has some general and specific connections to hypotheses generation and argumentation. The most trivial connection is that hypotheses and arguments use concepts and themselves are concepts.

You don’t need a precisely defined hypothesis if any specification of your hypothesis has a “cost”. You don’t need to prove and disprove specific ideas, you may do something similar to the “gradient descent”. You have a single landscape with all your ideas blended together and you just slide over this landscape. The same goes for arguments: I think it is often sub-optimal to try to come up with a precise argument. Or waste time and atomize your concepts in order to fix any inconsequential “inconsistency”.

A more controversial idea would be that (1) in some cases you can apply wishful thinking, since “wishful thinking” is able to assign emotional “costs” to theories (2) in some cases motivated reasoning is even necessary for thinking. My theory already proposes that meaning/​cognition doesn’t exist without motivated reasoning.

## 3.2 Working with hypotheses

Observation:

Wizardry isn’t as powerful now as it was when Hogwarts was founded.

Hypotheses:

1. Magic itself is fading away.

2. Wizards are interbreeding with Muggles and Squibs.

3. Knowledge to cast powerful spells is being lost.

4. Wizards are eating the wrong foods as children, or something else besides blood is making them grow up weaker.

5. Muggle technology is interfering with magic. (Since 800 years ago?)

6. Stronger wizards are having fewer children.

...

You can reformulate the hypotheses in terms of each other, for example:

• (1) Magic is fading away. (2) Magic mixes with non-magic. (3) Pieces of magic are lost. (4) Something affects the magic. (5) The same as 2 or 4. (6) Magic creates less magic.

• (1) Pieces of magic disappear. (2) ??? (3) Pieces of magic containing spells disappear. (4) Wizards don’t consume/​produce enough pieces of magic. (5) Technology destroys pieces of magic. (6) Stronger wizards produce fewer pieces of magic.

Why do this? I think it makes hypotheses less arbitrary and highlights what we really know. And it raises questions that are important across many theories: can magic be split into discrete pieces? can magic “mix” with non-magic? can magic be stronger or weaker? can magic create itself? By the way, those questions would save us from trying to explain a nonexistent phenomenon: maybe magic isn’t even fading in the first place, do we really know this?

## 3.3 New Occam’s Razor, new probability

And this way hypotheses are easier to order according to our a priori biases. We can order hypotheses exactly the same way we ordered meanings if we reformulate them to sound equivalent to each other. Here’s an example how we can re-order some of the hypotheses:

(1) Pieces of magic disappear by themselves. (2) Pieces of magic containing spells disappear. (3) Wizards don’t consume/​produce enough pieces of magic. (4) Stronger wizards produce fewer pieces of magic. (5) Technology destroys pieces of magic.

The hypotheses above are sorted by 3 biases: “Does it describe HOW magic disappears?/Does magic disappear by itself?” (stronger positive weight) and “How general is the reason of the disappearance of magic?” (weaker positive weight) and “novelty compared to other hypotheses” (strong positive weight). “Pieces of magic containing spells disappear” is, in a way, the most specific hypothesis here, but it definitely describes HOW magic disappears (and gives a lot of new information about it), so it’s higher on the list. “Technology destroys pieces of magic” doesn’t give any new information about anything whatsoever, only a specific random possible reason, so it’s the most irrelevant hypothesis here. By the way, those 3 different biases are just different sides of the same coin: “magic described in terms of magic/something else” and “specificity” and “novelty” are all types of “specificity”. Or novelty. Biases are concepts too, you can reformulate any of them in terms of the others too.

When you deal with hypotheses that aren’t “atomized” and specific enough, Occam’s Razor may be impossible to apply. Because complexity of a hypothesis is subjective in such cases. What I described above solves that: complexity is combined with other metrics and evaluated only “locally”. By the way, in a similar fashion you can update the concept of probability. You can split “probability” into multiple connected metrics and use an amalgamation of those metrics in cases where you have absolutely no idea how to calculate the ratio of outcomes.
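The ordering of the five reformulated hypotheses can be sketched as a weighted score over the three biases. This is only an illustration: the text gives the weights as “stronger”, “weaker” and “strong”, so the numbers 3, 1, 3 are assumptions, and the per-hypothesis scores are invented so that the result reproduces the order given above. A simple weighted sum is also just a guess at how biases might combine.

```python
# Assumed weights for the three biases (the text gives only "stronger/weaker/strong").
WEIGHTS = {"describes_how": 3.0, "generality": 1.0, "novelty": 3.0}

# Invented scores in [0, 1] for each hypothesis under each bias.
# Deliberately listed out of order so the sorting does the work.
SCORES = {
    "Technology destroys pieces of magic":
        {"describes_how": 0.2, "generality": 0.2, "novelty": 0.0},
    "Wizards don't consume/produce enough pieces of magic":
        {"describes_how": 0.6, "generality": 0.6, "novelty": 0.5},
    "Pieces of magic containing spells disappear":
        {"describes_how": 0.9, "generality": 0.1, "novelty": 0.9},
    "Stronger wizards produce fewer pieces of magic":
        {"describes_how": 0.5, "generality": 0.4, "novelty": 0.4},
    "Pieces of magic disappear by themselves":
        {"describes_how": 1.0, "generality": 1.0, "novelty": 0.8},
}

def total(scores):
    """Weighted sum of one hypothesis's scores over all biases."""
    return sum(WEIGHTS[bias] * value for bias, value in scores.items())

# Highest combined score first.
ranked = sorted(SCORES, key=lambda h: total(SCORES[h]), reverse=True)

for place, hypothesis in enumerate(ranked, start=1):
    print(place, hypothesis, round(total(SCORES[hypothesis]), 2))
```

Note how “Pieces of magic containing spells disappear” ends up second despite its low generality score: the two strong biases outweigh the weak one.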

## 3.4 “Matrices” of motivation

You can analyze arguments and reasons for actions using the same framework. Imagine this situation:

You are a lonely person on an empty planet. You’re doing physics/​math. One day you encounter another person, even though she looks a little bit like a robot. You become friends. One day your friend gets lost in a dangerous forest. Do you risk your life to save her? You come up with some reasons to try to save her:

• I care about my friend very much. (A)

• If my friend survives, it’s the best outcome for me. (B)

• My friend is a real person. (C)

You can explore and evaluate those reasons by formulating them in terms of each other or in other equivalent terms.

• “I’m 100% sure I care. (A) Her survival is 90% the best outcome for me in the long run. (B) Probably she’s real (C).” This evaluates the reasons by “power” (basically, probability).

• “My feelings are real. (A) The goodness/​possibility of the best outcome is real. (B) My friend is probably real. (C)” This evaluates the reasons by “realness”.

• “I care 100%. (A) Her survival is 100% the best outcome for me. (B) She’s 100% real. (C).” This evaluates the reasons by “power” strengthened by emotions: what if the power of emotions affects everything else just a tiny bit? By a very small factor.

• “Survival of my friend is the best outcome for me. (B) The fact that I ended up caring about my friend is the best thing that happened to me. Physics and math aren’t more interesting than other sentient beings. (A) My friend being real is the best outcome for me. But it isn’t even necessary, she’s already “real” in most of the senses. (C)” This evaluates the reasons by the quality of “being the best outcome”.

Some evaluations may affect others, merge together. I believe the evaluations written above only look like precise considerations, but actually they’re more like meanings of words, impossible to pin down. I gave this example because it’s similar to some of my emotions.

I think such thinking is more natural than applying a pre-existing utility function that doesn’t require any cognition. Utility of what exactly should you calculate? Of your friend’s life? Of your life? Of your life with your friend? Of your life factored by your friend’s desire “be safe, don’t risk your life for me”? Should you take into account change of your personality over time? I believe you can’t learn the difference without working with “meaning”.

## 4.1 Synesthesia

Imagine a face. When you don’t simplify it, you just see a face and emotions expressed by it. When you simplify it too much, you just see meaningless visual information (geometric shapes and color spots).

But I believe there’s something very interesting in-between. When information is complex enough to start making sense, but isn’t complex enough to fully represent a face. You may see unreal shapes (mixes of “face shapes” and “geometric shapes”… or simplifications of specific face shapes) and unreal emotions (simplifications of specific emotions) and unreal face textures (simplifications of specific face textures).

# Action

If my idea is true, what can we do?

1. We need to figure out the way to combine biases.

2. We need to find some objects that are easy to model.

3. We need to find “simplifications” and “biases” for those objects that are easy to model.

## What can we do? (in general)

However, even from made-up examples (not connected to a model) we can get some general ideas:

• Different versions of a concept always get described in equivalent terms and simplified. (When a “bias” is applied to the concept.)

• Multiple biases may turn the concept into something like a matrix?

• Sometimes combined biases are similar to a decision tree.

It’s not fictional evidence because at this point we’re not seeking evidence, we’re seeking a way to combine biases.

## What specific thing can we do?

I have a topic in mind: (because of my synesthesia-like experiences)

You can analyze shapes of “places” and videogame levels (3D or even 2D shapes) by making orders of their simplifications. You can simplify a place by splitting it into cubes/​squares, creating a simplified texture of a place. “Bias” is a specific method of splitting a place into cubes/​squares. You can also have a bias for or against creating certain amounts of cubes/​squares.

1. 3D and 2D shapes are easy to model.

2. Splitting 3D/2D shapes into cubes or squares is easy to model.

3. Measuring the amount of squares/​cubes in an area of a place is easy to model.
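Here is a minimal sketch of what such a model could look like in 2D. It assumes a “place” is a small square grid of characters (`#` for solid, `.` for empty) whose side is a power of two, and it uses a quadtree-style split as one possible “bias”: a region is kept as a single square if it is uniform and is split into four quadrants otherwise. Both function names and the example grid are made up for illustration.

```python
def count_squares(grid, x=0, y=0, size=None):
    """Quadtree-style bias: count the squares needed to cover a region when
    uniform regions stay whole and mixed regions are split into 4 quadrants."""
    if size is None:
        size = len(grid)  # assumes a square grid with a power-of-two side
    cells = {grid[y + dy][x + dx] for dy in range(size) for dx in range(size)}
    if len(cells) == 1 or size == 1:
        return 1
    half = size // 2
    return sum(
        count_squares(grid, x + ox, y + oy, half)
        for oy in (0, half)
        for ox in (0, half)
    )

def count_occupied(grid, cell):
    """Fixed-size bias: count the cell-by-cell squares containing any solid part."""
    side = len(grid)
    return sum(
        any(grid[y + dy][x + dx] == "#" for dy in range(cell) for dx in range(cell))
        for y in range(0, side, cell)
        for x in range(0, side, cell)
    )

# A made-up 4x4 "place".
place = [
    "##..",
    "##..",
    "...#",
    "....",
]

print(count_squares(place))      # adaptive split
print(count_occupied(place, 2))  # coarse fixed split
print(count_occupied(place, 1))  # fine fixed split
```

Each function is one way of splitting the same place, and the counts it returns are the kind of numbers that different “biases” could then be compared on.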

Here’s my post about it: “Colors” of places. The post gets specific about the way(s) of evaluating places. I believe it’s specific enough so that we could come up with models. I think this is a real chance.

I probably explained everything badly in that post, but I could explain it better with feedback.

Maybe we could analyze people’s faces the same way, I don’t know if faces are easy enough to model. Maybe “faces” have too complicated shapes.

## My evidence

I’ve always had an obsession with other people.

I compared any person I knew to all other people I knew. I tried to remember faces, voices, ways to speak, emotions, situations, media associated with them (books, movies, anime, songs, games).

If I learned something from someone (be it a song or something else), I associated this information with them and remembered the association “forever”. To the point where any experience was associated with someone. Those associations weren’t something static, they were like liquid or gas, tried to occupy all available space.

At some point I knew that they weren’t just “associations” anymore. They turned into synesthesia-like experiences. Like a blind person in a boat, one day I realized that I’m not in a river anymore, I’m in the ocean.

What happened? I think completely arbitrary associations with people were putting emotional “costs” on my experiences. Each arbitrary association was touching on something less arbitrary. When it happened enough times, I believe associations stopped being arbitrary.

“Other people” is the ultimate reason why I think that my idea is true. Often I doubt myself: maybe my memories don’t mean anything? Other times I feel like I didn’t believe in it enough.

...

But all this makes it so, so much worse. Imagine if after the death of an author all their characters died too (in their fictional worlds) and memories about the author and their characters died too. Ripples of death just never end and multiply. As if the same stupid thing repeats for the infinitieth time.

Updated the post (2).

• (Drafts of a future post.)

Could you help me to formulate statistics with the properties I’m going to describe?

I want to share my way of seeing the world, analyzing information, my way of experiencing other people. (But it’s easier to talk about fantastical places and videogame levels, so I’m going to give examples with places/​levels.)

If you want to read more about my motivation, check out “part 3”.

# Part 1: Theory

I have only two main philosophical ideas. The first idea is that a part/property of one object (e.g. “height”) may have a completely different meaning in a different object. Because in a different object it relates to and resonates with different things. By putting a part/property in a different context you can create a fundamentally different version of it. You can split any property/part into a spectrum. And you can combine all properties of an object into just a single one.

The second idea is that you can imagine that different objects are themselves like different parts of a single spectrum.

I want to give some examples of how a seemingly generic property can have a unique version for a specific object.

Example 1. Take a look at the “volume” of this place: (painting 1)

• Because we’re inside of “something” (the forest), the volume of that “something” is equal to the volume of the whole place.

• Because we have a lot of different objects (trees), we have the volume between those objects.

• Because the trees are hollow we also have the volume inside of them.

Different nuances of the place reflect its volume in a completely unique way. It has a completely unique context for the property of “volume”.

Example 2. Take a look at “fatness” of this place: (painting 2)

• The road doesn’t have too many buildings on it: this amplifies “fatness”, because you get more earth per one small building.

• The road is contrasted with the sea. The sea adds more size to the image (which indirectly emphasizes fatness).

• Also because of the sea we understand that it’s not the whole world that is stretched: it’s just this fat road. We don’t look at this world through one big distortion.

Different nuances of the place reflect its fatness in a completely unique way.

Example 3. Take a look at “height” of this place: (painting 3)

• The place is floating somewhere. The building in the center has some height itself. It resonates with the overall height.

• The place doesn’t have a ceiling and has a hole in the middle. It connects the place with the sky even more.

• The wooden buildings are “light”, so it makes sense that they’re floating in the air.

...

I could go on about places forever. Each feels fundamentally different from all the rest.

And I want to know every single one. And I want to know where they are, I want a map with all those places on it.

# Part 3: Motivation

I think my ideas may be important because they may lead to some new mathematical concepts.

Sometimes studying a simple idea or mechanic leads to a new mathematical concept which leads to completely unexpected applications.

For example, a simple toy with six sides (dice) may lead to saving people and major progress in science. Connecting points with lines (graphs) may lead to algorithms, data structures and new ways to find the optimal option or check/​verify something.

Not every simple thing is guaranteed to lead to a new math concept. But I just want you to consider this possibility. And maybe ask questions whose answers could raise the probability of this possibility.

## A new type of probability?

I think my ideas may be related to:

• Probability and statistics.

• Ways to describe vague things.

• Ways to describe vague arguments or vague reasoning, thinking in context. For example, arguments about “bodily autonomy”.

Maybe those ideas describe a new type of probability:

You can compare classic probability to a pie made of a uniform and known dough. When you assign probabilities to outcomes and ideas you share the pie and you know what you’re sharing.

And in my idea you have a pie made of different types of dough (colors) and those types may change dynamically. You don’t know what you’re sharing when you share this pie.

This new type of probability is supposed to be applicable to things that have family resemblance, polyphyly or “cluster properties” (here’s an explanation of the latter in a Philosophy Tube video).

## Blind men and an elephant

Imagine a world where people don’t know the concept of a “circle”. People do see round things, but can’t consciously pick out the property of roundness. (Any object has a lot of other properties.)

Some people say “the Moon is like a face”. Others say “the Moon is like a flower”. Weirder people say “the Moon is like a tree trunk” or “the Moon is like an embrace”. The weirdest people say “the Moon is like a day” or “the Moon is like going for a walk and returning back home”. Nobody agrees with each other, nobody understands each other.

Then one person comes up and says: “All of you are right. Opinions of everyone contain objective and useful information.”

People are shocked: at least someone has got to be wrong? If everyone is right, how can the information be objective and useful?

The concept of a “circle” is explained. Suddenly it’s extremely easy to understand each other. Like 2 and 2. And suddenly there’s nothing to argue about. People begin to share their knowledge and this knowledge finds completely unexpected applications.

https://​​en.wikipedia.org/​​wiki/​​Blind_men_and_an_elephant

The situation was just like in the story about blind men and an elephant, but even more ironic, since this time everyone was touching the same “shape”.

With my story I wanted to explain my opinions and goals:

• I want to share my subjective experience.

• I believe that it contains objective and important information.

• I want to share a way to share subjective experience. I believe everyone’s experience contains objective and important information.

## Meta subjective knowledge

If you can get knowledge from/about subjective experience itself, it means there exists some completely unexplored type of knowledge. I want to “prove” that such a type of knowledge does exist.

Such knowledge would be important because it would be a new fundamental type of knowledge.

And such knowledge may be the most abstract: if you have knowledge about subjective experience itself, you have knowledge that’s true for any being with subjective experience.

## People

I’m amazed how different people are. If nothing else, just look at the faces: completely different proportions and shapes and flavors of emotions. And it seems like those proportions and shapes can’t be encountered anywhere else. They don’t feel exactly like geometrical shapes. They are so incredibly alien and incomprehensible, and yet so familiar. But… nobody cares. Nobody seems surprised or too interested, nobody notices how inadequate our concepts are at describing stuff like that. And this is just the faces, but there are also voices, ways to speak, characters… all different in ways I absolutely can’t comprehend/​verbalize.

I believe that if we (people) were able to share the way we experience each other, it would change us. It would make us respect each other 10 times more, remember each other 10 times better, learn 10 times more from each other.

It pains me every day that I can’t share my experience of other people (accumulated over the years I thought about this). My memory about other people. I don’t have the concepts, the language for this. Can’t figure it out. This feels so unfair! All the more unfair that it doesn’t seem to bother anyone else.

This state of the world feels like a prison. This prison was created by specific injustices, but the wound grew deeper, cutting something fundamental. Vivid experiences of qualia (other people, fantastic worlds) feel like a small window out of this prison. But together we could crush the prison wall completely.

## Key philosophical principles

Here I describe the most important, the most general principles of my philosophy.

• Objects exist only in context of each other, like colors in a spectrum. So objects are like “colors”, and the space of those objects is like a “spectrum”.

• All properties of an object are connected/​equivalent. Basically, an object has only 1 super property. This super property can be called “color”.

• Colors differentiate all usual properties. For example, “blue height” and “red height” are 2 fundamentally different types of height. But “blue height” and “blue flatness” are the same property.

So, each color is like a world with its own rules. Different objects exist in different worlds.

The same properties have different “meaning” in different objects. A property is like a word that heavily depends on context. If the context is different, the meaning of the property is different too. There’s no single metric that would measure all of the objects. For example, if the property of the object is “height”, and you change any thing that’s connected to height or reflects height in any way—you fundamentally change what “height” means. Even if only by a small amount.

Note: different objects/​colors are like qualia, subjective experiences (colors, smells, sounds, tactile experiences). Or you could say they’re somewhat similar to Gottfried Leibniz’s “monads”: simple substances without physical properties.

The objects I want to talk about are “places”: fantastical worlds or videogame levels. For example, fantastical worlds of Jacek Yerka.

## Details

A “detail” is like the smallest structural unit of a place. The smallest area where you could stand.

It’s like a square on the chessboard. But it doesn’t mean that any area of the place can be split into distinct “details”. The whole place is not like a chessboard.

This is a necessary concept. Without “details” there would be no places to begin with. Or those places wouldn’t have any comprehensible structure.

## Colors

“Details” are like cells. Cells make up different types of tissues. “Details” make up colors. You can compare colors to textures or materials.

(The places I’m talking about are not physical. So the example below is just an analogy.)

Imagine that you have small toys in the shape of 3D solids. You’re interested in their volume. They have very clear sides, you study their volume with simple formulas.

Then you think: what is the volume of the giant cloud behind my window? What is a “side” of a cloud? Do clouds even have “real” shapes? What would the formula for the volume of a cloud be? Would it be the size of a book?

The volume of the cloud has a different color. Because the context around the “volume” changed completely. Because clouds are made of a different type of “tissue” than the toys.

OK, we resolved one question, but our problems don’t end here. Now we encounter an object that looks like a mix between a cloud and a simple shape. Are we allowed to simplify it into a simple shape? Are we supposed to mix both volumes? In what proportions and in what way?

We need rules to interpret objects (rules to assign importance to different parts or “layers” of an object before mixing them into a single substance). We need rules to mix colors. We need rules to infer intermediate colors.

## Spectrum(s)

There are different spectrums. (Maybe they’re all parts of one giant spectrum. And maybe one of those spectrums contains our world.)

Often I imagine a spectrum as something similar to the visible spectrum: a simple order of places, from the first to the last.

A spectrum gives you the rules to interpret places and to create colors. How to make a spectrum?

1. You take a bunch of places. Make some loose assumptions about them. You assume where “details” in the places are and may be.

2. Based on the similarities between the places, you come up with the most important “colors” (“materials”) these places may be made of.

3. You come up with rules that tell you how to assign the colors to the places. Or how to modify the colors so that they fit the places.

The colors you came up with have an order:

• The farther you go in a spectrum, the more details dissolve. First you have distinct groups of details that create volume. Then you have “flat”/​stretched groups of details. Then you have “cloud-like” groups of details.

But those colors are not assigned to the places immediately. We’ve ordered abstract concepts, but haven’t ordered the specific places. Here are some of the rules that allow you to assign the colors to the places:

• When you evaluate a place, the smaller-scale structures matter more. For example, if the smaller-scale structure has a clear shape and the larger-scale structure doesn’t have a clear shape, the former structure matters more in defining the place.

• The opposite is true for “negative places”: the larger scale structures contribute more. I often split my spectrum into a “positive” part and a “negative” part. They are a little bit like positive and negative numbers.

You can call those “normalization principles”. But we need more.
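Here is a minimal sketch of how the two rules above could be written down, only as an illustration. Everything in it is an assumption made for the example: the `Structure` class, the numeric `scale` and `clarity` scores, and the choice of weights (`1 / scale` for “positive” places, `scale` for “negative” ones). The text above doesn’t fix any of these.

```python
from dataclasses import dataclass

@dataclass
class Structure:
    scale: float    # relative size of the structure (hypothetical unit)
    clarity: float  # 0.0 = cloud-like, 1.0 = clear shape (hypothetical score)

def shape_score(structures, negative=False):
    """Weighted average of clarity over all structures of a place.

    Assumed weighting: small scales dominate in "positive" places,
    large scales dominate in "negative" places.
    """
    weights = [s.scale if negative else 1.0 / s.scale for s in structures]
    total = sum(weights)
    return sum(w * s.clarity for w, s in zip(weights, structures)) / total

# A place whose small-scale structure is clear and whose large-scale
# structure is vague.
place = [Structure(scale=1.0, clarity=0.9), Structure(scale=10.0, clarity=0.1)]

positive = shape_score(place)                 # small scale dominates
negative = shape_score(place, negative=True)  # large scale dominates
print(positive, negative)
```

Read as a “positive” place, the example counts as mostly clear-shaped. Read as a “negative” place, the same structures count as mostly cloud-like.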

## The principle of explosion/​vanishing

Two places with different enough detail patterns can’t have the same color. Because a color is the detail pattern.

One of the two places has to get a bigger or a smaller (by an order of magnitude) color. But this may lead to an “explosion” (the place becomes unbelievably big/too distant from all the other places) or to a “vanishing” (the place becomes unbelievably microscopic/too distant).

This is bad because you can’t allow so much uncertainty about the places’ positions. It’s also bad because it completely violates all of your initial assumptions about the places. You can’t allow infinite uncertainty.

When you have a very small number of places in a spectrum, they have a lot of room to move around. You’re unsure about their positions. But when you have more places, due to the domino effect you may start getting “explosions” and “vanishings”. They will allow you to rule out wrong positions, wrong rankings.
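One possible way to picture this “ruling out” is the toy sketch below. It is not a definition of the principle, just an illustration under loud assumptions: each place gets a hypothetical allowed range on a log10 scale of “color size” (standing in for the initial assumptions), and neighbors in a ranking must differ by at least one order of magnitude. The names `bounds`, `MIN_GAP` and `ranking_survives` are invented for the example.

```python
from itertools import permutations

# Hypothetical initial assumptions: each place may sit anywhere within
# [lowest, highest] on a log10 scale of "color size".
bounds = {"A": (0, 1), "B": (0, 3), "C": (2, 3)}

MIN_GAP = 1  # assumed: neighbors differ by at least one order of magnitude

def ranking_survives(ranking, bounds, min_gap=MIN_GAP):
    """Return True if the ranking (smallest to biggest) fits all the bounds."""
    position = None
    for name in ranking:
        lowest, highest = bounds[name]
        # Put the place as low as the previous place and its own bounds allow.
        if position is None:
            position = lowest
        else:
            position = max(lowest, position + min_gap)
        if position > highest:
            # The place is pushed past its bound: an "explosion"
            # (seen from the other end of the ranking, a "vanishing").
            return False
    return True

surviving = [r for r in permutations(bounds) if ranking_survives(r, bounds)]
print(surviving)
```

With these made-up bounds, three of the six possible rankings survive. Every ranking that puts “C” before “A” is ruled out, because it would push “A” far outside the range we assumed for it.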

## Overlay (superposition)

We also need a principle that would help us to sort places with the “same” color.

I feel it goes something like this:

• Take places with the same color. Let’s say this color is “groups of details that create volume”.

If the places have no secondary important colors mixed in:

1. Overlay (superimpose) those places over each other.

2. Ask: if I take a random piece of a volume, what’s the probability that this piece is from the place X? Sort the places by such probabilities.

If the places do have some secondary important colors mixed in:

1. Overlay (superimpose) those places over each other.

2. Ask: how hard is it to get from the place’s main color to the place’s secondary color? (Maybe mix and redistribute the secondary colors of the places.) Sort places by that.

For example, let’s say the secondary color is “groups of details that create a surface that covers the entire place” (the main one is “groups of details that create volume”). Then you ask: how hard is it to get from the volume to that surface?
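The first case above (no secondary colors) can be sketched in a few lines. This is only a rough illustration under assumptions: a place is modeled as a set of grid cells that stand in for its “pieces of volume”, and the place names and cell coordinates are made up.

```python
# Hypothetical places: each is a set of grid cells occupied by its volume.
places = {
    "hut": {(0, 0)},
    "tower": {(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (0, 5)},
    "hill": {(0, 0), (1, 0), (2, 0)},
}

def overlay_probabilities(places):
    """Stack all places on top of each other and, for each place, return
    the chance that a randomly drawn piece of volume came from it."""
    total = sum(len(cells) for cells in places.values())
    return {name: len(cells) / total for name, cells in places.items()}

def sort_by_overlay(places):
    """Sort place names from the most probable source to the least probable."""
    probabilities = overlay_probabilities(places)
    return sorted(probabilities, key=probabilities.get, reverse=True)

probabilities = overlay_probabilities(places)
ranking = sort_by_overlay(places)
print(probabilities)
print(ranking)
```

In this toy model the “tower” owns 6 of the 10 stacked pieces, so it comes first. The second case (with secondary colors) would need some measure of “how hard it is to get” from one color to another, which this sketch doesn’t attempt.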

Note: I feel it might be related to Homeostatic Property Clusters. I learned the concept from a Philosophy Tube video. It reminded me of “family resemblance” popularized by Ludwig Wittgenstein.

Note 2: https://imgur.com/a/F5Vq8tN. Some examples I’m going to write about later.

Thought: places by themselves are incomparable. They can be compared only inside of a spectrum.

## 3 cats (a slight tangent/​bonus)

Imagine a simple drawing of a cat. And a simple cat sculpture. And a real cat. Do they feel different?

If “yes”, then you experience a difference between various qualia. You feel some meta knowledge about qualia. You feel qualia “between” qualia.

You look at the same thing in different contexts. And so you look at 3 versions of it through 3 different lenses. If you looked at everything through the same lens, you would recognize only a single object.

If you understand what I’m talking about here, then you understand what I’m trying to describe about “colors”. Colors are different lenses, different contexts.