Q Home
Inward and outward steelmanning
Could you give a specific example/clarify a little bit more? Maybe there’s no conflict.
Reading the post I didn’t understand this:
Could evolution really build a consequentialist? The post itself kind of contradicts that.
Could a consequentialist really foresee all consequences without having any drives (such as curiosity)?
I think your critique about computational complexity is related to the 1st point.
Relationship between subjective experience and intelligence?
Even if my ideas are vague, shouldn’t rationality be applicable even at that stage? The idea of levels of intelligence (or hard intelligence ceilings) isn’t very specific either. “Are there unexpected/easy ways to get smarter?”, people should have some opinions about that even without my ideas. It’s safe to assume Eliezer doesn’t believe there’s an unknown way to get smarter (or that it’s easier to find such a way than to solve the Alignment problem).
My more specific hypotheses are related to guessing what such a way might be. But that’s not what you meant, I think.
In this post I described the information I use to reach the conclusion. I’m afraid I don’t know rationality well enough to make it clearer (or to investigate myself whether my belief is rational). So one of my later posts will likely be about some of my specific ideas.
About the g-factor. I can imagine a weak person who has an extremely strong leg. I wouldn’t think that such a person is “generally” strong, because I already have an idea of what a generally strong (and above-average) person looks like.
But with IQ tests, I’m not starting from believing that they measure general intelligence. Maybe I don’t even have a good idea of what a generally intelligent (and above-average) person should look like. So the fact that there are multiple different ways to break the correlation makes me doubt IQ more.
Thinking without priors?
Content generation. Where do we draw the line?
Just my emotions! And I had an argument about the value of the artists behind the art (Can people value the source of the art? Is it likely that the majority of people would value it?). Somewhat similar to Not for the Sake of Happiness (Alone). I decided to put the topic into a more global context (How far can you go replacing everything with AI content? What does it mean for the connection between people?). I’m very surprised that what I wrote was interesting to some people. What surprised you in my post?
I’m also interested in applying the idea of “prior knowledge” to values (or to argumentation, but not in a strictly probabilistic way). For example, maybe I don’t value (human) art that much, or I’m very uncertain about how much I value it. But after considering some more global/fundamental questions (“prior values”, “prior questions”) I may decide that I actually value human art quite a lot in certain contexts. I’m still developing this idea.
I feel (e.g. when reading arguments about why AGI “isn’t that scary”) that there aren’t enough ways to describe disagreements. I hope to find a new way to show how and why people arrive at certain conclusions. In this post I tried to show the “fundamental” reasons for my specific opinion (worrying about AI content generation). I also tried to do a similar thing in a post about Intelligence (I wanted to know if that type of thinking is rational or irrational).
(Drafts of a future post.)
Could you help me formulate a statistical framework with the properties I’m going to describe?
I want to share my way of seeing the world, analyzing information, my way of experiencing other people. (But it’s easier to talk about fantastical places and videogame levels, so I’m going to give examples with places/levels.)
If you want to read more about my motivation, check out “part 3”.
Part 1: Theory
I have only two main philosophical ideas. The first is that a part/property of one object (e.g. “height”) may have a completely different meaning in a different object, because in a different object it relates to and resonates with different things. By putting a part/property in a different context you can create a fundamentally different version of it. You can split any property/part into a spectrum. And you can combine all properties of an object into just a single one.
The second idea is that you can imagine that different objects are themselves like different parts of a single spectrum.
I want to give some examples of how a seemingly generic property can have a unique version for a specific object.
Example 1. Take a look at the “volume” of this place: (painting 1)
Because we’re inside of “something” (the forest), the volume of that “something” is equal to the volume of the whole place.
Because we have a lot of different objects (trees), we have the volume between those objects.
Because the trees are hollow we also have the volume inside of them.
Different nuances of the place reflect its volume in a completely unique way. It has a completely unique context for the property of “volume”.
Example 2. Take a look at “fatness” of this place: (painting 2)
The road doesn’t have too many buildings on it: this amplifies “fatness”, because you get more earth per small building.
The road is contrasted with the sea. The sea adds more size to the image (which indirectly emphasizes fatness).
Also, because of the sea we understand that it’s not the whole world that is stretched: it’s just this fat road. We don’t look at this world through one big distortion.
Different nuances of the place reflect its fatness in a completely unique way.
Example 3. Take a look at “height” of this place: (painting 3)
The place is floating somewhere. The building in the center has some height itself. It resonates with the overall height.
The place doesn’t have a ceiling and has a hole in the middle. It connects the place with the sky even more.
The wooden buildings are “light”, so it makes sense that they’re floating in the air.
...
I could go on about places forever. Each feels fundamentally different from all the rest.
And I want to know every single one. And I want to know where they are, I want a map with all those places on it.
Key philosophical principles
Here I describe the most important, the most general principles of my philosophy.
Objects exist only in context of each other, like colors in a spectrum. So objects are like “colors”, and the space of those objects is like a “spectrum”.
All properties of an object are connected/equivalent. Basically, an object has only 1 super property. This super property can be called “color”.
Colors differentiate all usual properties. For example, “blue height” and “red height” are 2 fundamentally different types of height. But “blue height” and “blue flatness” are the same property.
So, each color is like a world with its own rules. Different objects exist in different worlds.
The same properties have different “meaning” in different objects. A property is like a word that heavily depends on context. If the context is different, the meaning of the property is different too. There’s no single metric that would measure all of the objects. For example, if the property of the object is “height”, and you change any thing that’s connected to height or reflects height in any way—you fundamentally change what “height” means. Even if only by a small amount.
Note: different objects/colors are like qualia, subjective experiences (colors, smells, sounds, tactile experiences). Or you could say they’re somewhat similar to Gottfried Leibniz’s “monads”: simple substances without physical properties.
The objects I want to talk about are “places”: fantastical worlds or videogame levels. For example, fantastical worlds of Jacek Yerka.
Details
“Detail” is like the smallest structural unit of a place. The smallest area where you could stand.
It’s like a square on the chessboard. But it doesn’t mean that any area of the place can be split into distinct “details”. The whole place is not like a chessboard.
This is a necessary concept. Without “details” there would be no places to begin with. Or those places wouldn’t have any comprehensible structure.
Colors
“Details” are like cells. Cells make up different types of tissues. “Details” make up colors. You can compare colors to textures or materials.
(The places I’m talking about are not physical. So the example below is just an analogy.)
Imagine that you have small toys in the shape of 3D solids. You’re interested in their volume. They have very clear sides, you study their volume with simple formulas.
Then you think: what is the volume of the giant cloud behind my window? What is a “side” of a cloud? Do clouds even have “real” shapes? What would be the formula for the volume of a cloud, and would it take up a whole book?
The volume of the cloud has a different color, because the context around “volume” changed completely: clouds are made of a different type of “tissue” than the toys.
OK, we resolved one question, but our problems don’t end here. Now we encounter an object that looks like a mix between a cloud and a simple shape. Are we allowed to simplify it into a simple shape? Are we supposed to mix both volumes? In what proportions and in what way?
We need rules to interpret objects (rules to assign importance to different parts or “layers” of an object before mixing them into a single substance). We need rules to mix colors. We need rules to infer intermediate colors.
Spectrum(s)
There are different spectrums. (Maybe they’re all parts of one giant spectrum. And maybe one of those spectrums contains our world.)
Often I imagine a spectrum as something similar to the visible spectrum: a simple order of places, from the first to the last.
A spectrum gives you the rules to interpret places and to create colors. How to make a spectrum?
You take a bunch of places and make some loose assumptions about them. You make assumptions about where the “details” in each place are or could be.
Based on the similarities between the places, you come up with the most important “colors” (“materials”) these places may be made of.
You come up with rules that tell you how to assign the colors to the places. Or how to modify the colors so that they fit the places.
The colors you came up with have an order:
The farther you go in a spectrum, the more details dissolve. First you have distinct groups of details that create volume. Then you have “flat”/stretched groups of details. Then you have “cloud-like” groups of details.
But those colors are not assigned to the places immediately. We’ve ordered abstract concepts, but haven’t ordered the specific places. Here are some of the rules that allow you to assign the colors to the places:
When you evaluate a place, the smaller-scale structures matter more. For example, if the smaller-scale structure has a clear shape and the larger-scale structure doesn’t, the former matters more in defining the place.
The opposite is true for “negative places”: the larger-scale structures contribute more. I often split my spectrum into a “positive” part and a “negative” part. They are a little bit like positive and negative numbers.
You can call those “normalization principles”. But we need more.
The principle of explosion/vanishing
Two places with different enough detail patterns can’t have the same color. Because a color is the detail pattern.
One of the two places has to get a bigger or a smaller (by a magnitude) color. But this may lead to an “explosion” (the place becomes unbelievably big/too distant from all the other places) or to a “vanishing” (the place becomes unbelievably microscopic/too distant).
This is bad because you can’t allow so much uncertainty about the places’ positions. It’s also bad because it completely violates all of your initial assumptions about the places. You can’t allow infinite uncertainty.
When you have a very small number of places in a spectrum, they have a lot of room to move around, and you’re unsure about their positions. But when you have more places, the domino effect may start producing “explosions” and “vanishings”. Those will allow you to rule out wrong positions and wrong rankings.
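The domino effect above can be sketched in code. This is only a toy model under made-up assumptions: places with different detail patterns must sit at least one order of magnitude apart, and every place must stay inside a fixed “believable” range; the names and numbers are all illustrative, not part of the original idea.

```python
from itertools import permutations

# Toy model of "explosion": places whose detail patterns differ are forced
# at least one magnitude (10x) apart, and no place may exceed the bound `hi`.
# With few places many orderings survive; with more places the forced jumps
# compound and start ruling orderings out.

def consistent_orderings(patterns, lo=1.0, hi=1e3):
    """Count orderings of the places that avoid an "explosion"."""
    count = 0
    for order in permutations(range(len(patterns))):
        pos, ok = lo, True
        for a, b in zip(order, order[1:]):
            if patterns[a] != patterns[b]:
                pos *= 10          # forced jump of one magnitude
                if pos > hi:       # "explosion": position no longer believable
                    ok = False
                    break
        count += ok
    return count

few = consistent_orderings(["volume", "flat", "cloud"])            # all survive
many = consistent_orderings(["a", "b", "c", "d", "e"])             # none survive
```

With three places every ordering fits inside the bound; with five all-distinct places every ordering forces a fourth magnitude jump past the bound, so every ranking is ruled out. That is the sense in which adding places turns loose assumptions into hard constraints.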
Overlay (superposition)
We also need a principle that would help us to sort places with the “same” color.
I feel it goes something like this:
Take places with the same color. Let’s say this color is “groups of details that create volume”.
If the places have no secondary important colors mixed in:
Overlay (superimpose) those places over each other.
Ask: if I take a random piece of a volume, what’s the probability that this piece is from the place X? Sort the places by such probabilities.
If the places do have some secondary important colors mixed in:
Overlay (superimpose) those places over each other.
Ask: how hard is it to get from the place’s main color to the place’s secondary color? (Maybe mix and redistribute the secondary colors of the places.) Sort places by that.
For example, let’s say the secondary color is “groups of details that create a surface that covers the entire place” (the main one is “groups of details that create volume”). Then you ask: how hard is it to get from the volume to that surface?
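The overlay procedure for places with no secondary colors can be sketched as code. This is a minimal, hypothetical rendering: I assume a “place” can be flattened into a set of detail cells, and that “a random piece of volume” means a uniformly random pooled cell; the place names are invented for illustration.

```python
# Sketch of the overlay (superposition) rule: pool the "detail" cells of all
# places, then rank each place by the probability that a random pooled cell
# belongs to it.

def overlay_rank(places):
    """places: name -> set of (x, y) detail cells.
    Returns (names sorted by probability, name -> probability)."""
    total = sum(len(cells) for cells in places.values())
    prob = {name: len(cells) / total for name, cells in places.items()}
    ranking = sorted(places, key=lambda name: prob[name], reverse=True)
    return ranking, prob

places = {
    "dense_forest": {(x, y) for x in range(10) for y in range(10)},  # 100 cells
    "sparse_road": {(x, 0) for x in range(20)},                      # 20 cells
    "tiny_garden": {(x, y) for x in range(3) for y in range(3)},     # 9 cells
}

ranking, prob = overlay_rank(places)
```

The probabilities sum to one by construction, so the ranking is just each place’s share of the pooled volume.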
Note: I feel it might be related to Homeostatic Property Clusters. I learned the concept from a Philosophy Tube video. It reminded me of “family resemblance” popularized by Ludwig Wittgenstein.
Note 2: https://imgur.com/a/F5Vq8tN. Some examples I’m going to write about later.
Thought: places by themselves are incomparable. They can be compared only inside of a spectrum.
3 cats (a slight tangent/bonus)
Imagine a simple drawing of a cat. And a simple cat sculpture. And a real cat. Do they feel different?
If “yes”, then you experience a difference between various qualia. You feel some meta knowledge about qualia. You feel qualia “between” qualia.
You look at the same thing in different contexts. And so you look at 3 versions of it through 3 different lenses. If you looked at everything through the same lens, you would recognize only a single object.
If you understand what I’m talking about here, then you understand what I’m trying to describe about “colors”. Colors are different lenses, different contexts.
Part 2: Examples
Part 3: Motivation
I think my ideas may be important because they may lead to some new mathematical concepts.
Sometimes studying a simple idea or mechanic leads to a new mathematical concept which leads to completely unexpected applications.
For example, a simple toy with six sides (a die) may lead to saving people and to major progress in science. Connecting points with lines (graphs) may lead to algorithms, data structures, and new ways to find the optimal option or to check/verify something.
Not every simple thing is guaranteed to lead to a new math concept. But I just want you to consider this possibility. And maybe ask questions whose answers could raise the probability of this possibility.
A new type of probability?
I think my ideas may be related to:
Probability and statistics.
Ways to describe vague things.
Ways to describe vague arguments or vague reasoning, thinking in context. For example, arguments about “bodily autonomy”.
Maybe those ideas describe a new type of probability:
You can compare classic probability to a pie made of a uniform and known dough. When you assign probabilities to outcomes and ideas you share the pie and you know what you’re sharing.
And in my idea you have a pie made of different types of dough (colors) and those types may change dynamically. You don’t know what you’re sharing when you share this pie.
This new type of probability is supposed to be applicable to things that have family resemblance, polyphyly or “cluster properties” (here’s an explanation of the latter in a Philosophy Tube video).
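The “pie made of different doughs” can be given a minimal numerical sketch. Everything here is an assumption layered on the metaphor: each outcome’s raw mass sits in a named “dough” (color), each dough gets a context-dependent weight, and changing the context re-weights the doughs, so the same raw masses yield different probabilities. The names are illustrative.

```python
# Hedged sketch: classic probability shares one uniform pie; here the pie is
# made of different "doughs" whose weights can change with context.

def color_probability(masses, doughs, weights):
    """masses: outcome -> raw mass; doughs: outcome -> dough name;
    weights: dough name -> context weight. Returns normalized probabilities."""
    weighted = {o: m * weights[doughs[o]] for o, m in masses.items()}
    total = sum(weighted.values())
    return {o: w / total for o, w in weighted.items()}

masses = {"A": 1.0, "B": 1.0}
doughs = {"A": "solid", "B": "cloud"}

ctx1 = color_probability(masses, doughs, {"solid": 1.0, "cloud": 1.0})
ctx2 = color_probability(masses, doughs, {"solid": 3.0, "cloud": 1.0})
```

In the first context the two outcomes split the pie evenly; in the second the same raw masses give “A” three times the share, because its dough got heavier. That is the sense in which you don’t know what you’re sharing until the context fixes the doughs.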
Blind men and an elephant
Imagine a world where people don’t know the concept of a “circle”. People do see round things, but can’t consciously pick out the property of roundness. (Any object has a lot of other properties.)
Some people say “the Moon is like a face”. Others say “the Moon is like a flower”. Weirder people say “the Moon is like a tree trunk” or “the Moon is like an embrace”. The weirdest people say “the Moon is like a day” or “the Moon is like going for a walk and returning back home”. Nobody agrees with each other, nobody understands each other.
Then one person comes up and says: “All of you are right. Opinions of everyone contain objective and useful information.”
People are shocked: surely at least someone has to be wrong? If everyone is right, how can the information be objective and useful?
The concept of a “circle” is explained. Suddenly it’s extremely easy to understand each other. Like 2 and 2. And suddenly there’s nothing to argue about. People begin to share their knowledge and this knowledge finds completely unexpected applications.
https://en.wikipedia.org/wiki/Blind_men_and_an_elephant
The situation was just like in the story about blind men and an elephant, but even more ironic, since this time everyone was touching the same “shape”.
With my story I wanted to explain my opinions and goals:
I want to share my subjective experience.
I believe that it contains objective and important information.
I want to share a way to share subjective experience. I believe everyone’s experience contains objective and important information.
Meta subjective knowledge
If you can get knowledge from/about subjective experience itself, it means there exists some completely unexplored type of knowledge. I want to “prove” that there does exist such type of knowledge.
Such knowledge would be important because it would be a new fundamental type of knowledge.
And such knowledge may be the most abstract: if you have knowledge about subjective experience itself, you have knowledge that’s true for any being with subjective experience.
People
I’m amazed how different people are. If nothing else, just look at the faces: completely different proportions and shapes and flavors of emotions. And it seems like those proportions and shapes can’t be encountered anywhere else. They don’t feel exactly like geometrical shapes. They are so incredibly alien and incomprehensible, and yet so familiar. But… nobody cares. Nobody seems surprised or too interested, nobody notices how inadequate our concepts are at describing stuff like that. And this is just the faces, but there are also voices, ways to speak, characters… all different in ways I absolutely can’t comprehend/verbalize.
I believe that if we (people) were able to share the way we experience each other, it would change us. It would make us respect each other 10 times more, remember each other 10 times better, learn 10 times more from each other.
It pains me every day that I can’t share my experience of other people (accumulated over the years I thought about this). My memory about other people. I don’t have the concepts, the language for this. Can’t figure it out. This feels so unfair! All the more unfair that it doesn’t seem to bother anyone else.
This state of the world feels like a prison. This prison was created by specific injustices, but the wound grew deeper, cutting something fundamental. Vivid experiences of qualia (other people, fantastic worlds) feel like a small window out of this prison. But together we could crush the prison wall completely.
Statistics for vague concepts and “Colors” of places
I want to share a part of a conversation I had in order to explain my post better:
“Vague concepts”
A game is clearly defined in any context. What do you mean that it is impossible to understand outside of “any context”?
Sorry for not making it more clear, I was just referring to this idea:
https://en.wikipedia.org/wiki/Family_resemblance
It argues that things which could be thought to be connected by one essential common feature may in fact be connected by a series of overlapping similarities, where no one feature is common to all of the things. Games, which Wittgenstein used as an example to explain the notion, have become the paradigmatic example of a group that is related by family resemblances.
The idea is that when you consider a bunch of “games”, it’s easy to see the common features. But when you consider more and more “games” and things that are sometimes called “games”, it turns out that everything can be a game.
And yet no matter how you stretch the concept (e.g. say something like “love is just a game”), in a specific context the meaning is clear enough.
You can also call concepts like this “cluster properties” (explanation in Philosophy Tube video). Or even the (in)famous “social constructs”. In the text form:
Even more interestingly, Harris’ idea is an accidental ripoff of a theory developed by philosopher Richard Boyd in 1982, called: ‘The Homeostatic Cluster Property Theory of Metaethical Naturalism’ Sexy title. Boyd thought that words like ‘good’ and ‘evil’ refer to real properties out there in the material world, and that therefore statements like ‘Murder is bad’ are capable of being objectively true, or at least true in the same way as scientific statements are. Which prompts the question, “To what exactly do these words refer?”
Boyd’s answer is that they are cluster properties—groups of things that tend to go together. The example he uses is actually the same one Harris does—health. There are all kinds of things we would want to include in a definition of the word “healthy,” like your heart should be beating and you should be able to breathe, but do you have to be a certain size in order to be healthy? Do you have to not be in pain? Can you have a beating heart and be unhealthy? There’s a cluster of properties here somewhere that makes up the definition of the word health but we’re never going to pin down a definite list because that’s just not how the concept works. Despite that vagueness it’s still very obviously useful and meaningful.
Similarly Boyd thinks that a word like ‘good’ refers to a cluster of things that are non-morally good for humans, like sharing friendship, sharing love, having fun, watching quality YouTube videos, but just like with health, you’re never going to be able to pin down a full list because the concept just isn’t like that.
And here’s the big takeaway—if we say ‘John is healthy’ we could be talking about any number of things in the cluster of health—whether he a has disease, whether he works out, whether he has a good relationship with his mother—all of which are objective—but whether the sentence ‘John is healthy’ is true will still depend on what aspect of his health we’re talking about. It will be relative to the context in which we’re saying it.
...
So, I call clusters like this (games, health, goodness) “vague concepts”: those concepts obtain specific meaning in a specific context, but they can’t be defined outside of context.
How to understand a vague concept? You can try to memorize all contexts (that you know of) in which it’s used. Or you can learn to infer its meaning in new contexts and learn to create new contexts for this concept yourself. This is what I meant by “creating new contexts”.
I feel that it’s related to hypothesis generation, because some general (scientific) ideas/paradigms don’t have any meaning outside of context.
You could imagine a hypothesis based on vague concepts, for example “healthy people earn more money than unhealthy people” or “people who love games earn more money”. In their most abstract form, those theories can’t be falsified. But it’s easy to generate specific falsifiable hypotheses based on those ideas.
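The move from an unfalsifiable vague hypothesis to specific falsifiable ones can be sketched as follows. This is my own illustrative rendering, not anything from the cited sources: the cluster concept “healthy” is modeled as a family of overlapping indicators with no single necessary feature, and fixing a context (one indicator) turns the vague claim into a testable one. All thresholds and field names are made up.

```python
# A cluster concept as a family of overlapping indicator functions; no single
# indicator is necessary or sufficient, yet each one grounds a testable claim.

HEALTH_INDICATORS = {
    "resting_hr_ok": lambda p: p["resting_hr"] < 80,
    "exercises": lambda p: p["workouts_per_week"] >= 2,
    "pain_free": lambda p: not p["chronic_pain"],
}

def specific_hypotheses(indicators):
    """Each context (indicator) yields a specific, falsifiable hypothesis
    from the vague claim "healthy people earn more"."""
    return [f"People with {name} earn more on average" for name in indicators]

person = {"resting_hr": 65, "workouts_per_week": 3, "chronic_pain": False}
verdicts = {name: check(person) for name, check in HEALTH_INDICATORS.items()}
hypotheses = specific_hypotheses(HEALTH_INDICATORS)
```

The abstract claim never appears as a single predicate; only its context-fixed instances do, which mirrors how the vague concept itself only obtains meaning in context.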
Scientific theories, too, can have an unfalsifiable core. This is Imre Lakatos’ model of scientific progress:
https://en.wikipedia.org/wiki/Imre_Lakatos#Research_programmes
Lakatos’s second major contribution to the philosophy of science was his model of the “research programme”,[19] which he formulated in an attempt to resolve the perceived conflict between Popper’s falsificationism and the revolutionary structure of science described by Kuhn. Popper’s standard of falsificationism was widely taken to imply that a theory should be abandoned as soon as any evidence appears to challenge it, while Kuhn’s descriptions of scientific activity were taken to imply that science is most fruitful during periods in which popular, or “normal”, theories are supported despite known anomalies. Lakatos’ model of the research programme aims to combine Popper’s adherence to empirical validity with Kuhn’s appreciation for conventional consistency.
A Lakatosian research programme[20] is based on a hard core of theoretical assumptions that cannot be abandoned or altered without abandoning the programme altogether. More modest and specific theories that are formulated in order to explain evidence that threatens the “hard core” are termed auxiliary hypotheses. Auxiliary hypotheses are considered expendable by the adherents of the research programme—they may be altered or abandoned as empirical discoveries require in order to “protect” the “hard core”. Whereas Popper was generally read as hostile toward such ad hoc theoretical amendments, Lakatos argued that they can be progressive, i.e. productive, when they enhance the programme’s explanatory and/or predictive power, and that they are at least permissible until some better system of theories is devised and the research programme is replaced entirely.
Vague concepts lead to vague hypotheses (“research programmes”). Vague hypotheses work the same way vague concepts do. (part 1⁄2)
Properties, differences
What do you mean by “meaning” here? How does an attribute of size have inherent meaning?
It is absolutely unclear what you mean by this. What does “height” relate to and resonate with, and why does that change with object? What do you even mean by “relate and resonate”?
What do you mean by “part/property”? Something like “height”? How do you put “height” into a different context? “You can create a [...] different version of it”? What do you mean by “fundamentally different”? A version of what? Of “height”?
I tried to give 3 examples there (with paintings). But here’s a simpler example:
Imagine a cube and a tree. Think about their heights. The cube’s height has a different “meaning” because it’s the same thing as its width and length.
You may need to make a leap of faith/understanding here somewhere, it’s a new concept or perspective. I may try explaining it in different ways and analogies, but I can’t reduce this idea to simpler ideas.
For example, I could make an analogy with homology in biology:
https://en.wikipedia.org/wiki/Evolutionary_developmental_biology#The_control_of_body_structure
Roughly spherical eggs of different animals give rise to unique morphologies, from jellyfish to lobsters, butterflies to elephants. Many of these organisms share the same structural genes for body-building proteins like collagen and enzymes, but biologists had expected that each group of animals would have its own rules of development. The surprise of evo-devo is that the shaping of bodies is controlled by a rather small percentage of genes, and that these regulatory genes are ancient, shared by all animals. The giraffe does not have a gene for a long neck, any more than the elephant has a gene for a big body. Their bodies are patterned by a system of switching which causes development of different features to begin earlier or later, to occur in this or that part of the embryo, and to continue for more or less time.[7]
Those topics talk about the ways animals’ parts and properties get differentiated.
And you can combine all properties of an object into just a single one.
I tried to give 3 examples of this. It’s some type of holism: “you should view a part in the context of the whole”, “a whole is greater than the sum of its parts”.
I give this idea a fractal spin: “any part of a thing is equivalent to the whole”. The most similar philosophical idea I know of is Gottfried Leibniz’s Monadology, for example:
https://en.wikipedia.org/wiki/Monadology
(III) Composite substances or matter are “actually sub-divided without end” and have the properties of their infinitesimal parts (§65). A notorious passage (§67) explains that “each portion of matter can be conceived as like a garden full of plants, or like a pond full of fish. But each branch of a plant, each organ of an animal, each drop of its bodily fluids is also a similar garden or a similar pond”.
You can compare colors to monads and spectrums to the “supreme monad” (God).
So you are describing art theory! That is something learned in 10th grade art. Contrast, homo-/heterogeneity of form, color etc.
I don’t think it’s art theory. Not 10th grade.
No idea what you are getting at. Why are you calling your new super property “color” when you are also discussing classical form and color? This makes confusing these terms incredibly likely.
I believe I don’t discuss classical “color”. I only mention it in a single analogy (and one more time when I mention qualia).
My goal
I guess you are talking about categorizing arbitrary qualia properties and their relations, but that is a matter of art theory. How do you even propose to objectively study something inherently subjective? It does seem that what you describe is covered by artists. Beyond that it is incredibly unclear what you are talking about.
I can explain my goal with a story. I didn’t include it in the post to not make it too big, but maybe I should have:
Blind men and an elephant
Imagine a world where people don’t know the concept of a “circle”. People do see round things, but can’t consciously pick out the property of roundness. (Any object has a lot of other properties.)
Some people say “the Moon is like a face”. Others say “the Moon is like a flower”. Weirder people say “the Moon is like a tree trunk” or “the Moon is like an embrace”. The weirdest people say “the Moon is like a day” or “the Moon is like going for a walk and returning back home”. Nobody agrees with each other, nobody understands each other.
Then one person comes up and says: “All of you are right. Opinions of everyone contain objective and useful information.”
People are shocked: surely at least someone has to be wrong? If everyone is right, how can the information be objective and useful?
The concept of a “circle” is explained. Suddenly it’s extremely easy to understand each other. Like 2 and 2. And suddenly there’s nothing to argue about. People begin to share their knowledge and this knowledge finds completely unexpected applications.
https://en.wikipedia.org/wiki/Blind_men_and_an_elephant
The situation was just like in the story about blind men and an elephant, but even more ironic, since this time everyone was touching the same “shape”.
With my story I wanted to explain my opinions and goals:
I want to share my subjective experience.
I believe that it contains objective and important information.
I want to share a way to share subjective experiences. I believe everyone’s experience contains objective and important information.
(part 2⁄2)
Vague concepts, family resemblance and cluster properties
Sorry if I dumb it down too much. I tried to come up with specific examples without terminology. Here’s how I understand what you’re saying:
A vague concept can be compared to an agent (AI).
You can use vague concepts to train agents (AIs).
An agent can use a vague concept to define its field of competence.
Simple/absurd examples:
Let’s say we’ve got a bunch of movies, and N vague concepts such as “bad movie”, “funny movie”, etc. Each concept is an AI of sorts. Those concepts “discuss” the movies and train each other.
We’ve got a vague concept, such as “health”, and some examples of people who may or may not be healthy. Different AIs discuss whether a person is healthy and train each other.
Let’s say the vague concept is “games”. An AI uses this concept to determine what is a game and what is not, or the “implications” of treating something as a game (see “Internal structure, ‘gradient’”).
This might be a bridge between machine learning and agent foundations that is itself related to alignment.
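The “concepts training each other” picture above can be made concrete with a toy sketch. Everything below (the features, weights, and thresholds) is my own invention purely for illustration — each “concept” is a trivial linear scorer, and one concept passes its confident judgements to another as training signal:

```python
# Toy sketch: two "vague concepts" as naive linear scorers over movie
# features, where one concept teaches the other on confident examples.
# All features and weights are hypothetical.

movies = [
    {"features": {"jokes", "slapstick"}},
    {"features": {"jokes", "romance"}},
    {"features": {"gore", "cheap_effects"}},
    {"features": {"cheap_effects", "slapstick"}},
]

def score(concept, movie):
    """A 'concept' is just a dict of feature -> weight; sum over the movie's features."""
    return sum(concept.get(f, 0.0) for f in movie["features"])

def teach(teacher, learner, movies, threshold=1.0, lr=0.5):
    """Where the teacher is confident, nudge the learner toward its judgement."""
    for m in movies:
        s = score(teacher, m)
        if abs(s) >= threshold:              # teacher is confident about this movie
            label = 1.0 if s > 0 else -1.0
            for f in m["features"]:
                learner[f] = learner.get(f, 0.0) + lr * label

funny = {"jokes": 1.0, "slapstick": 1.0}   # the concept "funny movie"
bad = {"cheap_effects": -1.0}              # the concept "bad movie" (negative = bad)

teach(funny, bad, movies)  # "funny" shares its confident judgements with "bad"
```

This is essentially a miniature of pseudo-labeling/co-training: after one round, the “bad movie” concept has picked up weights for features it had never seen, purely from the other concept’s confident verdicts.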
In this case, could you help me with the topic of “colors”? I wouldn’t have written this post if I hadn’t written about “colors”. So this is evidence (?) that the topic of “colors” isn’t insane.
There, a “place” is a vague concept, and a “spectrum” is a specific context for the place. Meaning is a distribution of “details”. Learning is guessing the correct distribution of details (a “color”) for a place in a given context.
This post about vague concepts in general is mostly meaningless for me too: I care about something more specific, “colors”. However, I think a text may be “meaningless” and yet very useful:
You thought about topics that are specific and meaningful for you. You came up with an overly general “meaningless” sketch (A).
I thought about topics that are specific and meaningful for me. I came up with an overly general “meaningless” post (B).
We recognized a similarity between our generalizations. This similarity is “meaningless” too.
Did we achieve anything? I think we could have. If one of us gets a specific insight, there’s a chance to translate this insight (from A to B, or from B to A).
So I think the use of “agent” in the first point I quoted is about adjudicators, in the second point both adjudicator and outer agent fit (but mean different things), and the third point is about the outer agent (how its goodhart scope relates to those of the adjudicators). (link)
I just tried to understand (without terminology) how my ideas about “vague concepts” could help to align an AI. Your post prompted me to think in this direction directly. And right now I see this possibility:
The most important part of my post is the idea that the specific meanings of a vague concept have an internal structure (at least in specific circumstances). It’s as if (it’s just an analogy) the vague concept were self-aware of its changes of meaning and reacted to those changes. You could try to use this “self-awareness” to align an AI, to teach it to respect important boundaries.
For example (it’s an awkward example), let’s say you want to teach an AI that interacting with a human is often not a game, or that it may be bad to treat it as a game. If the AI understands that reducing the concept of “communication” to the concept of a “game” carries certain implications, you can explain which reductions and implications are bad without giving the AI complicated explicit rules.
(Another example.) If the AI has (or is able to reach) an internal worldview in which “loving someone” and “making a paperclip” are fundamentally different things, and not just a matter of arbitrarily complicated definitions, then it may be easier to explain human values to it.
However, this is all science fiction if we have no idea how to model concepts and ideas and their changes of meaning. But my post about colors, I believe, can give you ideas about how to do this. I know:
Maybe it doesn’t have enough information for an (interesting) formalization.
Even if you make an interesting formalization, it won’t automatically solve alignment even in the best case scenario.
But it may give ideas, a new approach. I want to fight for this chance, both because of AI risk and because of very deep personal reasons.
(Drafts of a future post.)
My idea:
Every concept (or even random mishmash of ideas) has multiple versions. Those versions have internal relationships, positions in some space relative to each other. Those relationships are “infinitely complex”. But there’s a way to make drastic simplifications of those relationships. We can study the overall (“infinitely complex”) structure of the relationships by studying those simplifications. What do those simplifications do, in general? They put “costs” on versions of a concept.
We can understand how we think if we study our concepts (including values) through such simplifications. It doesn’t matter what concepts we study at all. Anything goes, we just need to choose something convenient. Something objective enough to put numbers on it and come up with models.
Once we’re able to model human concepts this way, we’re able to model human thinking (AGI) and human values (AI Alignment) and improve human thinking.
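One way to start putting numbers on the draft idea above — my own formalization attempt, not something from the post — is to treat versions of a concept as points in a feature space, and each “simplification” as a weight vector that prices movements between versions:

```python
# Hypothetical setup: versions of the concept "game" as feature vectors
# (competitiveness, stakes, playfulness). All numbers are illustrative.
chess = (0.9, 0.2, 0.6)
war = (1.0, 1.0, 0.0)
hide_and_seek = (0.3, 0.0, 1.0)

def change_cost(v1, v2, weights):
    """Cost of shifting a concept from version v1 to version v2
    under one 'simplification' (a vector of per-dimension prices)."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, v1, v2))

# One simplification mostly prices stakes; another mostly prices playfulness.
stakes_view = (0.1, 1.0, 0.1)
play_view = (0.1, 0.1, 1.0)

# Under the stakes-focused simplification, drifting from "chess" to "war"
# is costlier than drifting from "chess" to "hide and seek".
costly = change_cost(chess, war, stakes_view)
cheap = change_cost(chess, hide_and_seek, stakes_view)
```

Each simplification is a drastic flattening of the “infinitely complex” relationships between versions, but by comparing costs across many such weight vectors you can probe the overall structure — which is the move the draft proposes.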
Context
1.1 Properties of Qualia
There’s the hard problem of consciousness: how is subjective experience created from physical stuff? (Or where does it come from?)
But I’m interested in a more specific question:
Do qualia have properties? What are they?
For example, “How do qualia change? How many different qualia can be created?” or “Do qualia form something akin to a mathematical space, e.g. a vector space? What is this space exactly?”
Is there any knowledge contained in the experience itself, not merely associated with it?1 For example, “cold weather can cause a cold (the disease)” is a fact associated with an experience, but it isn’t very fundamental to the experience itself. And this “fact” is false anyway; it’s a misconception/coincidence.
When you get to know the personality of your friend, do you learn anything “fundamental” or really interesting by itself? Is “loving someone” a fundamentally different experience compared to “eating pizza” or “watching a complicated movie”?
Those questions feel pretty damn important to me! They’re about limitations of your meaningful experience and meaningful knowledge. They’re about personalities of people you know or could know. How many personalities can you differentiate? How “important/fundamental” are those differences? And finally… those questions are about your values.
Those questions are important for Fun Theory. But they’re way more important/fundamental than Fun Theory.
1 Philosophical context for this question: look up Immanuel Kant’s idea of “synthetic a priori” propositions.
1.2 Qualia and morality
And those questions are important for AI Alignment. If an AI can “feel” that loving a sentient being and making a useless paperclip are two fundamentally different things, then it might be way easier to explain our values to that AI. By the way, I’m not implying that an AI has to have qualia; I’m saying that our qualia can point us toward the right model.
I think this observation gets a little bit glossed over: if you have a human brain and only care about paperclips… it’s (kind of) still objectively true for you that caring about other people would feel way different, way “bigger”, and so on. You can pretend to escape morality, but you can’t escape your brain.
It’s extremely banal out of context, but the landscape of our experiences and concepts may shape the landscape of our values. Modeling our values as arbitrary utility functions (or artifacts of evolution) misses that completely.
2.1 Mystery Boxes
Box A
There’s a mystery Box A. Each day you find a random object inside of it. For example: a ball, a flower, a coin, a wheel, a stick, a tissue...
Box B
There’s also another box, the mystery Box B. One day you find a flower there. Another day you find a knife. The next day you find a toy. Next—a gun. Next—a hat. Next—shark’s jaws...
...
How to understand the boxes? If you could obtain all items from both boxes, you would find… that those items are exactly the same. They just appear in a different order, that’s all.
I think the simplest way to understand Box B is this: you need to approach it with a bias, with a “goal”. For example “things may be dangerous, things may cause negative emotions”. In its most general form, this idea is unfalsifiable and may work as a self-fulfilling prophecy. But this general idea may lead to specific hypotheses, to estimating specific probabilities. This idea may just save your life if someone is coming after you and you need to defend yourself.
The content of both boxes changes in arbitrary ways. But the content changes of the second box come with an emotional cost.
There are many, many other boxes; understanding them requires more nuanced biases and goals.
I think those boxes symbolize concepts (e.g. words) and the way humans understand them. I think a human understands a concept by assigning “costs” to its changes of meaning. “Costs” come from various emotions and goals.
“Costs” are convenient: if any change of meaning has a cost, then you don’t need to restrict the meaning of a concept. If a change has a cost, then it’s meaningful regardless of its predictability.
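The two boxes can be turned into a toy computation. Below, the items and their “valences” are my own guesses, and both boxes hold the same items in different orders, as in the story; the only thing the function measures is the cost of swings from a positive or neutral item to a negative one:

```python
# Toy sketch of the mystery boxes: same items, different order, and only
# the ordering of Box B carries an emotional cost. Valences are my own
# illustrative assignments (+1 positive, 0 neutral, -1 negative).

valence = {"ball": 0, "flower": 1, "coin": 0, "knife": -1,
           "toy": 1, "gun": -1, "hat": 0, "jaws": -1}

box_a = ["ball", "flower", "coin", "hat", "toy", "knife", "gun", "jaws"]
box_b = ["flower", "knife", "toy", "gun", "hat", "jaws", "ball", "coin"]

def emotional_cost(sequence):
    """Sum the cost of each swing into a negative item; a subverted
    positive (positive immediately before negative) hurts more."""
    cost = 0
    for prev, cur in zip(sequence, sequence[1:]):
        if valence[prev] >= 0 and valence[cur] < 0:
            cost += 1 + valence[prev]
    return cost

assert sorted(box_a) == sorted(box_b)  # identical items, different order
```

Under this (arbitrary) pricing, Box B’s ordering costs more than Box A’s even though the contents are identical — which is the point: the bias/goal you bring to the box is what turns an unpredictable sequence into something meaningful.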
2.2 More Boxes
More examples of mystery boxes:
First box may alternate positive and negative items.
Second box may alternate positive, directly negative and indirectly negative items. For example, it may show you a knife (directly negative) and then a bone (indirectly negative: a “bone” may be a consequence of the “knife”).
Third box may alternate positive, negative and “subverted” items. For example, it may show you a seashell (positive), and then show you shark’s jaws (negative). But both sharks and seashells have a common theme, so “seashell (positive)” got subverted.
Fourth box may alternate negative items and items that “neutralize” negative things. For example, it may show you a sword, but then show you a shield.
Fifth box may show you that every negative thing has many related positive things.
You can imagine a “meta box”, for example a box that alternates between being the 1st box and the 2nd box. Meta boxes can “change their mood”.
I think, in a weird way, all those boxes are very similar to human concepts and words.
The more emotions, goals and biases you learn, the easier it gets for you to understand new boxes. But those “emotions, goals, biases” are themselves like boxes.
Yes, I think it’s related. Added “aesthetics” as a tag.