or present their framework for enumerating the set of all possible qualitative experiences (Including the ones not experienced by humans naturally, and/or only accessible via narcotics, and/or involve senses humans do not have or have just happened not to be produced in the animal kingdom)
Strongly agree. If you want to explain qualia, explain how to create experiences, explain how each experience relates to all other experiences.
I think Eliezer should’ve talked more about this in The Fun Theory Sequence. Because the properties of qualia are a more fundamental topic than “fun”.
And I believe that knowledge about qualia may be one of the most fundamental types of knowledge. I.e. potentially more fundamental than math and physics.
We only censor other people more-independent-minded than ourselves. (...) Independent-minded people do not censor conventional-minded people.
I’m not sure that’s true. Not sure I can interpret the “independent/dependent” distinction.
In “weirdos/normies” case, a weirdo can want to censor ideas of normies. For example, some weirdos in my country want to censor LGBTQ+ stuff. They already do.
In “critical thinkers/uncritical thinkers” case, people with more critical thinking may want to censor uncritical thinkers. (I believe so.) For example, LW in particular has a couple of ways to censor someone, direct and indirect.
In general, I like your approach of writing this post like an “informal theorem”.
Thank you for the answer, clarifies your opinion a lot!
Artistic expression, of course, is something very different. I’m definitely going to keep making art in my spare time for the rest of my life, for the sake of fun and because there are ideas I really want to get out. That’s not threatened at all by AI.
I think there are some threats, at least hypothetical ones. For example, the “spam attack”. People see that a painter starts to explore some very niche topic — and thousands of people start to generate thousands of paintings about the same very niche topic. And the very niche topic gets “pruned” in a matter of days, long before the painter has said even 30% of what they have to say. The painter has to fade into obscurity or radically reinvent themselves after every couple of paintings. (Pre-AI, the “spam attack” was not really possible even with zero copyright laws.)
In general, I believe that for culture to exist we need to respect the idea “there’s a certain kind of output I can get only from a certain person, even if it means waiting or not having every single one of my desires fulfilled” in some way. For example, maybe you shouldn’t use AI to “steal” the face of an actor and make them play whatever you want.
Do you think that unethical ways to produce content exist at least in principle? Would you consider any boundary for content production, codified or not, to be a zero-sum competition?
Could you explain your attitudes towards art and art culture more in depth and explain how exactly your opinions on AI art follow from those attitudes? For example, how much do you enjoy making art and how conditional is that enjoyment? How much do you care about self-expression, in what way? I’m asking because this analogy jumped out at me as a little suspicious:
And as terrible as this could be for my career, spending my life working in a job that could be automated but isn’t would be as soul-crushing as being paid to dig holes and fill them in again. It would be an insultingly transparent facsimile of useful work.
But creative work is not mechanical work; it can’t be automated that way, and AI doesn’t replace you that way. AI doesn’t have a model of your brain; it can’t make the choices you would make. It replaces you by making something cheaper and on the same level of “quality”. It doesn’t automate your self-expression. If you care about self-expression, the possibility of AI doesn’t have to feel soul-crushing.
I apologize for sounding confrontational. You’re free to disagree with everything above. I just wanted to show that the question has a lot of potential nuances.
To me, the initial poll options make no sense without each other. For example, “avoid danger” and “communicate beliefs” don’t make sense without each other [in the context of society].
If people can’t communicate (report their epistemic state), “avoid danger” may not help, or may be based on 100% biased opinions about what’s dangerous.
If some people solve Alignment, but don’t communicate, humanity may perish due to not building a safe AGI.
If nobody solves Alignment, but nobody communicates about Alignment, humanity may perish because careless actors build an unsafe AGI without even knowing they do something dangerous.
I like communication, so I chose the second option. Even though “communicating without avoiding danger” doesn’t make sense either.
Since the poll options didn’t make much sense to me, I didn’t see myself as “facing alien values” or “fighting off babyeaters”. I didn’t press the link because I thought it might “blow up” the site (similar to the previous Petrov’s Day), and I wasn’t sure it was OK to click. I didn’t think my unilateralism would be analogous to Petrov’s unilateralism (did Petrov cure anyone’s values, by the way?). I decided it was more Petrov-like not to click.
But is AGI (or anything else) related to the lessons of Petrov’s Day? That’s another can of worms. I think we should update the lessons of the past to fit the future situations. I think it doesn’t make much sense to take away from Petrov’s Day only lessons about “how to deal with launching nukes”.
Another consideration: Petrov did accurately report his epistemic state. Or he would have, if it had been needed (if it had been needed, he would have lied in order to accurately report his epistemic state: “there are no launches”). Or: “he accurately non-reported the non-presence of nuclear missiles”.
If you have a flexible enough representation then you can use it to represent anything, unfortunately you’ve also gutted it of predictive power (vs post hoc explanation).
I think this can be wrong:
“Y” and “D” are not empty symbols, they come with an objective enough metric (the metric of “general importance”). So, it’s like saying that “A” and “B” in the Bayes’ theorem are empty symbols without predictive power. And I believe the analogy with Bayes’ theorem is not accidental, by the way, because I think you could turn my idea into a probabilistic inference rule.
Even if my method can’t help to predict good ideas, it can still have predictive power if it evaluates good ideas correctly (before they get universally recognized as good). Not every important idea is immediately recognized as important.
Can you expand on the connection with Leverage Points? Seems like 12 Leverage Points is an extremely specific and complicated idea (doesn’t mean it can’t be good in its own field, though).
Sorry, I meant using motivated cognition as a norm itself. Using motivated cognition for evaluating hypotheses. I.e. I mean what people usually mean by motivated cognition, “you believe in this (hypothesis) because it sounds nice”.
Here’s why I think that motivated cognition (MC) is more epistemically interesting/plausible than people think:
When you’re solving a problem A, it may be useful to imagine the perfect solution. But in order to imagine the perfect solution for problem A you may need to imagine such solutions for problems B, C, D etc. … if you never evaluate facts and hypotheses emotionally, you may not even be able to imagine what the “perfect solution” is.
MC may be a challenge: often it’s not obvious what the best possibility is. And the best possibilities may look weird.
Usual arguments against MC (e.g. “the universe doesn’t care about your feelings”, “you should base your opinions on your knowledge about the universe”) may be wrong, because feelings may be based on knowledge about reality.
Modeling people (even rationalists) as using different types of MC may simplify their arguments and opinions.
MC in the form of ideological reasoning is, in a way, the only epistemology known to us. Bayesianism is cool, but on some important level of reality it’s not really an epistemology (in my opinion), i.e. it’s hard/impossible to use and it doesn’t actually model thinking and argumentation.
If you want we can discuss those or other points in more detail.
Would you like to discuss a stronger claim, that motivated cognition may be a good epistemology?
Usually people use “logical reasoning + facts”. Maybe we can use “motivated reasoning + facts”. I.e. seek a balance between desirability and plausibility of a hypothesis.
The same logic may not apply to experience: if you want to explain subjective experience, you need to explain any possible subjective experience, any experience that could be constructed, observed or not.
Here’s lc’s opinion:
It is both absurd, and intolerably infuriating, just how many people on this forum think it’s acceptable to claim they have figured out how qualia/consciousness works, and also not explain how one would go about making my laptop experience an emotion like ‘nostalgia’, or present their framework for enumerating the set of all possible qualitative experiences.
(Including the ones not experienced by humans naturally, and/or only accessible via narcotics, and/or involve senses humans do not have or have just happened not to be produced in the animal kingdom)
For some time I wanted to apply the idea of probabilistic thinking (used for predicting things) to describing things, making analogies between things. This is important because your hypotheses (predictions) depend on the way you see the world. If you could combine predicting and describing into a single process, you would unify cognition.
Fuzzy logic and fuzzy sets are one way to do it. The idea is that something can be partially true (e.g. “humans are ethical” is somewhat true) or partially belong to a class (e.g. a dog is somewhat like a human, but not 100%). Note that “fuzzy” and “probable” are different concepts. But fuzzy logic isn’t enough to unify predicting and describing, because it doesn’t tell us much about how we should or could describe the world. No new ideas.
I have a different principle for unifying probability and description. Here it is:
Properties of objects aren’t contained in specific objects. Instead, there’s a common pool that contains all possible properties. Objects take their properties from this pool. But the pool isn’t infinite. If one object takes 80% of a certain property from the pool, other objects can take only 20% of that property (e.g. “height”). Socialism for properties: it’s not your “height”, it’s our “height”.
How can an object “take away” properties of other objects? For example, how can a tall object “steal” height from other objects? Well, imagine there are multiple interpretations of each object. The interpretation of one object affects the interpretation of all other objects. It’s just a weird axiom. Like non-Euclidean geometry.
This sounds strange, but this connects probability and description. And this is new. I think this principle can be used in classification and argumentation. Before showing how to use it I want to explain it a little bit more with some analogies.
Connected houses
Imagine two houses, A and B. Those houses are connected in a specific way.
When one house turns on the light at 80%, the other turns on the light only at 20%.
When one house uses 60% of the heat, the other uses only 40% of the heat.
(When one house turns on the red light, the other turns on the blue light. When one house is burning, the other is freezing.)
Those houses take electricity and heat from a common pool. And this pool doesn’t have infinite energy.
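Here’s a minimal sketch of the principle in code, assuming (purely for illustration) that each property is a fixed budget of 100% that all objects must share:

```python
# A toy model of the "common pool": each property is a fixed budget (1.0)
# shared by all objects. Raising one object's share necessarily shrinks
# the shares left for everyone else. All numbers are illustrative.

POOL = 1.0

def redistribute(shares, obj, new_share):
    """Give `obj` a new share of the pool and rescale every other object
    proportionally, so the shares still sum to POOL."""
    others = {k: v for k, v in shares.items() if k != obj}
    total_others = sum(others.values())
    remaining = POOL - new_share
    result = {k: v / total_others * remaining for k, v in others.items()}
    result[obj] = new_share
    return result

# Two connected houses sharing one pool of "light":
light = {"house_A": 0.5, "house_B": 0.5}
print(redistribute(light, "house_A", 0.8))  # house_B is left with ~0.2
```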
Kindness
Usually people think about qualities as something binary: you either have a quality or you don’t. For example, a person can be either kind or not.
For me an abstract property such as “kindness” is like white light. Different people have different colors of “kindness” (blue kindness, green kindness...). Every person has kindness of some color. But nobody has all colors of kindness.
Abstract kindness is the common pool (of all ways to express it). Different people take different parts of that pool.
Some more analogies
Theism analogy. You can compare the common pool of properties to the “God object”, a perfect object. All other objects are just different parts of the perfect object. You can also check out Monadology by Gottfried Leibniz.
Spectrum analogy. You can compare the common pool of properties to the spectrum of colors. Objects are just colors of a single spectrum.
Ethics analogy. Imagine that all your good qualities also belong (to a degree) to all other people. And all bad qualities of other people also belong (to a degree) to you. As if people take their qualities from a single common pool.
Buddhism analogy. Imagine that all your desires and urges come (to a degree) from all other people. And desires and urges of all other people come (to a degree) from you. There’s a single common pool of desire. This is somewhat similar to karma. In rationality there’s also a concept of “values handshakes”: when different beings decide to share each other’s values.
Quantum analogy. See quantum entanglement. When particles become entangled, they take their properties from a single common pool (quantum state).
Fractal analogy. “All objects in the Universe are just different versions of a single object.”
Subdivision analogy. Check out Finite subdivision rule. You can compare the initial polygon to the common pool of properties. And different objects are just pieces of that polygon.
Connection with recursion
Recursion. If objects take their properties from the common pool, it means they don’t really have (separate) identities. It also means that a property (X) of an object is described in terms of all other objects. So, the property (X) is recursive, it calls itself to define itself.
For example, imagine we have objects A, B and C. We want to know their heights. In order to do this we may need to evaluate those functions:
A(height), B(height), C(height)
A(B(height)), A(C(height)) …
A(B(C(height))), A(C(B(height))) …
A priori assumptions about objects should allow us to simplify this and avoid cycles.
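Here’s a toy sketch of that recursive evaluation, assuming (illustratively) that each object pulls an interpretation toward its own baseline and that a visited-set plays the role of the cycle-breaking a priori assumptions:

```python
# Toy recursive interpretation: the "height" of an object is its own
# baseline, adjusted by how every other (not yet visited) object reads it.
# The `visited` set plays the role of the a priori assumptions that
# break the A -> B -> A -> ... cycles. All numbers are illustrative.

base = {"A": 0.9, "B": 0.5, "C": 0.3}  # prior "height" of each object

def interpret(obj, visited=frozenset()):
    value = base[obj]
    for other in base:
        if other != obj and other not in visited:
            # every other object pulls the value toward its own reading
            value += 0.1 * (interpret(other, visited | {obj}) - value)
    return value

print({o: round(interpret(o), 3) for o in base})
# A ends up a bit shorter and C a bit taller than their baselines:
# the objects negotiate their properties instead of owning them outright.
```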
Fractals. See Coastline paradox. You can treat a fractal as an object with multiple interpretations (where an interpretation depends on the scale). Objects taking their properties from the common pool = fractals taking different scales from the common range.
Classification
To explain how to classify objects using my principle, I need to explain how to order them with it.
I’ll explain it using fantastical places and videogame levels, because those things are formal and objective enough (they are 3D shapes). But I believe the same classification method can be applied to any objects, concepts and even experiences.
Basically, this is an unusual model of contextual thinking. If we can formalize this specific type of contextual thinking, then maybe we can formalize contextual thinking in general. This topic will sound very esoteric, but it’s the direct application of the principle explained above.
Intro
(I interpret paintings as “real places”: something that can be modeled as a 3D shape. If a painting is surreal, I simplify it a bit in my mind.)
Take a look at those places: [image].
Let’s compare 2 of them: [image]. Let’s say we want to know the “height” of those places. We don’t have a universal scale to compare the places. Different interpretations of the height are possible.
If we’re calling a place “very tall”—we need to understand the epithet “very tall” in probabilistic terms, such as “70-90% tall”—and we need to imagine that this probability is taken away from all other places. We can’t have two different “very tall” places. Probability should add up to 100%.
Now take a look at another place (A): [image] (I ignore the cosmos to simplify it). Let’s say we want to know how enclosed it is. In one interpretation, it is massively enclosed by trees. In another interpretation, the trees are just a decorative detail and can be ignored. Let’s add some more places for context: [image]. They are definitely more open than the initial place, so we should update towards the more enclosed interpretation of (A). All interpretations should be correlated and “compatible”. It’s as if we’re solving a puzzle.
You can say that properties of places are “expandable”. Any place contains a seed of any possible property and that seed can be expanded by a context. “Very tall place” may mean Mt. Everest or a molehill depending on context. You can compare it to a fractal: every small piece of a fractal can be expanded into the entire thing. And I think it’s also very similar to how human language, human concepts work.
You also may call it “amplification of evidence”: any smallest piece of evidence (or even absence of any evidence) can be expanded into very strong evidence by context. We have a situation like in the Raven paradox, but even worse.
Rob Gonsalves
(I interpret paintings as “real” places.)
Places in random order: [image].
My ordering of places: [image].
I used 2 metrics to evaluate the places:
Is the space of the place “box-like” and small or not?
Is the place enclosed or open?
The places go from “box-like and enclosed” to “not box-like and open” in my ordering.
But to see this you need to look at the places in a certain way, reason about them in a certain way:
Place 1 is smaller than it seems. Because Place 5 is similar and “takes away” its size.
Place 2 is more box-like than it seems. Because similar places 4 and 6 are less box-like.
Place 3 is more enclosed than it seems. Because similar places 4 and 6 “take away” its openness.
Place 5 is more open than it seems. Because similar places 1 and 2 “take away” its closedness.
Almost any property of any specific place can be “illusory”. But when you look at places in context you can deduce their properties via the process of elimination.
Hello! I’ve heard you can ask people about your content in the open thread. Sorry if I’m asking too soon.
Could you help me to explain this (“Should AI learn human values, human norms or something else?”) idea better? It’s a 3 minute read.
I also would like to discuss with somebody those thought experiments. Not in a 100% formal way.
I checked out some of your posts (haven’t read 100% of them): Learning Normativity: A Research Agenda and Non-Consequentialist Cooperation?
You draw a distinction between human values and human norms. For example, an AI can respect someone’s autonomy before the AI gets to know their values and the exact amount of autonomy they want.
I draw the same distinction, but more abstractly. It’s a distinction between human values and properties of any system/task. An AI can respect keeping some properties of its reward systems intact before it gets to know human values.
I think even in very simple games an AI could learn important properties of systems. Which would significantly help the AI to respect human values.
My points about complexity still stand:
Such things as Impact Measures still require “system level” thinking.
Recognizing/learning properties of pathological systems may be easier than perfectly learning human values (without learning to be a deceptive manipulator).
I don’t think that “act level reasoning” and “system level reasoning” is a meaningful distinction. I think it’s the same thing. Humans need to do it anyway. And AI would need to do it anyway. I just suggested making such reasoning fundamental.
The example of deconstructing and constructing, building deep in a city, has the characteristics of taking down an old building and building up a newer one in its valuable land area. The judgement of categorising demolishing as a method of construction as part of a pathological system would then seem to be a false negative. If we can make good quality judgements and bad quality judgements like these, what basis do we have to think that the judgement on the system is leading us forward rather than leading us astray?
Different tasks may assume different types of “systems”. You can specify the type of task you’re asking about, or teach the AI to determine it / ask a human if there’s an ambiguity.
“Turning a worse thing into a better thing” is generally a way better idea than “breaking and fixing a thing without making it better”. It’s true for a lot of tasks, both instrumental and terminal.
The point about the deserted island is that “money systems” have an area of applicability and there are things outside of that.
“Money systems” is just a metaphor. And this metaphor is still applicable here. I mean, I used exactly the same example in the post: “if you’re sealed in a basement with a lot of money they’re not worth anything”.
What general conclusions about my idea do you want to reach? I think it’s important for the arguments. For example, if you want to say that my idea may have problems, then of course I agree. If you want to say that my idea is worse than all other ideas and shouldn’t be considered, then I disagree.
I agree with this summary. The idea’s that any human-like thinking (or experience) looks similar on all levels/in all domains and it’s simple enough (open to introspection). If it’s true, then there’s some easy way to understand human values.
If nothing happens in the discussion of this post, my next post may be about a way to analyze human values (using the idea of “merging biases”). It will be way shorter. It will be inspired by Bodily autonomy, but my example won’t be political.
I don’t have a model. The point of my idea is to narrow down what model is needed (and where/how we can easily find it). The point of the math language (“acausal trade” and “decision trees”) is the same.
Everything mentioned in the post is like a container. A container may not model what’s inside of it at all, but it limits the number of places we need to check (in order to find what we want). If we don’t easily find what we wanted by looking into the container (and a little bit around it), then my idea is useless.
Can anything besides useful math change your opinion in any way? I saw your post (Models Modeling Models, 1. Meanings of words):
When I say “I like dancing,” this is a different use of the word ‘like,’ backed by a different model of myself, than when I say “I like tasting sugar.” The model that comes to mind for dancing treats it as one of the chunks of my day, like “playing computer games” or “taking the bus.” I can know what state I’m in (the inference function of the model) based on seeing and hearing short scenes. Meanwhile, my model that has the taste of sugar in it has states like “feeling sandpaper” or “stretching my back.” States are more like short-term sensations, and the described world is tightly focused on my body and the things touching it.
I think my theory talks about the same things, but in more breadth and depth. I want to try to prove that you can’t rationally prefer your theory to mine.
I’m bad at math. But I know a topic where you could formulate my ideas using math. I could try to formulate them mathematically with someone’s help.
I can give a very abstract example. It’s probably oversimplified (in a wrong way) and bad, but here it is:
You’ve got three sets: A {9, 1}, B {5, −3} and C {4, 4}. You want to learn something about the sets. Or maybe you want to explain why they’re ordered A > C > B in your data. You make orders of those sets using some (arbitrary) rules. For example:
A {9} > B {5} > C {4}. This order is based on choosing the largest element.
A {10} > C {8} > B {2}. This order is based on adding elements.
A {10} > C {8} > B {5}. This order is based on this: you add the elements if the number grows bigger, you choose the largest element otherwise. It’s a merge of the previous 2 orders.
If you want to predict A > C > B, you also may order the orders above:
(2) > (3) > (1). This order is based on predictive power (mostly) and complexity.
(2) > (1) > (3). This order is based on predictive power and complexity (complexity gives a bigger penalty).
(3) > (2) > (1). This order is based on how large the numbers in the orders are.
This example is likely useless out of context. But you’ve read the post: so, if there’s something you haven’t understood just because it was confusing without numbers, then this example should clarify something for you. For example, it may clarify what my post is missing to be understandable/open to specific feedback.
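For what it’s worth, here is the same example written out as a short script (the three scoring rules are exactly the three orders above):

```python
# The example above, made concrete: three (arbitrary) rules produce three
# orders of the sets, and the rules themselves can then be ranked by how
# well they reproduce the target order A > C > B.

sets = {"A": [9, 1], "B": [5, -3], "C": [4, 4]}

def by_max(s):   return max(s)                                 # order (1)
def by_sum(s):   return sum(s)                                 # order (2)
def by_merge(s): return sum(s) if sum(s) > max(s) else max(s)  # order (3)

for name, rule in [("max", by_max), ("sum", by_sum), ("merge", by_merge)]:
    ranking = sorted(sets, key=lambda k: rule(sets[k]), reverse=True)
    print(name, ranking, {k: rule(sets[k]) for k in sets})

# max   ['A', 'B', 'C'] {'A': 9, 'B': 5, 'C': 4}    -- wrong order
# sum   ['A', 'C', 'B'] {'A': 10, 'B': 2, 'C': 8}   -- right, and simpler
# merge ['A', 'C', 'B'] {'A': 10, 'B': 5, 'C': 8}   -- right, more complex
```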
If you’d like to get some more concrete feedback from the community here, I’d recommend phrasing your ideas more precisely by using some common mathematical terminology, e.g. talking about sets, sequences, etc.
“No math, no feedback”: if this is an irrational requirement, it’s gonna put people at risk. Do you think there isn’t any other way to share/evaluate ideas? For example, here are some notions:
On some level our thoughts do consist of biases. See “synaptic weight”. My idea says that “biases” exist on (almost) all levels of thinking and those biases are simple enough/interpretable enough. Also it says that some “high-level thinking” or “high-level knowledge” can be modeled by simple enough biases.
You could compare my theory to other theories. To Shard Theory, for example. I mean, just to make a “map” of all theories: where each theory lies relative to the others. Shard Theory says that value formation happens through complex enough negotiation games between complex enough objects (shards). My theory says that all cognition happens because of a simpler process between simpler objects.
I think it would be simply irrational to abstain from having any opinions about those notions. Do you believe there’s something simpler (and more powerful) than Shard Theory? Do you believe that human thinking and concepts are intrinsically complex and (usually) impossible to simplify? Etc.
A rational thing to do would be to state your opinions about this and say what could affect those opinions. You already mentioned math, but there should be some other things too. Simply hearing some possibilities you haven’t considered (even without math) should have at least a small effect on your estimates.
I’m noticing two things:
It’s suspicious to me that values of humans-who-like-paperclips are inherently tied to acquiring an unlimited amount of resources (no matter in which way). Maybe I don’t treat such values as 100% innocent, so I’m OK keeping them in check. Though we can come up with thought experiments where the urge to get more resources is justified by something. Like, maybe instead of producing paperclips those people want to calculate Busy Beaver numbers, so they want more and more computronium for that.
How consensual were the trades if their outcome is predictable and other groups of people don’t agree with the outcome? Looks like coercion.
Thus, it doesn’t matter in the least if it stifles human output, because the overwhelming majority of us who don’t rely on our artistic talent to make a living will benefit from a post-scarcity situation for good art, as customized and niche as we care to demand.
How do you know that? Art is one of the biggest outlets of human potential; one of the biggest forces behind human culture and human communities; one of the biggest communication channels between people.
One doesn’t need to be a professional artist to care about all that.
A stupid question about anthropics and [logical] decision theories. Could we “disprove” some types of anthropic reasoning based on [logical] consistency? I struggle with math, so please keep the replies relatively simple.
Imagine 100 versions of me, I’m one of them. We’re all egoists, each one of us doesn’t care about the others.
We’re in isolated rooms, each room has a drink. 90 drinks are rewards, 10 drinks are punishments. Everyone is given the choice to drink or not to drink.
The setup is iterated (with memory erasure), everyone gets the same type of drink each time. If you got the reward, you get the reward each time. Only you can’t remember that.
If I reason myself into drinking (reasoning that I have a 90% chance of reward), from the outside it would look as if 10 egoists have agreed (very conveniently, to the benefit of others) to suffer again and again… is it a consistent possibility?
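For a sense of the numbers, here’s a quick simulation; the +1/−1 payoffs per iteration are an illustrative assumption, not part of the thought experiment:

```python
# 100 copies, 90 reward drinks, 10 punishment drinks, fixed per copy.
# Memory erasure just means each copy repeats the same outcome every time.
import random

def average_payoff(copies=100, iterations=1000, reward=1.0, punishment=-1.0):
    drinks = [reward] * 90 + [punishment] * 10  # one fixed drink per copy
    random.shuffle(drinks)
    totals = [d * iterations for d in drinks]   # every copy always drinks
    return sum(totals) / copies

print(average_payoff())  # 800.0: drinking "wins" on average, yet the same
                         # 10 unlucky egoists suffer on every iteration
```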
(draft of a future post)
I want to share my model of intelligence and research. You won’t agree with it at first glance. Or at the third glance. (My hope is that you will just give up and agree at the 20th glance.)
But that’s supposed to be good: it means the model is original and brave enough to make risky statements.
In this model any difference in “intelligence levels” or any difference between two minds in general boils down to “commitment level”.
What is “commitment”?
On some level, “commitment” is just a word. It’s not needed to define the ideas I’m going to talk about. What’s much more important is the three levels of commitment. There are often three levels which follow the same pattern, the same outline:
Level 1. You explore a single possibility.
Level 2. You want to explore all possibilities. But you are paralyzed by the amount of possibilities. At this level you are interested in qualities of possibilities. You classify possibilities and types of possibilities.
Level 3. You explore all possibilities through a single possibility. At this level you are interested in dynamics of moving through the possibility space. You classify implications of possibilities.
...
I’m going to give specific examples of the pattern above. This post is kind of repetitive, but it wasn’t AI-generated, I swear. Repetition is a part of commitment.
Why is commitment important?
My explanation won’t be clear before you read the post, but here it goes:
Commitment describes your values and the “level” of your intentionality.
Commitment describes your level of intelligence (in a particular topic). Compared to yourself (your potential) or other people.
Commitments are needed for communication. Without shared commitments it’s impossible for two people to find a common ground.
Commitment describes the “true content” of an argument, an idea, a philosophy. Ultimately, any property of a mind boils down to “commitments”.
Basics
1. Commitment to exploration
I think there are three levels of commitment to exploration.
Level 1. You treat things as immediate means to an end.
Imagine two enemy cavemen teleported into a laboratory. They try to use whatever they find to beat each other, without studying/exploring what they’re using. So, they are just throwing microscopes and beakers at each other. They throw anti-matter guns at each other without even activating them.
Level 2. You explore things for the sake of it.
Think about mathematicians. They can explore math without any goal.
Level 3. You use particular goals to guide your exploration of things. Even though you would care about exploring them without any goal anyway. The exploration space is just too large, so you use particular goals to narrow it down.
Imagine a physicist who explores mathematics by considering imaginary universes and applying physical intuition to discover deep mathematical facts. Such a person uses a particular goal/bias to guide “pure exploration”. (Inspired by Edward Witten; see Michael Atiyah’s quote.)
More examples
In terms of exploring ideas, our culture is at level 1 (angry caveman). We understand ideas only as “ideas of getting something (immediately)” or “ideas of proving something (immediately)”. We are not interested in exploring ideas for the sake of it. The only metrics we apply to ideas are “(immediate) usefulness” and “trueness”. Not “beauty”, “originality” and “importance”. People in general are at level 1. Philosophers are at level 1 or “1.5”. The rationality community is at level 1 too (sadly): rationalists still mostly care only about immediate usefulness and truth.
In terms of exploring argumentation and reasoning, our culture is at level 1. If you never thought “stupid arguments don’t exist”, then you are at level 1: you haven’t explored arguments and reasoning for the sake of it, you immediately jumped to assuming “The Only True Way To Reason” (be it your intuition, the scientific method, a particular ideology or Bayesian epistemology). You haven’t stepped outside of your perspective a single time. Almost everyone is at level 1. Eliezer Yudkowsky is at level 3, but in a much narrower field: Yudkowsky explored rationality with the specific goal/bias of AI safety. However, overall Eliezer is at level 1 too: he never studied human reasoning outside of what he thinks is “correct”.
I think this is kind of bad. We are at level 1 in the main departments of human intelligence and human culture. Two levels below our true potential.
2. Commitment to goals
I think there are three levels of commitment to goals.
Level 1. You have a specific selfish goal.
“I want to get a lot of money” or “I want to save my friends” or “I want to make a ton of paperclips”, for example.
Level 2. You have an abstract goal. But this goal doesn’t imply much interaction with the real world.
“I want to maximize everyone’s happiness” or “I want to prevent (X) disaster”, for example. This is a broad goal, but it doesn’t imply actually learning and caring about anyone’s desires (until the very end). Rationalists are at this level of commitment.
Level 3. You use particular goals to guide your abstract goals.
Some political activists are at this level of commitment. (But please, don’t bring CW topics here!)
3. Commitment to updating
“Commitment to updating” is the ability to re-start your exploration from square one. I think there are three levels to it.
Level 1. No updating. You never change ideas.
You just keep piling up your ideas into a single paradigm your entire life. You turn beautiful ideas into ugly ones so they fit with all your previous ideas.
Level 2. Updating. You change ideas.
When you encounter a new beautiful idea, you are ready to reformulate your previous knowledge around the new idea.
Level 3. Updating with “check points”. You change ideas, but you use old ideas to prime new ones.
When you explore an idea, you mark some “check points” which you reached with that idea. When you ditch the idea for a new one, you still keep in mind the check points you marked. And use them to explore the new idea faster.
Science
4.1 Commitment and theory-building
I think there are three levels of commitment in theory-building.
Level 1.
You build your theory using only “almost facts”. I.e. you come up with “trivial” theories which are almost indistinguishable from the things we already know.
Level 2.
You build your theory on speculations. You “fantasize” important properties of your idea (which are important only to you or your field).
Level 3.
You build your theory on speculations. But those speculations are important even outside of your field.
I think Eliezer Yudkowsky and LW did theory-building of the 3rd level. A bunch of LW ideas are philosophically important even if you disagree with Bayesian epistemology (Eliezer’s view on ethics and math, logical decision theories and some Alignment concepts).
4.2 Commitment to explaining a phenomenon
I think there are three types of commitment in explaining a phenomenon.
Level 1.
You just want to predict the phenomenon. But many, many possible theories can predict the phenomenon, so you need something more.
Level 2.
You compare the phenomenon to other phenomena and focus on its qualities.
That’s where most theories go wrong: people become obsessed with their own fantasies about the qualities of a phenomenon.
Level 3.
You focus on dynamics which connect this phenomenon to other phenomena. You focus on overlapping implications of different phenomena. 3rd level is needed for any important scientific breakthrough. For example:
Imagine you want to explain combustion (why/how things burn). On one hand you already “know everything” about the phenomenon, so what do you even do? Level 1 doesn’t work. So, you try to think about qualities of burning, types of transformations, types of movement... but that won’t take you anywhere. Level 2 doesn’t work either. The right answer: you need to think not about qualities of transformations and movements, but about dynamics (conservation of mass, the kinetic theory of gases) which connect different types of transformations and movements. Level 3 works.
Epistemology pt. 1
5. Commitment and epistemology
I think there are three levels of commitment in epistemology.
Level 1. You assume the primary reality of the physical world. (Physicism)
Take statements “2 + 2 = 4” and “God exists”. To judge those statements, a physicist is going to ask “Do those statements describe reality in a literal way? If yes, they are true.”
Level 2. You assume the primary reality of statements of some fundamental language. (Descriptivism)
To judge statements, a descriptivist is going to ask “Can those statements be expressed in the fundamental language? If yes, they are true.”
Level 3. You assume the primary reality of semantic connections between statements of languages. And the primary reality of some black boxes which create those connections. (Connectivism) You assume that something physical shapes the “language reality”.
To judge statements, a connectivist is going to ask “Do those statements describe an important semantic connection? If yes, they are true.”
...
Recap. Physicist: everything “physical” exists. Descriptivist: everything describable exists. Connectivist: everything important exists. Physicist can be too specific and descriptivist can be too generous. (This pattern of being “too specific” or “too generous” repeats for all commitment types.)
Thinking at the level of semantic connections should be natural to people (because they use natural language and… neural nets in their brains!). And yet this idea is extremely alien to people epistemology-wise.
Implications for rationality
In general, rationalists are “confused” between level 1 and level 2. I.e. they often treat level 2 very seriously, but aren’t fully committed to it.
Eliezer Yudkowsky is “confused” between level 1 and level 3. I.e. Eliezer has a lot of “level 3 ideas”, but doesn’t apply level 3 thinking to epistemology in general.
On the one hand, Eliezer believes that “the map is not the territory”. (level 1 idea)
On the other hand, Eliezer believes that math is an “objective” language shaped by the physical reality. (level 3 idea)
Similarly, Eliezer believes that human ethics are defined by some important “objective” semantic connections (which can evolve, but only to a degree). (level 3)
“Logical decision theories” treat logic as something created by connections between black boxes. (level 3)
When you do Security Mindset, you should make not only “correct” maps, but beautiful ones. Societal properties of your map matter more than your opinions. (level 3)
So, Eliezer has a bunch of ideas which can be interpreted as “some maps ARE the territory”.
6. Commitment and uncertainty
I think there are three levels of commitment in doubting one’s own reasoning.
Level 1.
You’re uncertain about the superficial “correctness” of your reasoning. You worry that you missed a particular counterargument. Example: “I think humans are dumb. But maybe I missed a smart human or applied a wrong test?”
Level 2.
You unsystematically doubt your assumptions and definitions. Maybe even your inference rules a little bit (see “inference objection”). Example: “I think humans are dumb. But what is a “human”? What is “dumb”? What is “is”? And how can I be sure of anything at all?”
Level 3.
You doubt the semantic connections (e.g. inference rules) in your reasoning. You consider particular dynamics created by your definitions and assumptions. “My definitions and assumptions create this dynamic (not present in all people). Can this dynamic exploit me?”
Example: “I think humans are dumb. But can my definition of “intelligence” exploit me? Can my pessimism exploit me? Can this be an inconvenient way to think about the world? Can my opinion turn me into a fool even if I’m de facto correct?”
...
Level 3 is like a “security mindset” applied to your own reasoning. LW rationality mostly teaches against it, suggesting that you always take your smallest opinions at face value as “the truest thing you know”. With some exceptions, such as “ethical injunctions”, “radical honesty”, “black swan bets” and “security mindset”.
Epistemology pt. 2
7. Commitment to understanding/empathy
I think there are three levels of commitment in understanding your opponent.
Level 1.
You can pass the Ideological Turing Test in a superficial way (you understand the structure of the opponent’s opinion).
Level 2. “Telepathy”.
You can “inhabit” the emotions/mindset of your opponent.
Level 3.
You can describe the opponent’s position as a weaker version/copy of your own position. And additionally you can clearly imagine how your position could turn out to be “the weaker version/copy” of the opponent’s position. You find a balance between telepathy and “my opinion is the only one which makes sense!”
8. Commitment to “resolving” problems
I think there are three levels of commitment in “resolving” problems.
Level 1.
You treat a problem as a puzzle to be solved by Your Favorite True Epistemology.
Level 2.
You treat a problem as a multi-layered puzzle which should be solved on different levels.
Level 3.
You don’t treat a problem as a self-contained puzzle. You treat it as a “symbol” in the multitude of important languages. You can solve it by changing its meaning (by changing/exploring the languages).
This type of thinking can also be applied to the Unexpected hanging paradox.
Alignment pt. 1
9.1 Commitment to morality
I think there are three levels of commitment in morality.
Level 1. Norms, desires.
You analyze norms of specific communities and desires of specific people. That’s quite easy: you are just learning facts.
Level 2. Ethics and meta-ethics.
You analyze similarities between different norms and desires. You get to pretty abstract and complicated values such as “having agency, autonomy, freedom; having an interesting life; having an ability to form connections with other people”. You are lost in contradictory implications, interpretations and generalizations of those values. You have a (meta-)ethical paralysis.
Level 3. “Abstract norms”.
You analyze similarities between implications of different norms and desires. You analyze dynamics created by specific norms. You realize that the most complicated values are easily derivable from the implications of the simplest norms. (Not without some bias, of course, but still.)
I think moral philosophers and Alignment researchers are seriously dropping the ball by ignoring the 3rd level. Acknowledging the 3rd level doesn’t immediately solve Alignment, but it can pretty much “solve” ethics (with a bit of effort).
9.2 Commitment to values
I think there are three levels of values.
Level 1. Inside values (“feeling good”).
You care only about things inside of your mind. For example, do you feel good or not?
Level 2. Real values.
You care about things in the real world. Even though you can’t care about them directly. But you make decisions to not delude yourself and not “simulate” your values.
Level 3. Semantic values.
You care about elements of some real system. And you care about proper dynamics of this system. For example, you care about things your friend cares about. But it’s also important to you that your friend is not brainwashed, not controlled by you. And you are ready that one day your friend may stop caring about anything. (Your value may “die” a natural death.)
3rd level is the level of “semantic values”. They are not “terminal values” in the usual sense. They can be temporal and history-dependent.
9.3 Commitment and research interest
So, you’re interested in ways in which an AI can go wrong. What specifically can you be interested in? I think there are three levels to it.
Level 1. In what ways are some AI actions bad?
You classify AI bugs into types. For example, you find “reward hacking” type of bugs.
Level 2. What qualities of AIs are good/bad?
You classify types of bugs into “qualities”. You find such potentially bad qualities as “the AI doesn’t care about the real world” and “the AI doesn’t allow itself to be fixed (corrigibility)”.
Level 3. What bad dynamics are created by bad actions of AI? What good dynamics are destroyed?
Assume AI turned humanity into paperclips. What’s actually bad about that, beyond the very first obvious answer? What good dynamics did this action destroy? (Some answers: it destroyed the feedback loop, the connection between the task and its causal origin (humanity), the value of paperclips relative to other values, the “economical” value of paperclips, the ability of paperclips to change their value.)
On the 3rd level you classify different dynamics. I think people completely ignore the 3rd level. In both Alignment and moral philosophy. 3rd level is the level of “semantic values”.
Alignment pt. 2
10. Commitment to Security Mindset
I think Security Mindset has three levels of commitment.
Level 1. Ordinary paranoia.
You have great imagination, you can imagine very creative attacks on your system. You patch those angles of attack.
Level 2. Security Mindset.
You study your own reasoning about the safety of the system. You check if your assumptions are right or wrong. Then, you try to delete as many assumptions as you can. Even if they seem correct to you! You also delete anomalies of the system even if they seem harmless. You try to simplify your reasoning about the system seemingly “for the sake of it”.
Level 3.
You design a system which would be safe even in a world with changing laws of physics and mathematics. Using some bias, of course (otherwise it’s impossible).
Humans, or idealized humans, are “level 3 safe”. All or almost all current approaches to Alignment don’t give you a “level 3 safe” AI.
11. Commitment to Alignment
I think there are three levels of commitment a (mis)aligned AI can have. Alternatively, those are two or three levels at which you can try to solve the Alignment problem.
Level 1.
AI has a fixed goal or a fixed method of finding a goal (which likely can’t be Aligned with humanity). It respects only its own agency. So, ultimately it does everything it wants.
Level 2.
AI knows that different ethics are possible and is completely uncertain about ethics. AI respects only other people’s agency. So, it doesn’t do anything at all (except preventing, a bit lazily, 100% certain destruction and oppression). Or it requires infinite permission:
Am I allowed to calculate “2 + 2”?
Am I allowed to calculate “2 + 2” even if it leads to a slight change of the world?
Am I allowed to calculate “2 + 2” even if it leads to a slight change of the world which you can’t fully comprehend even if I explain it to you?
...
Wait, am I allowed to ask those questions? I’m already manipulating you by boring you to death. I can’t even say anything.
Level 3.
AI can respect both its own agency and the agency of humanity. AI finds a way to treat its agency as the continuation of the agency of people. AI makes sure it doesn’t create any dynamic which couldn’t be reversed by people (unless there’s nothing else to do). So, AI can both act and be sensitive to people.
Implications for Alignment research
I think a fully safe system exists only at level 3. The safest system is one which understands what “exploitation” means, so it never willingly exploits its rewards in any way. Humans are an example of such a system.
I think alignment researchers are “confused” between level 1 and level 3. They try to fix different “exploitation methods” (ways AI could exploit its rewards) instead of making the AI understand what “exploitation” means.
I also think this is the reason why alignment researchers don’t cooperate much, pushing in different directions.
Perception
12. Commitment to properties
Commitments exist even on the level of perception. There are three levels of properties to which your perception can react.
Level 1. Inherent properties.
You treat objects as having more or less inherent properties. “This person is inherently smart.”
Level 2. Meta-properties.
You treat any property as universal. “Anyone is smart under some definition of smartness.”
Level 3. Semantic properties.
You treat properties only as relatively attached to objects: different objects form a system (a “language”) where properties get distributed between them and differentiated. “Everyone is smart, but in a unique way. And those unique ways are important in the system.”
13.1 Commitment to experiences and knowledge
I think there are three levels of commitment to experiences.
Level 1.
You’re interested in particular experiences.
Level 2.
You want to explore all possible experiences.
Level 3.
You’re interested in the real objects which produce your experiences (e.g. your friends): you’re interested in what knowledge “all possible experiences” could reveal about them. You want to know where physical/mathematical facts and experiences overlap.
13.2 Commitment to experience and morality
I think there are three levels of investigating the connection between experience and morality.
Level 1.
You study how experience causes us to do good or bad things.
Level 2.
You study all the different experiences that “goodness” and “badness” cause in us.
Level 3.
You study dynamics created by experiences, which are related to morality. You study implications of experiences. For example: “loving a sentient being feels fundamentally different from eating a sandwich. food taste is something short and intense, but love can be eternal and calm. this difference helps to not treat other sentient beings as something disposable”
I think the existence of the 3rd level isn’t acknowledged much. And yet it could be important for alignment. Most versions of moral sentimentalism are 2nd level at best. Epistemic Sentimentalism can be 3rd level.
Final part
Specific commitments
You can ponder your commitment to specific things.
Are you committed to information?
Imagine you could learn anything (and forget it if you want). Would you be interested in learning different stuff more or less equally? You could learn something important (e.g. the most useful or the most abstract math), but you also could learn something completely useless—such as the life story of every ant who ever lived.
I know, this question is hard to make sense of: of course, anyone would like to learn everything/almost everything if there was no downside to it. But if you have a positive/negative commitment about the topic, then my question should make some sense anyway.
Are you committed to people?
Imagine you got extra two years to just talk to people. To usual people on the street or usual people on the Internet.
Would you be bored hanging out with them?
My answers: Maybe I was committed to information in general as a kid. Then I became committed to information related to people, produced by people, known by people.
My inspiration for writing this post
I encountered a bunch of people who are more committed to exploring ideas (and taking ideas seriously) than usual. More committed than most rationalists, for example.
But I felt those people lack something:
They are able to explore ideas, but don’t care about that anymore. They care only about their own clusters of idiosyncratic ideas.
They have very vague goals which are compatible with any specific actions.
They don’t care if their ideas could even in principle matter to people. They have “disconnected” from other people, from other people’s context (through some level of elitism).
When they acknowledge you as “one of them”, they don’t try to learn your ideas or share their ideas or argue with you or ask your help for solving a problem.
So, their commitment remains very low. And they are not “committed” to talking.
Conclusion
If you have a high level of commitment (3rd level) at least to something, then we should find a common language. You may even be like a sibling to me.
Thank you for reading this post. 🗿
Cognition
14.1 Studying patterns
I think there are three levels of commitment to patterns.
You study particular patterns.
You study all possible patterns: you study qualities of patterns.
You study implications of patterns. You study dynamics of patterns: how patterns get updated or destroyed when you learn new information.
14.2 Patterns and causality
I think there are three levels in the relationship between patterns and causality. I’m going to give examples about visual patterns.
Level 1.
You learn which patterns are impossible due to local causal processes.
For example: “I’m unlikely to see a big tower made of eggs standing on top of each other”. It’s just not a stable situation due to very familiar laws of physics.
Level 2.
You learn statistical patterns (correlations) which can have almost nothing to do with causality.
For example: “people like to wear grey shirts”.
Level 3.
You learn patterns which have a strong connection to other patterns and basic properties of images. You could say such patterns are created/prevented by “global” causal processes.
For example: “I’m unlikely to see a place fully filled with dogs. dogs are not people or birds or insects, they don’t create such crowds”. This is very abstract, connects to other patterns and basic properties of images.
Implications for Machine Learning
I think...
It’s likely that Machine Learning models don’t learn level 3 patterns as well as they could, as sharply as they could.
Machine Learning models should be 100% able to learn level 3 patterns. It shouldn’t require any specific data.
Learning/comparing level 3 patterns is interesting enough on its own. It could be its own area of research. But we don’t apply statistics/Machine Learning to try to mine those patterns. This may be a missed opportunity for humans.
I think researchers are making a blunder by not asking “what kinds of patterns exist? what patterns can be learned in principle?” (I’m not talking about the universal approximation theorem.)
15. Cognitive processes
Suppose you want to study different cognitive processes, skills, types of knowledge. There are three levels:
You study particular cognitive processes.
You study qualities of cognitive processes.
You study dynamics created by cognitive processes. How “actions” of different cognitive processes overlap.
I think you can describe different cognitive processes in terms of patterns they learn. For example:
Causal reasoning learns abstract configurations of abstract objects in the real world. So you can learn stuff like “this abstract rule applies to most objects in the world”.
Symbolic reasoning learns abstract configurations of abstract objects in your “concept space”. So you can learn stuff like “‘concept A contains concept B’ is an important pattern”.
Correlational reasoning learns specific configurations of specific objects.
Mathematical reasoning learns specific configurations of abstract objects. So you can build arbitrary structures with abstract building blocks.
Self-aware reasoning can transform abstract objects into specific objects. So you can think thoughts like, for example, “maybe I’m just a random person with random opinions”.
I think all this could be easily enough formalized.
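As a minimal sketch of what such a formalization could start from, each type of reasoning can be treated as a pair (kind of configurations, kind of objects) plus the space the patterns live in; the encoding below is an illustrative guess, not a settled proposal:

```python
# Encoding the taxonomy above: each cognitive process learns patterns of
# (configurations of objects), where each part is specific or abstract.
# The `space` field and the whole encoding are illustrative guesses.
from dataclasses import dataclass
from enum import Enum

class Kind(Enum):
    SPECIFIC = "specific"
    ABSTRACT = "abstract"

@dataclass(frozen=True)
class Process:
    configurations: Kind  # how the learned arrangements are typed
    objects: Kind         # what the arrangements range over
    space: str            # where the patterns live

causal        = Process(Kind.ABSTRACT, Kind.ABSTRACT, "real world")
symbolic      = Process(Kind.ABSTRACT, Kind.ABSTRACT, "concept space")
correlational = Process(Kind.SPECIFIC, Kind.SPECIFIC, "real world")
mathematical  = Process(Kind.SPECIFIC, Kind.ABSTRACT, "concept space")
# Self-aware reasoning doesn't fit a static pair: it is a transformation
# of abstract objects into specific ones ("maybe I'm just a random person
# with random opinions").
```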
Meta-level
Can you be committed to exploring commitment?
I think yes.
One thing you can do is to split topics into sub-topics and raise your commitment in every particular sub-topic. Vaguely similar to gradient descent. That’s what I’ve been doing in this post so far.
Another thing you can do is to apply recursion. You can split any topic into 3 levels of commitment. But then you can split the third level into 3 levels too. So, there’s potentially an infinity of levels of commitment. And there can be many particular techniques for exploiting this fact.
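A toy illustration of this recursion: the level labels form an infinite tree where every level 3 opens into three more levels (the code is just a naming scheme, nothing deeper):

```python
def levels(depth: int, prefix: str = "") -> list[str]:
    """Enumerate commitment levels, recursively splitting each level 3."""
    if depth == 1:
        return [prefix + "1", prefix + "2", prefix + "3"]
    return [prefix + "1", prefix + "2"] + levels(depth - 1, prefix + "3.")

print(levels(3))
# ['1', '2', '3.1', '3.2', '3.3.1', '3.3.2', '3.3.3']
```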
But the main thing is the three levels of “exploring ways to explore commitment”:
You study particular ways to raise commitment.
You study all possible ways to raise commitment.
You study all possible ways through a particular way. You study dynamics and implications which the ways create.
I don’t have enough information or experience for the 3rd level right now.