Another double-counting: wanting people to be saved for altruistic reasons and wanting to personally do things that save people.
Signer
Maybe it’s not a rational update, but people just taking their time to update to what they should have rationally believed 3 years ago.
Fire is reducible.
And so are qualia. The only difference is that science hasn’t yet provided a useful reduction. But the laws of physics still don’t say how you should reduce things. And reductions don’t preserve everything—fire can look continuous, but actually consist of atoms.
No, it comes from the observation that our sensorium is not a picture of our brains?
You can also observe that fire is not a picture of atoms. Reductions are indirect, can have different precision, and some parts of observations are just wrong. There are no observations that contradict future neurology predicting all your experiences more precisely than you can feel them now.
If Qualia are identified with microphysical properties, those properties need to be localised to solve the binding problem.
Again, there is no reason to directly identify qualia with microphysical properties. You don’t need to make atoms continuous to bind them to continuous-looking fire. The idea is to identify only the phenomenal nature of qualia with physical existence. After that, science can figure out a specific useful model and just say “your observations of qualia are not sensitive enough to say anything about localization on the nanometer scale”, like it does in the case of continuous-looking fire.
I’m not saying that figuring out how the brain implements human experiences is a solved or uninteresting problem. It’s just not a Hard, philosophical problem. At least no more than in the case of fire.
It factors into localised parts.
Approximately localized. And even without quantum effects there are definitely relevant interactions on the macro scale. And gravity. And space. I just don’t get how the “physical human is not spatially extensive” objection makes sense. Of course, it doesn’t matter, because saying that qualia are spatially extensive is like saying that fire is continuous.
Why is there a binding problem for fire?
Because there is no fire in the ontology of modern physics, and there are no laws of physics that say that some arrangement of atoms is fire. There are only extra-physical conventions that say that if atoms work approximately like fire, you can say that fire reduces to atoms. That’s how reductionism works. It works the same way for observations—there are no physical laws that determine how precisely your measurement equipment must draw numbers for you to conclude your physical theory is correct. And it works the same way for qualia—there are no physical laws that say that some neural activity is your experience of blue.
Are you now saying that the binding comes from neurology?
Binding comes from a human desire to describe things in an approximate, useful way. Fundamentally, there is no binding between real physics and the continuity of fire. And so the binding problem is an easy problem of scientifically describing a brain with enough precision that all pixels of your visual field are predictable from this description.
That something is a WF doesn’t mean it is nonlocal or particularly spatially extensive, since WFs can bunch down to any finite size.
Sure.
Most of the electrons in the human body are localised to orbitals that are some nanometers across (but not localised within them).
But the WF of a human is spatially extensive enough.
our sensorium should look like a fine grained brain scan
Why not like a drawing of a head?
Anyway, the binding problem for qualia is no different from the binding problem for fire. There is just no reason to promote limits of human introspection into fundamental ontology, just like there is no reason why fire can’t look continuous, but actually consist of mostly empty space.
Oh, ok, I misunderstood you.
or you could have the qualia instead of that (monistic panpsychism)
Physics is monistic panpsychism—there are not just geometric-causal-numerical ingredients, there is also an implicit statement that the universe the equations describe has the intrinsic property of existence.
Yes, but why do you refuse to believe it? What’s your evidence that your experience of color is ontologically primitive? It’s just a baseless assumption.
Physicalists who aren’t thoroughgoing eliminativists or illusionists, are actually dualists.
Can you imagine believing in dualistic non-physical parts of your experience that you are not aware of?
They mean that (there is more chance that) training will produce an obedient AI that will help governments become more totalitarian and will not effectively pursue some very alien goal.
For people who have color vision, I can state it more concretely: color exists in reality, it doesn’t exist in physics, therefore physics is incomplete in some way.
You don’t have enough evidence of this. Nothing about your experience of color contradicts it being neurons. Do you agree that you can have thoughts about your experience of color? Like “I’ve seen a blue sky yesterday”. Do you agree that they can be more or less correct, like when you forgot that it was actually very cloudy all day yesterday? Do you agree that you can describe your experience more or less precisely? Do you agree that your experience has structure? When you say that “color” exists, you mean something that works in specific ways. For example, it does not create blue-sky experiences on very cloudy days. And if you describe these ways precisely enough, you’ll get a description of neurons. What do you think a physical description of you describes, when it describes the difference between a state interpretable as you seeing a blue sky and a state interpretable as you seeing a cloudy sky?
Is it just that you refuse to believe that your experience has any parts you are not aware of?
I’m not a fan of platonism. Definitely not of traditional platonism, as some separate additional category in fundamental ontology. Looks like something human mathematicians would come up with to feel better about themselves. Even though that is outside-view reasoning, similar to the one people use to dismiss panpsychism—I still don’t see what the point is, when you can just say that any instance of math working is a physical fact.
The mathematical universe is more likely, but I’m not even sure it is a simpler hypothesis than some other, not-so-mathy physics.
Assuming it, I can see how not having to worry about the existence of high-level abstractions can help. It’s just funny, because “but it IS some other territory” is a very overpowered argument. Causality gets weird, but platonists probably love acausal stuff, so whatever. Personally, in this scenario, I worry that the mathematical universe doesn’t give existence to some abstraction, and so if you rely on this, you can still get zombies on some level. Probably it’s not so limited, but even then, are you supposed to be able to constrain the mathematical universe by thinking about abstractions in our world?
Again, this is all correct. Well, except level 6. But level 6 is hilarious.
A physicalist, if I understand correctly, could consistently claim that such an experiment is deluding the subject, essentially doing something like modifying the memory of the experience so that they inaccurately feel the same, when in fact there was a difference.
It’s all arbitrary ethics. You can already say that changing location deludes you. Suddenly starting to care about complexity is just letting your epistemology bleed into your values.
C1 wants to say that worlds which are structurally isomorphic are literally the same world.
I don’t think this is a typical or correct view, if you factor existence out of structure. People believe in reality. “Shut up and calculate” has a name precisely because it’s not a universal position. There is a physical difference between a real and a fictional chair, even if you describe them as having identical structure. It’s just that usually existence is implicit—physics doesn’t talk about fictional chairs. C1 doesn’t have an answer to “relations need relata” because “relations need relata” is correct.
And so is “blue is like a chair”.
They’re arguing that conscious experience of blue and red gives evidence of something that doesn’t purely fit the causal/functional role in the way a chair does.
Yeah, but they don’t have a strong argument. I’m not sure what a rigorous way to show that the argument from conceivability of world B fails would be, if we accept the framework of conceivability arguments. Rules of counterfactual behavior are rules of physics, and so the worlds have different relations, maybe? But I don’t believe conceivability arguments are that rigorously justified in the first place. I accept them in the case of zombies, mostly because there is a broadly physicalist solution—zombies are different in that they don’t exist. But in the blue/red case you can conceive of a functionally same chair that exists differently as much as you can conceive of spectrum inversion. You don’t even need to be unphysical about it—an antimatter chair from an antimatter-dominated world counterfactually annihilates if you bring it to our world.
And more importantly, like C1 says, parsimony—there is no need to think about different kinds of existence when you can explain everything with one kind. Do you agree that if we grant the intrinsic property of existence, then third-person descriptions describe first-person experience as completely as they describe a chair? Because then neurons and atoms are just a more precise description of the same reality that you call “I’m seeing blue”. C2 doesn’t have evidence or arguments that say that “blue” is not neurons, if neurons (are a high-level description of reality that) intrinsically exist. But then all differences between blue and red are describable by relations (that are about things that exist), and so arguments about inverted spectrum should not change anything.
If you start to say that some “intrinsic property” is needed to realise the structure then C2 has an opening to claim this is the categorical protophenomenal property required to fix phenomenal character.
Well, there isn’t much that makes it “phenomenal”. Chairs also exist. And it’s not unphysical to say that things exist. It’s supposed to feel acceptable to everyone by design^^. And if you accept it, all phenomenal structure—all differences between red and blue and all first-person descriptions—is as completely describable by relational physics as chairs are. In the end the physicalist can say it’s not that consciousness maps to existence, it’s just that people confused consciousness with a different, perfectly physical concept of existence.
Ensuring that you get good generalization, and that models are doing things for the right reasons, is easy when you can directly verify what generalization you’re getting and directly inspect what reasons models have for doing things. And currently, all of the cases where we’ve inadvertently selected for misaligned personas—alignment faking, agentic misalignment, etc.—are cases where the misaligned personas are easy to detect: they put the misaligned reasoning directly in their chain-of-thought, they’re overtly misaligned rather than hiding it well, and we can generate fake scenarios that elicit their misalignment.
But visible misalignment being easy to detect and correlated with misaligned chain-of-thought doesn’t guarantee that training that eliminates visible misalignment and misaligned chain-of-thought results in a model that does things for the right reasons? The model can still learn unintended heuristics. And what’s the actual hypothesis about the model’s reasons when they appear to be right? That its learned reasoning algorithm is isomorphic to the reasoning algorithm of a helpful human reading the same instructions, or what?
Let me put it this way then, how do you combine all of these tiny little microexperiences into a coherent macroexperience?
Microexperiences are unphysical—there are no electrons, only the global wavefunction. So you only have the decomposition problem. It is solved by weak illusionism: there is no real fundamental perfect isolation of qualia, just qualia of isolation. For every detailed description of the isolation of your qualia, there is either a non-contradicting physical description of an only approximately isolated part of reality, or your description is wrong—the same way a description of a chair works.
Yes, but I have a principled reason to special plead here. The complete description of the world is only complete from the third person perspective. It’s incomplete from a first person perspective because we need to explain the phenomenal character of consciousness.
I think it circles here? You started by justifying incompleteness by inverted spectrum, received the objection about chairs being analogous, and then answered that the difference is in incompleteness. The problem is that the chair analogy is correct—the difference between blue and red is completely describable by physics. You only need the intrinsic property of existence for the whole universe to solve zombies. But you also need it for a chair to be real.
Of course, I don’t think many physicalists actually believe in structural relations all the way down.
Conscious phenomenology should only arise in systems whose internal states model both the world and their own internal dynamics as an observer within that world. Neural or artificial systems that lack such recursive architectures should not report or behave as though they experience an “inner glow.”
What part of staring at a white wall without inner dialog and then later remembering it requires inner modeling at the moment of staring?
Internal shifts in attention and expectation can alter what enters conscious awareness, even when sensory input remains constant. This occurs in binocular rivalry and various perceptual illusions,17 consistent with consciousness depending on recursive self-modeling rather than non-cyclic processing of external signals.
But why would changing the processing to non-cyclic result in the experience becoming unconscious, instead of, I don’t know, conscious but less filtered by attention?
And as usual, do you then consider any program that reads its own code to be conscious?
(1 - conscious) * (1 - each_other) * (1 - care_other) * (1 - bored) * (1 - avoid_wireheading) * (1 - active_learning)
Wait, but a paperclipper is independent of all of these and of your arguments about them? A self-aware, distributed, coordinating paperclipper with loop prevention, that creates real paperclips and learns things, is still a paperclipper.
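For concreteness, the quoted expression is a product of complements: multiply together the chances that an AI lacks each listed property. A minimal sketch of that arithmetic, with entirely made-up probabilities (the factor names mirror the quoted expression; none of the numbers come from the source):

```python
# Sketch of the quoted product-of-complements estimate.
# All probabilities are hypothetical, for illustration only.

def product_of_complements(ps):
    """Probability that none of the properties hold, assuming independence."""
    result = 1.0
    for p in ps:
        result *= 1.0 - p
    return result

# Hypothetical per-property probabilities (made up).
factors = {
    "conscious": 0.9,
    "each_other": 0.8,
    "care_other": 0.7,
    "bored": 0.6,
    "avoid_wireheading": 0.5,
    "active_learning": 0.4,
}

residual = product_of_complements(factors.values())
print(f"P(no listed property holds) = {residual:.5f}")
```

Note that this only estimates the chance that every factor fails at once; a single system, like the paperclipper described above, can satisfy all six properties simultaneously while remaining a paperclipper, so a small product does not by itself bound the risk.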
But even if that’s the case the central consistently repeated version of the value loading problem in Bostrom 2014 centers on how it’s simply not rigorously imaginable how you would get the relevant representations in the first place.
I’m not so sure. Like, first of all, you mean something like “get before superintelligence” or “get into the goal slot”, because there is obviously a method to just get the representations—just build a superintelligence with a random goal; it will have your representations. That difference was explicitly stated then, and it is often explicitly stated now—all that “AI will understand but not care”. The focus on frameworks where it gets hard to translate from humans to programs is consistent with him trying to constrain the methods of generating representations to only useful ones.
There is a reason why it is called “the value loading problem” and not “the value understanding problem”. “The value translation problem” would be somewhat in the middle: having an actual human utility program would certainly solve some of Bostrom’s problems.
I don’t know whether Bostrom actually thought about a non-superintelligent AI that already understands but doesn’t care. But I don’t think this line of argumentation of yours is correct about why such a scenario contradicts his points. Even if he didn’t consider it, it’s not “contra”, unless it actually contradicts him. What actually may contradict him is not “AI will understand values early” but “AI will understand values early and training such an early AI will make it care about the right things”.
The fact we don’t do this to begin with heavily implies, almost as a necessary consequence really, that the representation of happiness which is a correct understanding of what we meant was not available at the time we specified what happiness is.
It depends on what you mean by “available”—we already had a representation of happiness in a human brain. And building a corrigible AI that builds a correct representation of happiness is not enough—like you said, we need to point at it.
If you had a non superintelligent corrigible AI that builds a world model with a correct specification of happiness in it, you would use that specification.
If you can use it.
If Bostrom does not expect us to do this, that implies he does not expect us to build an AI that builds a correct representation of happiness until it is incorrigible or otherwise not able to be used to specify happiness for our superintelligent AI.
Yes, the key is “otherwise not able to be used”.
Therefore Bostrom expects we will not have an AI that correctly understands concepts like happiness until after it is already superintelligent.
No, unless by “correctly understands” you mean “has an identifiable representation that humans can use to program other AI”—he may expect that we will have an intelligence that correctly understands concepts like happiness while not yet being superintelligent (like we have humans, who are better at this than “maximize happiness”) but we still won’t be able to use it.
But they don’t unpack to optimality being a real thing. No real entity actually optimizes anything, except maybe that everything minimizes action. “It’s useful in economics” doesn’t mean you can just extrapolate it wherever.
What is supported by what? Is the claim that thinking about utility worked for economists, so everyone should think about utility, or that empirical research shows that anyone smart is trying to conquer the world? What is the claim and what is the evidence?
It is all ungrounded philosophy without quantifying which actual theories match reality and by how much.