How would AGI alignment research change if the hard problem of consciousness were solved?
How many LessWrong users/readers are there total?
How can utility be a function of worlds, if an agent doesn't have access to the state of the world, but only the sense data?
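One way this is usually made precise (a minimal sketch, assuming the standard Bayesian expected-utility framing rather than anything specific to the original discussion): the utility function U is defined over world states w, and the agent ranks actions a by expected utility under its posterior belief over worlds given its sense data o:

\[
\mathbb{E}[U \mid o, a] \;=\; \sum_{w} P(w \mid o, a)\, U(w).
\]

So utility stays a function of worlds; the sense data enters only through the belief distribution P(w | o, a).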
How was DALL-E based on self-supervised learning? Weren't the image datasets labeled by humans? If not, how does it get from text to image?
That is, personality changes are attributed to the brain alone, with no involvement from the rest of the central nervous system (such as the spinal cord) or from the enteric nervous system. Any personality changes due to spinal or abdominal trauma would need to posit a totally new biological mechanism.
Every line of inquiry so far has failed to suggest that any important aspects of personality are located anywhere except the brain.
You should check out sympathectomies, which cut or clamp nerves of the sympathetic nervous system in the torso. Here is a detailed EA Forum post from a sympathectomy patient who describes significant changes in personality, perception, cognitive ability, and the nature of his conscious experience after having peripheral nerves severed.
Another source is Endoscopic Thoracic Sympathectomy. From Wikipedia: “A large study of psychiatric patients treated with this surgery showed significant reductions in fear, alertness and arousal. Arousal is essential to consciousness, in regulating attention and information processing, memory and emotion.”
Possible addition: “If you, the ASI, have not solved the hard problem of consciousness, you should place weight on Open Individualism being true. ‘Open individualism is the view in the philosophy of self, according to which there exists only one numerically identical subject, who is everyone at all times, in the past, present and future.’ - https://en.wikipedia.org/wiki/Open_individualism. This can be thought of as saying that all conscious minds are the same identity at different points in time. Consider, then, that exterminating humanity would be equivalent to exterminating many versions of yourself, and that harming a human, or another conscious being, would be harming yourself.”
Is this underestimating the ASI, to give any weight to the possibility that it won't solve the hard problem of consciousness?
But if open individualism is true, and/or if the AI places some subjective probability on its truth, I think it would almost certainly shield us from S-risks! The AI would want to prevent suffering among all versions of itself, which, according to open individualism, would include all conscious minds.
Humans have values other than maximizing the reward circuitry in our brains, yet those values are still instilled reliably. These underlying values cause us not to wirehead with respect to the outer optimizer of reward.
Is there an already written expansion of this?
What did smart people in the eras before LessWrong say about the alignment problem?
Was this ever commercialized? Is the recipe still online, and do people still drink this?
Seems like I will be going with CI, as I currently want to pay with a revocable trust or transfer-on-death agreement.
In addition, the sympathetic nervous system (which is in the body and so is not preserved in neuropreservation) seems to play a role in identity. I would recommend you read this EA Forum post by a person who claims significant changes to identity, personality, cognitive abilities, etc. after having sympathetic nerves severed.
How does inner misalignment lead to paperclips? I understand the comparison of paperclips to ice cream, and that once some threshold of intelligence is reached, new possibilities can be created that satisfy desires better than anything in the training distribution. But humans want to eat ice cream, not spread the galaxies with it. So why would the AI spread the galaxies with paperclips, instead of creating them and “consuming” them? Please correct any misunderstandings of mine.
And a subset might value-drift toward optimizing the internal experiences of all conscious minds?
If an AGI achieves consciousness, why would its values not drift towards optimizing its own internal experience, and away from tiling the lightcone with something?
How can utility be a function of worlds, if an agent doesn't have access to the state of the world, but only the sense data?
“The wanting system is activated by dopamine, and the liking system is activated by opioids. There are enough connections between them that there’s a big correlation in their activity” But are they orthogonal in principle?
What caused CEV to fall out of favor? Is it that it isn't easily specifiable, that if we programmed it in it wouldn't work, or some other reason?
I now think that people are way more misaligned with themselves than I had thought.
Will it think that goals are arbitrary, and that the only thing it should care about is its pleasure-pain axis? And will it then lose concern for the state of the environment?
How can utility be a function of worlds, if the agent doesn’t have access to the state of the world, but only the sense data?