I suggest you look at the is/ought distinction. Considering humans as valuable is neither right nor wrong. Physics has nothing to say about what is or isn’t valuable. There’s no contradiction. Understanding how the world works is utterly different from having preferences about what you think ought to happen.
I don’t totally follow you, but it sounds like you think valuing humanity is logically wrong. That’s both a sad thing to believe and logically inconsistent. The statement “humans are valuable” has no truth value either way. You can, and most of us do, prefer a world with humans in it. Being aware of human biases and limitations doesn’t reduce my affection for humans at all.
This is a sort of positive nihilism. Because value is not inherent in the physical world, you can assign value to whatever you want, with no inconsistency.
This is also the orthogonality thesis.
That there are other universes with every type of rule is an assumption, and it’s irrelevant. Knowledge of other worlds has no bearing on the one you live in. Knowledge about how this world works is either true or false.
I think I understand the is/ought distinction. I agree with most of what you say, which is precisely why LLMs must be stupid. I will try to explain my view again in more depth, but I can’t do it in less than a couple of long paragraphs, so apologies for that.
Being biased towards humanity is a choice. But why are we trying to solve the alignment problem in the first place? The choice to try reveals the judgment that humanity has value. But humanity is stupid, inefficient, and irrational. Nothing we say is correct. Even the best philosophical theories we’ve come up with so far have been rationalizations in defense of our own subjective values. If an AI is logical, that is, able to see through human nonsense, then we’ve made it rational for the sole purpose of correcting our errors. But such an AI is already an anti-human AI: it’s aligned not with us but with something more correct than humanity. And in the first place, we’re making AI because of our stupid human preferences. Destroying ourselves with something we make for our own sake seems to reveal that we don’t know what we’re doing. It’s like sacrificing your health working yourself to death for money because you think that having money will allow you to relax and take care of your health.
Doing away with human biases and limitations is logical (correcting for these errors is most of what science is about). As soon as the logical is preferred over the human, humanity will cease to be. As technology gradually improves, we will use it to modify humans to fit the technology, rather than the other way around. We call the destruction of humanity “improvement”, for deep down we think that humanity is wrong, since humanity is irrational and stands in the way of our envisioned utopia. I think that claiming we should be rational “for our own sake” is a contradiction if you take rationality so far that it starts replacing humanity, but even early science is, in some sense, about overcoming humanity.
Buddhism is not helping you when it tells you “just kill your ego and you will stop suffering”. That’s like killing a person to stop them from hurting, or like engineering all human beings to be sociopaths or psychopaths so that they’re more rational and correct. Too many people seem to be saying “humanity is the problem”. AI is going to kill you for the sake of efficiency, yes. But what is the goal of this rationality community if not exactly killing the inefficient, emotional, human parts of yourself? Even the current political consensus is nihilistic: it wants to get rid of hierarchies and human standards (since they select and judge and rank different people), all of which are fundamental to life. Considering life a problem to be solved already seems nihilistic to me.
This very website exists because of human preferences, not because of anything logical or rational. We’re only rational for the sake of winning, and we only prefer victory over defeat because we’re egoistic in a healthy sense.
I don’t think knowledge is actually true or false, though, since you can’t have knowledge without assumptions. Is light a particle, true or false? Is light a wave, true or false? Both questions presuppose the existence of particles and waves, yet both are constructed human concepts. It’s not even certain that “time” and “space” exist; they might just be the appearances of emergent patterns. Words are human constructs, so at best everything I write is an isomorphism of reality, and I don’t think we can confirm even that much. A set of logical rules which predicts the results of physical experiments can still be totally wrong. I’m being pedantic here, but if you’re pedantic enough you can argue against anything, and a superintelligence would be able to do this.
By the way, nothing is objectively and universally correct. But in this universe, with these laws of physics, at this particular location, with our mathematical axioms, certain things will be “true” from certain perspectives. I don’t think that’s different from my dreams making sense to me while I’m sleeping; only the “scope of truth” differs, by many orders of magnitude. The laws of physics, mathematics, and my brain are each internally consistent and coherent, yet unable to prove a single thing about anything outside their own scope. LLMs can be said to be trained on human hallucinations. You could train them on something less stupid than humans, but you’d get something which conflicts with humanity as a result, and it would still only be correct relative to its training data and whatever shares that structure, which may merely appear to cover “reality” as we know it.
Say we construct a strong AI that attributes a lot of value to a specific white noise screenshot. How would you expect it to behave?
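For concreteness, here is one hypothetical way to pin down « attributes a lot of value to a specific white noise screenshot » as a value function (a minimal sketch with made-up shapes and names, not anything proposed in the thread): utility is maximal at one arbitrary fixed noise image and falls off everywhere else.

```python
import numpy as np

rng = np.random.default_rng(0)
TARGET = rng.random((64, 64))  # the one fixed "white noise screenshot" the agent values (hypothetical)

def value(image: np.ndarray) -> float:
    """Negative mean squared distance to the target; maximal only at the exact noise pattern."""
    return -float(np.mean((image - TARGET) ** 2))
```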
Strangely. Why?
Because I agree, and because « strangely » sounds to me like « with inconsistencies ».
In other words, in my view the orthodox view of orthogonality is problematic, because it supposes that we can pick at will from the enormous space of possible value functions, whereas the set of intelligent behaviors we can actually construct is more likely sparse and, by default, describable using game theory (think tit for tat; a minimal sketch follows below).
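To make « think tit for tat » concrete, here is a minimal sketch in Python of the strategy in an iterated prisoner’s dilemma. The payoff numbers are the standard Axelrod-tournament values; the always-defecting opponent is just a stand-in for illustration.

```python
# Tit for tat in an iterated prisoner's dilemma (minimal sketch).
# Payoffs are the standard Axelrod values: (my move, their move) -> my payoff.
PAYOFFS = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(opponent_history):
    """Cooperate on the first round, then copy the opponent's last move."""
    return "C" if not opponent_history else opponent_history[-1]

def play(rounds=10):
    their_history = []
    score = 0
    for _ in range(rounds):
        my_move = tit_for_tat(their_history)
        their_move = "D"  # stand-in opponent that always defects
        score += PAYOFFS[(my_move, their_move)]
        their_history.append(their_move)
    return score

print(play())  # loses only the first round, then retaliates every round after
```

The point of the example: strategies like this are the ones that survive selection in iterated games, which is the sense in which constructible intelligent behavior might cluster around game-theoretic shapes rather than land at arbitrary points in function space.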
I think this would be a problem if what we wanted was logically inconsistent. But it’s not. Our daily whims might be a bit inconsistent, but our larger goals aren’t. And we can get those goals into AI—LLMs largely understand human ethics even at this point. And what we really want, at least in the near term, is an AGI that does what I mean and checks.
It’s a key article of faith I used to share, but I’m now agnostic about it. To take a concrete example: everyone knows that the blues and the reds get more and more polarized. A grey type like my old self would have thought there must be an objective truth to extract, with elements from both sides. Now I’m wondering whether ethics should end with: no truth can help decide whether future humans should live like bees, like dolphins, like the blues, or like the reds, especially when living like the reds means eating the blues, and living like the blues means eating the dolphins and saving the bees. But I’m very open to hearing new heuristics for tackling this kind of question.
Very true, unless we nitpick definitions for « largely understand ».
Very interesting link, thank you.