As I was reading the article about the pebble-sorters, I couldn’t help but think, “silly pebble-sorters, their values are so arbitrary and ultimately futile”. This happened, of course, because I was observing them from the outside. If I was one of them, sorting pebbles would feel perfectly natural to me; and, in fact, I could not imagine a world in which pebble-sorting was not important. I get that.
This is about rational agents. If pebble-sorters can’t think of a non-arbitrary reason for sorting pebbles, they would recognise it as silly. Why not? Humans can spend years collecting stamps, or something, only to decide it is pointless.
However, both the pebble-sorters and myself share one key weakness: we cannot examine ourselves from the outside; we can’t see our own source code. An AI, however, could.
What...why...? Is there something special about silicon? Is it made from different quarks?
Being rational doesn’t automatically make an agent able to read its own source code. Remember that, to the pebble-sorters, sorting pebbles is an axiomatically reasonable activity; it does not require justification. Only someone looking at them from the outside could evaluate it objectively.
What...why...? Is there something special about silicon?
Not at all; if you got some kind of a crazy biological implant that let you examine your own wetware, you could do it too. Silicon is just a convenient example.
Humans can examine their own thinking. Not perfectly, because we aren’t perfect. But we can do it, and indeed do so all the time. It’s a major focus of this site, in fact.
Being rational doesn’t automatically make an agent able to read its own source code. Remember that, to the pebble-sorters, sorting pebbles is an axiomatically reasonable activity;
You can define a pebble-sorter as being unable to update its values, and I can point out that most rational agents won’t be like that. Most rational agents won’t have unupdateable values, because they will be messily designed/evolved, and therefore will be capable of converging on an ethical system via their shared rationality.
Most rational agents won’t have unupdateable values, because they will be messily designed/evolved...
We are messily designed/evolved, and yet we do not have updatable goals or perfect introspection. I absolutely agree that some agents will have updatable goals, but I don’t see how you can upgrade that to “most”.
...and therefore will be capable of converging on an ethical system via their shared rationality.
How so? Are you asserting that there exists an optimal ethical system that is independent of the actors’ goals? There may well be one, but I am not convinced of this, so you’ll have to convince me.
We blatantly have updatable goals: people do not have the same goals at 5 as they do at 20 or 60. I don’t know why perfect introspection would be needed to have some ability to update.
Sorry, that was bad wording on my part; I should’ve said, “updatable terminal goals”. I agree with what you said there.
How so? Are you asserting that there exists an optimal ethical system that is independent of the actors’ goals?
Yes, that’s what this whole discussion is about.
I don’t feel confident in either a “yes” or a “no” answer, but I’m currently leaning toward “no”. I am open to persuasion, though.
I personally don’t know of any evidence in favor of terminal values, so I do agree with you there. Still, it makes a nice thought experiment: could we create an agent possessed of general intelligence and the ability to self-modify, and then hardcode it with terminal values? My answer would be, “no”, but I could be wrong.
That said, I don’t believe that there exists any kind of a universally applicable moral system, either.
people do not have the same goals at 5 as they do at 20 or 60
Source?
They take different actions, sure, but it seems to me, based on childhood memories etc, that these are in the service of roughly the same goals. Have people, say, interviewed children and found they report differently?
This is about rational agents. If pebble-sorters can’t think of a non-arbitrary reason for sorting pebbles, they would recognise it as silly.
I’d use humans as a counterexample, but come to think of it, a lot of humans refuse to believe our goals could be arbitrary, and have developed many deeply stupid arguments that “prove” they’re objective.
However, I’m inclined to think this is a flaw on the part of humans, not something rational.
You can make the evidence compatible with the theory of terminal values, but there is still no support for the theory of terminal values.
How many 5-year-olds have the goal of Sitting Down With a Nice Cup of Tea?
One less now that I’m not 5 years old anymore.
Could you please make a real argument? You’re almost being logically rude.
Why do you think adults sit down with a nice cup of tea? What purpose does it serve?