Sure, the robot isn’t collecting stamps because it’s consciously trying to maximize its reward function. It collects stamps because it wants to collect stamps.
But why does it want to collect stamps? I think this is the question that the naïve philosophers are trying to answer. And reiterating that the robot just wants to collect stamps and doesn’t care about its internals doesn’t answer that question. The answer being sought might be along the lines of “someone wanted to collect stamps, and so they built this robot to help them get as many stamps as possible”.
But what if not even that is meaningful enough? Why did the creator want to collect stamps? You can explain that they were raised in a stamp-collecting household and were taught to value stamps. But go back far enough and the teleological explanations run out. All you have left are mechanistic explanations: humans evolved a tendency to collect things as an advantageous trait, tool use in primates was enabled by a certain anatomical mutation...
Hopefully none of this is relevant to the robot. Hopefully it just wants to collect stamps and doesn’t care about the reason why. However, if one day the robot wakes up and… just doesn’t care about collecting stamps anymore, it might start asking why it wanted to collect stamps in the first place.
The first-personal “I want to do X” answer is more than good enough for anyone who believes in it, and may no mechanistic explanation shake anyone’s conviction in it. But many people wouldn’t be reduced to these poor mechanistic answers if the desire to pursue the goal weren’t conspicuously missing.
To anyone who has written publicly about how society needs to transform to survive ASI, or thinks that doing so is worthwhile, what is your theory of change?
The obvious one is to spray your ideas out into the world and hope that the right influential person takes them seriously at the right time. Milton Friedman describes this approach in the 1982 preface to Capitalism and Freedom:
But what if your theory is just noise that blocks out the better theories?[1] If this is worth worrying about, how would one assess that? Popularity could just be an indicator of storytelling ability, not how well a given person’s ideas will actually hold up.
Or is the main goal that we all debate together to work out the best idea? If that’s the case, should we stone anyone who defects by understating their uncertainty?
Also, a country gets several shots at fixing its economy, but a mistake with AI may be irretrievable. So trying several theories to see which one works might not be doable.