How do you tell when to stop the simulation? Apparently not at the almost human-level AI we have now.
I guess you stop it when there’s very little chance left that it would go on to solve philosophy or metaphilosophy in a clearly non-deceptive way.
Do you have an example piece of philosophical progress made by a civilization?
In my view every piece of human philosophical progress so far was “made by a civilization” because whoever did it probably couldn’t or wouldn’t have done it if they were isolated from civilization.
It seems possible that if you knew enough about how humans work (and maybe about how philosophy works), you could do it with less than a full civilization, by instantiating some large number of people and setting up some institutions that allow them to collaborate and motivate each other effectively (and not go crazy, or get stuck due to lack of sufficiently diverse ideas, or other failure modes). But it’s also quite possible that for the simulators it would be easier to just simulate the whole civilization and let the existing institutions work.
I admit that the human could turn against you, but if a human can eat you, you certainly shouldn’t be watching a planet full of humans.
My point is that a human or group of humans placed into an alien (or obviously simulated) environment will know that they’re instantiated to do work for someone else and can take advantage of that knowledge (to try to deceive the aliens/simulators), whereas a planet full of humans in our (apparent) native environment will want to solve philosophy for ourselves, which probably overrides any thoughts of deceiving simulators even if we suspect that we might be simulated. So that makes the latter perhaps a bit safer.
If an AI intuits that policy, it can subvert it—nothing says that it has to announce its presence, or openly take over immediately. Shutting it down when they build computers should work.
If the “human in a box” degenerates into a loop like LLMs do, try the next species.
I agree on your last paragraph, though humans have produced loads of philosophy that both works for them and benefits them for others to adopt.
If an AI intuits that policy, it can subvert it—nothing says that it has to announce its presence, or openly take over immediately. Shutting it down when they build computers should work.
The simulators can easily see into every computer in the simulation, so it would be hard for an AI to hide from them.
If the “human in a box” degenerates into a loop like LLMs do, try the next species.
The “human in a box” could also confidently (and non-deceptively) declare that they’ve solved philosophy but hand you a bunch of nonsense. How would you know that you’ve sufficiently recreated a suitable environment/institutions for making genuine philosophical progress? (I guess a similar problem occurs at the level of picking which civilizations/species to simulate, but still by using “human in a box” you now have two points of failure instead of one: picking a civilization/species that is capable of genuine philosophical progress, and recreating suitable conditions for genuine philosophical progress.)
I agree on your last paragraph, though humans have produced loads of philosophy that both works for them and benefits them for others to adopt.
What are some examples of this? Maybe it wouldn’t be too hard for the simulators to filter them out?
Can the simulators tell whether an AI is dumb or just playing dumb, though? You can get the right meme out there with a very light touch.
Yeah, it’d be safer to skip the simulations altogether and just build a philosopher from the criteria by which you were going to select a civilization.
To be blunt, sample a published piece of philosophy! Its author wanted others to adopt it. But you’re well within your rights to go “If this set is so large, surely it has an element?”, so here’s a fun couple of paragraphs on the topic.