Which is funny because there is at least one situation where Robin reasons from first principles instead of taking the outside view (cryonics comes to mind). I’m not sure why he really doesn’t want to go through the arguments from first principles for AGI.
rahulxyz
Small world, I guess :) I knew I heard this type of argument before, but I couldn’t remember the name of it.
So it seems like the grabby aliens model contradicts the doomsday argument unless one of these is true:
- We live in a “grabby” universe, but one with few or no sentient beings long-term.
- The reference classes for the two arguments are somehow different (as discussed above).
Thanks for the great writeup (and the video). I think I finally understand the gist of the argument now.
The argument seems to raise another interesting question about the grabby aliens part.
He’s using the grabby-aliens hypothesis to explain away the model’s low probability of us appearing early (and I presume we’re one of these grabby aliens). But this leads to a similar problem: Robin Hanson (or anyone reading this) has a very low probability of appearing this early amongst all the humans who will ever exist.
This low probability would also require a similar hypothesis to explain it away. The only candidate seems to be some hypothesis under which he’s not actually that early amongst the total humans who will ever exist, which would mean we turn out not to be “grabby” after all.
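To make the “very low probability of appearing this early” concrete, here is a sketch of the standard self-sampling arithmetic. The population numbers are illustrative assumptions on my part, not figures from the argument above:

```python
# Self-sampling assumption: a uniformly random observer among N total
# humans has birth rank <= n with probability n / N.

def prob_rank_at_most(n_observed: float, n_total: float) -> float:
    """Probability that a randomly sampled observer is among the
    first n_observed humans out of n_total ever to exist."""
    return n_observed / n_total

humans_so_far = 1e11       # ~100 billion humans born to date (common rough estimate)
total_if_grabby = 1e16     # hypothetical total population if humanity becomes "grabby"

p_early = prob_rank_at_most(humans_so_far, total_if_grabby)
print(p_early)  # 1e-05: under that hypothesis, being this early is very unlikely
```

The larger the assumed “grabby” future population, the smaller this probability gets, which is exactly the tension with the doomsday-style reasoning described above.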
This seems like one of the problems with anthropic reasoning arguments, and I’m unsure how seriously to take them.
It doesn’t seem crazy to me that a GPT-type architecture with the “Stack More Layers” approach could eventually model the world well enough to simulate consequentialist plans. E.g., given a prompt like:
“If you are a blender with legs in environment X, what would you do to blend apples?”, it could provide a continuation with a detailed plan like the above (and GPT-4/5 etc., with more compute, giving slightly better plans, maybe eventually at a superhuman level).
It also seems like it could do this kind of consequentialist thinking without itself having any “goals” to pursue. I’m expecting the response to be one of the following, but I’m not sure which:
“Well, if it’s already making consequentialist plans, surely it has some goals, like maximizing the amount of text it generates, and will try to do whatever it can to ensure that (similar to the ‘consequentialist AlphaGo’ example in the conversation) instead of just letting itself be turned off.”
“An LLM / GPT will never be able to reliably output such plans with the current architecture or type of training data.”