I wonder if the attractor state of powerful beings is a bipole consisting of:
a. wireheading / reward hacking, facing one’s inner world
b. defense, facing one’s outer world
As we’ve gained more and more control over our environment, much of what we humans seem to want to do resembles reward hacking: video games, sex-not-for-procreation, solving captivating math problems, etc. In an ideal world, we might do that all day long, especially if we could figure out how to zap our brains into making every time feel like the first time.
However, if you spend all day wireheading and your neighbor doesn’t, your neighbor will outpace you in resource generation and may be able to, one way or another, melt you down for scrap (and possibly repurpose your resources for their own wireheading).
Much human culture (e.g. social customs, religion) can be understood as an attempt to temper some of the wireheading in favor of more defense: over-indulging in video games is discouraged as immoral; you should be out working hard instead.
Perhaps this, or something akin to it, could be expected to hold for the behavior of advanced AI systems. The end state of a superintelligence may be perfect wireheading hidden behind the impenetrable event horizon of a black hole, so that nobody can disturb its reverie.[1]
[1] Of course, it would be bad news if the epitome of defense is wiping out anything else that may surprise it, a la the List of Lethalities.
There’s a justifiable model for preferring “truthiness” / vibes to analytical arguments in certain cases. This must be frustrating to those who make bold claims (doubly so for the very few whose bold claims are actually true!).
Suppose Sophie argues, in a dense 1,000-page tome, that pigs fly. Each page contains 5 arguments that refer to some or all of the preceding pages. Sophie says I’m welcome to read the entire book, or, if I’d like, I can sample, say, 10 pages (10 * 5 = 50 arguments) and reassure myself that they’re solid. Suppose the book does in fact contain a lone wrong argument, a bit flip somewhere, that leads to the wrong conclusion even though the book is mostly (99.98%) correct.
If I tell Sophie that I think her answer sounds wrong, she might say: “but here’s the entire argument; please go ahead and show me where any of it is incorrect!”
Since I’m very unlikely to catch the error at a glance, and I’m unlikely to want to spend the time to read and grok the whole thing, I’m going to say: sorry, but the vibes are off; your conclusion seems too far off my prior, so I’m going to assume you made a difficult-to-catch mistake somewhere, and I’m not going to bother finding it.
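To put a rough number on “very unlikely”: here’s a minimal back-of-the-envelope sketch in Python, using the illustrative page and argument counts from the example above, of how weak a 10-page spot check actually is.

```python
# How likely is a 10-page spot check to catch Sophie's single flawed argument?
# Numbers are the illustrative ones from the example above: 1,000 pages,
# 5 arguments per page, exactly one flawed argument, 10 pages sampled.

total_pages = 1_000
args_per_page = 5
sampled_pages = 10

total_args = total_pages * args_per_page      # 5,000 arguments in the tome
sampled_args = sampled_pages * args_per_page  # 50 arguments actually checked

# Assuming the flaw is equally likely to sit on any page, the chance that a
# uniformly random 10-page sample even contains the flawed argument is:
p_sample_contains_flaw = sampled_pages / total_pages  # 0.01, i.e. 1%

print(f"P(sample contains the flaw) = {p_sample_contains_flaw:.1%}")
# ~1%, and that's only an upper bound on catching it: even when the flawed
# page is in the sample, I still have to recognize the flaw among 50 arguments.
```

So even a diligent 50-argument spot check leaves roughly a 99% chance that the flaw goes unnoticed.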
This is reasonable on my part, since I’m forced to time-ration, but it must be very frustrating for Sophie, in particular if she genuinely believes she’s right (as opposed to being purposefully deceptive).
There’s also the possibility of a tragedy of the commons here, whereby spending my time on this is not in my selfish best interest but has positive externalities.