Blog: Words of What Could Be
Onni
Some kind of meta grumbling about this concept:
Both trust handoff and decision handoff feel a bit slippery and “unfalsifiable” as concepts. What does it mean to automate the economy in a way that isn’t one or both? You say that ideally you’d want to do “trust handoff never thanks to some fancy scheme for AIs to watchdog each other”. But then your description of the “other extreme” of trust handoff sounds a lot like just such a fancy scheme. So is that trust handoff or not? I get that it’s a spectrum, but it seems really unhelpful to collapse this into a binary, and by extension into “an event” that happens at some point in time.
I feel like this slipperiness creates a lot of potential for miscommunication and semi-intentional motte-and-baileys here? E.g. if I say that we should just never do trust handoff, and someone else thinks it’s inevitable, it’s hard to tell if we’re even disagreeing on the object level? Relatedly, I’ve often heard people treat “handoff” as the inevitable end point of the development of superintelligence, and your definition seems to allow a kind of motte and bailey here where you can retreat to some unfalsifiable motte of “well if you automate the economy surely the AIs could take over if they all suddenly coordinated”, and then return to a bailey of “we’re inevitably going to do a really risky form of handoff”, without necessarily even noticing that that’s what you did.
Relatedly, if I try to think about which actions in human society constitute “trust handoff”, this feels very slippery and confusing? Are humans currently doing a lot of trust handoff to other humans? In some sense, sort of: If all of a company’s employees decided to just destroy the company, they could. The board and shareholders spend relatively little time overseeing most companies. But the overall societal incentives are very much against destroying the company, so I don’t think this is well described as “trust handoff”?
Extending the analogy to the level of an entire society: If there’s some class of shareholders, it’s probably the case that if all employees of the companies and the government coordinated against you, they could just expropriate you. And in that sense “the working class” collectively is in a position to take over the world if they all suddenly gained class consciousness and coordinated against the shareholder class. This is certainly a somewhat risky situation for the shareholder class to be in, but it doesn’t feel very natural to me to describe this as the shareholder class “handing off trust” to the working class? Indeed, for some definition of “shareholder class” and “working class”, the scenario above arguably already approximately describes our current world.
In general I worry that the phrase is importing a lot of implications of the word “trust” that are misleading?
Relatedly, it also seems kind of unlikely that anyone will ever make a decision that feels clearly like “I’m handing off trust now”? Maybe you disagree?
I really loved reading this, it resonated quite deeply. I’ve felt much the same, though for me the biggest thing that made me misanthropic was reading about EA and all the implied ways that most people are arguably monstrous by omission and conformity. After that, reading Nietzsche didn’t make it much worse.
In recent years I’ve been feeling somewhat less misanthropic, so wanted to quickly say a bit about how that’s happened, in case it’s useful. Probably the biggest influence was reading Joseph Henrich’s The Secret of Our Success. The core thesis of the book might be summarized as arguing that humans are mostly implementing an algorithm that isn’t really “observe evidence, build causal models, reason about optimal action” but rather something closer to “look around, identify individuals and groups that are high status (rich, popular) and then blindly copy whatever they’re doing, because apparently it’s working”. This in itself may not seem so revolutionary, but the more interesting bit is that Henrich argues that people do this because it has, at least historically, mostly worked better than the “rational”, nullius in verba approach. Henrich describes a bunch of examples where sophisticated European explorers went to strange places and tried to use Reason and Evidence instead of listening to the superstitious locals, and died horrible deaths of exposure, starvation, and poisoning as a result.
Anyway, this is roughly my sympathetic perspective on the common man: They’re mostly doing conformity rather than real reasoning, but they’re often better off for it! My guess is that the current world rewards reason more, and conformity less, than the ancestral environment, so probably most normies would be better off being less conformist. Still, I suspect intense exposure to LW might be net bad for the average person, because they would mess up the implementation, whereas conformity has a good track record and rarely goes badly wrong.
Bonus note: I’ve also found the way that Tolkien writes about hobbits, and the way that e.g. Gandalf relates to hobbits, to be another useful model for how to relate to the common man in a way that is realistic but not misanthropic.
On alternative terminology: It feels more natural to me to think about it in terms of the inverse, of how much oversight and how carefully-designed controls / checks and balances you have. Also importantly, how much centralization into “one” AI, vs. diverse competing AIs. As you have more powerful AI, to some extent it becomes by default more difficult to oversee, and in a competitive situation there’s a temptation to skimp on those checks and balances. This feels closer to how we think about it in the human case, e.g. there’s a temptation to remove checks and balances from the executive in exchange for getting the trains to run on time, a board might feel like they don’t want to interfere in their genius CEO’s actions, etc.
And yeah I was using “unfalsifiable” in a very loose sense, mostly to point at the fact that, without a definition of where you draw the line between e.g. trust handoff and not-trust-handoff, it becomes impossible to definitively say that e.g. a prediction that we’d do “trust handoff” at a given point in time was wrong.