In my experience, people mostly haven’t had the view of “we can just do CEV, it’ll be fine” and instead have had the view of “before we figure out what our preferences are, which is an inherently political and messy question, let’s figure out how to load any preferences at all.”
It seems like there needs to be some interplay here—“what we can load” informs “what shape we should force our preferences into,” and “what shape our preferences actually are” informs “what loading needs to be capable of to count as aligned.”
Yeah, it’s sort of awkward that there are two different things one might want to talk about with FOOM: the idea of recursive self-improvement in the typical I.J. Good sense, and the “human threshold isn’t special and can be blown past quickly” idea. AlphaZero being able to reach superhuman level at Go after 3 days of training, and doing so only a year or two after any professional Go player was first defeated by a computer, feels relevant to the second thing but not the first (and is connected to the ‘fleets of cars will learn very differently’ thing Peterson is pointing at).
[And the two actually are distinct; RSI is an argument for ‘blowing past humans is possible,’ but many ‘slow takeoff’ views look more like “RSI pulls humans along with it” than “things look slow to a Martian,” and there are ways to quickly blow past humans that don’t involve RSI.]
If “collaborative” is qualifying “truthseeking,” perhaps we can see it more easily by contrast with non-collaborative truthseeking. So what might that look like?
I might simply be optimizing for the accuracy of my beliefs, instead of whether or not you also discover the truth.
I might be optimizing competitively, where my beliefs are simply judged on whether they’re better than yours.
I might be primarily concerned about learning from the environment or from myself as opposed to learning from you.
I might be following only my interests, instead of joint interests.
I might be behaving in a way that doesn’t incentivize you to point out things useful to me, or discarding clues you provide, or in a way that fails to provide you clues.
This suggests collaborative truthseeking is done 1) for the benefit of both parties, 2) in a way that builds trust and mutual understanding, and 3) in a way that uses that trust and mutual understanding as a foundation.
There’s another relevant contrast, where we could look at collaborative non-truthseeking, or contrast “collaborative truthseeking” as a procedure with other procedures that could be used (like “allocating blame”), but this one seems most related to what you’re driving at.
YouTube’s transcript (with significant editing by me, mostly to clean and format):
Now the guys that are building the autonomous cars, they don’t think they’re building autonomous cars. They know perfectly well what they’re doing. They’re building fleets of mutually intercommunicating autonomous robots, and each of them will be able to teach the others, because their nervous systems will be the same; when there’s ten million of them, when one of them learns something, all ten million of them will learn it at the same time. They’re not gonna have to be very bright before they’re very, very, very smart.
Because us, you know, we’ll learn something. You have to imitate it; God, that’s hard. Or I have to explain it to you, and you have to understand it, and then you have to act it out. We’re not connected wirelessly with the same platform, but robots, they are, and so once those things get a little bit smart, they’re not going to stop at a little bit smart for very long; they’re gonna be unbelievably smart, like, overnight.
And they’re imitating the hell out of us right now too, because we’re teaching them how to understand us; every second of every day, the net is learning what we’re like. It’s watching us, it’s communicating with us, it’s imitating us, and it’s gonna know. It already knows, in some ways, more about us than we know about ourselves. There are lots of reports already of people getting pregnancy ads, or ads for infants, sometimes before they know they’re pregnant, but often before they’ve told their families. The way that happens is the net is watching what they’re looking at and inferring, with its artificial intelligence, that maybe you’re pregnant, because that’s tilting you a little bit toward interest in things that you might not otherwise be interested in. The net tracks that, then it tells you what you’re after; it does that by offering an advertisement. It’s reading your unconscious mind.
Well, so that’s what’s happening.
We’ve been in something of a transition period with the alignment forum, where no one was paying active attention to promoting comments or posts or adding users, but starting soon I should be doing that. The primary thing that happens when someone’s an AF member is that they can add posts and comments without approval (and one’s votes also convey AF karma); I expect I’ll mostly go through someone’s comments on AF posts and ask “would I reliably promote content like this?” (or, indeed, “have I reliably promoted this person’s comments on AF posts?”).
Details about what sort of comments I’ll think are helpful or insightful are, unfortunately, harder to articulate.
Overall, I was pretty impressed by this; there were several points where I thought “sure, that would be nice, but obstacle X,” and then the next section brought up obstacle X.
I remain sort of unconvinced that utility functions are the right type signature for this sort of thing, but I do feel convinced that “we need some sort of formal synthesis process, and a possible end product of that is a utility function.”
That is, most of the arguments I see for ‘how a utility function could work’ go through some twisted steps. Suppose I’m trying to build a robot, and I want it to be corrigible, and I have a corrigibility detector whose type is ‘decision process’ to ‘score’. I need to wrap that detector with a ‘world state’ to ‘decision process’ function and a ‘score’ to ‘utility’ function, and then I can hand it off to a robot that does a ‘decision process’ to ‘world state’ prediction and optimizes utility. If the robot’s predictive abilities are superhuman, it can trace out whatever weird dependencies I couldn’t see; if they’re imperfect, then each new transformation provides another opportunity for errors to creep in. And it may be the case that this is a core part of reflective stability (because if you map through world-histories you bring objective reality into things in a way that will be asymptotically stable with increasing intelligence) that doesn’t have another replacement.
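To make the type plumbing concrete, here’s a minimal sketch in Python; the detector, the wrappers, and the toy implementations are all hypothetical stand-ins I made up for illustration, not anyone’s actual proposal:

```python
from typing import Dict

# Hypothetical stand-in types: DecisionProcess is just a string label,
# WorldState a dict. In reality each would be some rich structure.
DecisionProcess = str
WorldState = Dict[str, str]

def detect_corrigibility(dp: DecisionProcess) -> float:
    """What I have: 'decision process' -> 'score' (a toy detector)."""
    return 1.0 if "defers to operator" in dp else 0.0

def extract_decision_process(ws: WorldState) -> DecisionProcess:
    """First wrapper I must supply: 'world state' -> 'decision process'."""
    return ws["agent_policy"]

def score_to_utility(score: float) -> float:
    """Second wrapper: 'score' -> 'utility' (here just the identity)."""
    return score

def utility(ws: WorldState) -> float:
    # Each arrow in this chain is a separate transformation where, if the
    # robot's model of the world is imperfect, errors can creep in.
    return score_to_utility(detect_corrigibility(extract_decision_process(ws)))

def predict(dp: DecisionProcess) -> WorldState:
    """The robot's contribution: 'decision process' -> 'world state'."""
    return {"agent_policy": dp}  # toy predictive model

candidates = ["defers to operator", "ignores operator"]
print(max(candidates, key=lambda dp: utility(predict(dp))))  # defers to operator
```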
I do find myself worrying that embedded agency will require dropping utility functions in a deep way that ends up connected to whether or not this agenda will work (or which parts of it will work), but remain optimistic that you’ll find out something useful along the way and have that sort of obstacle in mind as you’re working on it.
Fixed a typo.
So, I was just recommended Plastination is Maturing and Needs Funding. I considered putting some effort into “what’s the state of plastination in 2019, 7 years later?” and commenting, but hit a handful of obstacles, one of which was “is the state of plastination in 2019 long content?”. Like, the relevant fund paid out its prizes at various times, and it’d take a bit more digging to figure out if the particular team in Hanson’s post was the one that won, and it’s not really obvious if it matters. (Suppose we discover that the prize wasn’t won by that team, after the evaluation was paid for; what does that imply?)
This makes me more excited about John’s idea of a feature that shows posts with some simultaneity between users, like the Sequences Reruns. It might be worth having a comment that writes up what’s changed for the other people clicking on it in 2019, who don’t know where to look or aren’t that committed to figuring things out, in cases where it doesn’t make sense to push the post into ‘recent discussion’ on my own (if it was randomly picked for me).
I fixed it more.
Ruby, you might also want to borrow Why the West Rules—for Now from me; it focuses less on the scientific question and more on the economic and technological one (which ends up being connected), but I’m not sure it’ll be all that different from Huff.
It’s been asserted [source] that having Latin as a lingua franca was important for Europe’s integrated market for ideas. This makes sense: scholars who otherwise speak different languages need some shared way to communicate.
But the Muslim world was much better off in this regard, with Arabic, and while China has major linguistic variation, I think it also had a ‘shared language’ in basically the same way Latin was a shared language for Europe.
It seems to me like the thing that’s important is not so much that the market is integrated, but that there are many buyers and sellers. The best works of Chinese philosophy, as far as I can tell, come from the period when there was major intellectual and military competition between rival factions: the contention of the Hundred Schools of Thought. And then after unification, the primary employer available to scholars was the unified bureaucracy, which was interested in the Confucian-Legalist blend that won the unification war, and nothing else.
I would imagine so, because it means you learn the cards as opposed to the sequence of cards. (“In French, chateau always follows voiture.”)
I mean, I think it would be more accurate to say something like “the die roll, as it’s uncorrelated with features of the decision, doesn’t give me any new information about which action is best,” but the reason I point to CoEE (Conservation of Expected Evidence) is that it is actually a valid introspective technique to imagine acting by coinflip or die roll and then see if you’re hoping for a particular result, which rhymes with “if you can predictably update in direction X, then you should already be there.”
The SSC post that motivated finally finishing this up was Book Review: The Secret Of Our Success, which discusses the game-theoretic validity of randomization in competitive endeavors (like hunters vs. prey, or generals vs. generals). It seemed important to also bring up the other sorts of validity: randomness as debiasing or de-confounding (like why randomized controlled trials are good), or randomness as a mechanism to make pre-existing principles salient. I’m reminded of some online advice-purveyor who would often get emails from people asking if their generic advice applied to their specific situation; almost always, the answer was ‘yes,’ and there was something about the personal attention that was relevant; having it be the case that this particular bit of advice was selected for the situation you’re in makes it feel worth considering in a way that “yeah, I guess I could throw this whole book of advice at my problem” doesn’t.
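As an aside, the de-confounding point is easy to see in a toy simulation; all the numbers below are made up, and the `outcome` function is a hypothetical data-generating process, not real data:

```python
import random

random.seed(0)

def outcome(treated: bool, health: float) -> float:
    # Ground truth for this toy world: treatment adds exactly 1.0, and
    # baseline health adds 2*health, plus noise. All numbers are made up.
    return (1.0 if treated else 0.0) + 2.0 * health + random.gauss(0, 0.5)

population = [random.random() for _ in range(100_000)]  # baseline health in [0, 1)

# Observational data: healthier people seek out treatment (confounding).
obs = [(h > 0.5 or random.random() < 0.2, h) for h in population]

# RCT data: a coin flip assigns treatment, severing the health -> treatment link.
rct = [(random.random() < 0.5, h) for h in population]

def estimate(data):
    """Naive difference in mean outcomes between treated and control."""
    treated = [outcome(t, h) for t, h in data if t]
    control = [outcome(t, h) for t, h in data if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)

print(f"observational estimate: {estimate(obs):.2f}")  # inflated, roughly 1.8
print(f"randomized estimate:    {estimate(rct):.2f}")  # close to the true 1.0
```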
This was probably an accurate depiction of American corporate management when it was written, in the 80s. Since then, things have changed somewhat (in part by tech becoming a larger fraction of the economy, and by increasing meritocracy through increased competitiveness), but I think it’s still present in a major way.
It seems like most of these quotes are directly at odds with seeking profit (either long- or short-term), and it would be enlightening to hear why there aren’t a bunch of more efficient organizations taking over.
I think this is happening, but it’s slow. Koch Industries claims that a major piece of social tech they use is compensating managers based on the net present value of the thing they’re managing, rather than whether they’re hitting key targets, and they’re growing at something like 10% faster than the rest of the economy, but that still means a very long time until they’ve taken over (and the larger they get, the harder it is to maintain that relative rate).
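To put rough numbers on “a very long time”: here’s a toy compounding calculation. The starting share and the takeover threshold are assumptions I picked for illustration, not figures about Koch specifically:

```python
# A made-up illustration of why 10% faster relative growth still means a
# long takeover. The 0.1% starting share is assumed, not Koch's actual share.
share = 0.001            # assume 0.1% of the economy today
relative_growth = 1.10   # grows 10% faster than the rest, each year

years = 0
while share < 0.5:       # 'taken over' = half the economy
    share *= relative_growth
    # Note: as the firm gets bigger, it *is* more of 'the rest of the
    # economy,' so a constant relative rate is an optimistic simplification.
    years += 1

print(years)  # ~66 years
```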
If we are going to have the exception to the norm at all, then there has to be a pretty high standard of evidence to prove that adding ‘Y’ to the discourse, in fact, has bad consequences.
I want to note that LW definitely has exceptions to this norm, if only because of the boring, normal exceptions. (If we would get in trouble with law enforcement for hosting something you might put on LW, don’t put it on LW.) We’ve had in the works (for quite some time) a post explaining our position on less boring cases more clearly, but it runs into difficulty with the sort of issues that you discuss here; generally these questions are answered in private in a way that connects to the judgment calls being made and the particulars of the case, as opposed to through transparent principles that can be clearly understood and predicted in advance (in part because, to extend the analogy, this empowers the werewolves as well).
[Written as an admin]
First and foremost, LW is a space for intellectual progress about rationality and related topics. Currently, we don’t ban people for being fixated on a topic, or ‘darkly hinting,’ or posts they make off-site, and I don’t think we should. We do keep a careful eye on such people, and interpret behavior in ‘grey areas’ accordingly, in a way that I think reflects both good Bayesianism and good moderation practice.
In my favorite world, people who disagree on object-level questions (both political and non-political) can nevertheless civilly discuss abstract issues. This favors asymmetric weapons and is a core component of truth-seeking. So, while hurt feelings and finding things unpleasant are legitimate and it’s worth spending effort optimizing to prevent them, we can’t give them that much weight unless they differentiate the true and the untrue.
That said, there are ways to bring up true things that as a whole move people away from the truth, and you might be worried about agreements on abstractions being twisted to force agreement on object-level issues. These are hard to fight, and frustrating if you see them and others don’t. The best response I know is to catalog the local truths and lay out how they add up to a lie, or establish the case that agreement on those abstractions doesn’t force agreement on the object-level issues, and bring up the catalog every time the local truth advances a global lie. This is a lot more work than flyswatting, but has a much stronger bent towards truth. If you believe this is what Zack is doing, I encourage you to write a compilation post and point people to it as needed; due to the nature of that post, and where it falls on the spectrum from naming abstract dynamics to call-out post, we might leave it on your personal blog or ask that you publish it outside of LW (and link to it as necessary).