Yeah I had a convo about Kelly a few years ago where (I may be misremembering who said what) I was like “But my utility is basically linear in money, or in some sense it’s even significantly superlinear (because it would be easier to get a bunch of information and connections and good investments in a snowbally way), so I obviously should just bet as much as I can if I really think I have an edge.” I think this is basically right, but not that relevant in practice, just because there aren’t really opportunities like that unless you really go looking for them, and it’s much better to have some money even if you do have a safety net.
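(Toy sketch, my own and not from that conversation, just to make the Kelly-vs-linear-utility trade-off concrete: the Kelly fraction maximizes long-run log-wealth growth, while “bet as much as I can” is what linear utility recommends but almost surely ruins you in repeated play. All parameters below are made up.)

```python
import random

# Repeated even-odds bets with a 60% edge: Kelly fraction (optimal for log utility)
# vs. all-in (what linear utility recommends). Illustrative numbers only.

def simulate(bet_fraction, p_win=0.6, rounds=200, trials=2000):
    """Return (mean final wealth, fraction of trials ending essentially broke)."""
    finals = []
    for _ in range(trials):
        wealth = 1.0
        for _ in range(rounds):
            stake = wealth * bet_fraction
            if random.random() < p_win:
                wealth += stake
            else:
                wealth -= stake
        finals.append(wealth)
    mean = sum(finals) / trials
    broke = sum(1 for w in finals if w < 1e-6) / trials
    return mean, broke

p = 0.6
kelly = 2 * p - 1  # Kelly fraction for an even-odds bet: f* = p - (1 - p)

for label, f in [("Kelly", kelly), ("all-in", 1.0)]:
    mean, broke = simulate(f)
    print(f"{label:6s} f={f:.2f}  mean final wealth={mean:10.2f}  P(broke)={broke:.2f}")
```

(On paper the all-in strategy has the highest expected wealth, carried entirely by the vanishingly unlikely streak of 200 straight wins, which is roughly the “linear utility says bet everything” intuition; in any realistic number of trials you just see ruin instead.)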
I do generally find myself more concerned about genetic engineering for extreme intelligence than other people in this cluster seem to be.
Tangent of course, but happy to discuss, whether in private or on a podcast.
I suggest that another underlying source of disagreement could be about the general factor of: what is this function, approximately:
[the similarity of all previous tries put together, to try X]
--> [how much should we expect to fail on try X on the grounds that it’s first-try-ish, i.e. how much is it Murphy’s Cursed]
If you think that even a fair amount of similarity still doesn’t get you to success on one-shot problems, then you’d talk about one-shotness as being a strong argument against AGI alignment attempts working out well. This kinda sounds like what Yudkowsky is arguing in this post. Someone else could disagree with that.
They may think it’s critical but not very firsty, i.e. sufficiently similar to / comparable to / generalizable from previous tries.
Another underlying disagreement could be about the general factor of: what is this function, approximately:
[the similarity of all previous tries put together, to try X]
--> [how much should we expect to fail on try X on the grounds that it’s first-try-ish, i.e. how much is it Murphy’s Cursed]
I would imagine that this isn’t the main source of disagreement, but I do find it hard to see how [create an alien mind that’s smarter than humanity] ends up less firsty than [make a Mars rover] etc., so I’m wondering if that’s not the claim.
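(To make the “what is this function” question concrete, here’s a toy illustration of my own, not anything from the thread: two made-up candidate shapes for the mapping from similarity-of-previous-tries to expected failure-because-first-try-ish. Only the shapes matter, not the numbers.)

```python
# Two made-up candidate shapes for [similarity of all previous tries to try X]
# --> [probability of failing try X because it's first-try-ish].
# similarity 0 = nothing previous resembles try X; 1 = some previous try was
# effectively the same as try X. Numbers are purely illustrative.

def pessimistic(similarity):
    # Failure stays high unless previous tries were almost exactly like try X.
    return 0.95 * (1 - similarity ** 8)

def optimistic(similarity):
    # Even moderate similarity buys a lot; failure probability drops off quickly.
    return 0.95 * (1 - similarity) ** 3

for s in [0.0, 0.25, 0.5, 0.75, 0.9, 1.0]:
    print(f"similarity={s:.2f}  pessimistic={pessimistic(s):.2f}  optimistic={optimistic(s):.2f}")
```

(On the pessimistic shape, partial similarity of the [make a Mars rover] sort barely helps with [create an alien mind that’s smarter than humanity]; on the optimistic shape it helps a lot, which seems to be where the disagreement lives.)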
I very clearly said that in my comment… Anyway, I guess there’s nothing to discuss here, I’m just saying that abortability is a relevant dimension to these scenarios. It’s something that’s brought up often, and also it bears on first-try-ness. If there is a situation that is akin to the eventual critical first try, but is abortable, then that would imply that when you get the eventual critical try, it doesn’t have to be your first try. There’s a nontrivial argument to make about “when it’s abortable, it’s not akin enough to the eventual critical try”.
I’m not actually sure exactly what “critical” means here. I’m taking it to just mean “you absolutely must get this try right”. That’s closely connected to abortability, in that if you can abort, it’s not fully lethal / critical yet. I don’t think it’s really the same thing, e.g. you could imagine an LLM-based bacterial package (a more complex “computer virus”) that permanently lives on many computer systems and is basically impossible to abort (short of scouring the planet of all computers with more than 16 GB of memory or whatever).
There’s whether or not you get to try again after your first try, and there’s how late in the game you can decide to not fully do the try at all. There are at least 3 kinds of outcomes:
You abort (don’t fully do the try).
You do the try and succeed.
You do the try and fail (and can’t try again).
Because unaligned AGI is lethal, you don’t get to try again.
You kids need to turn off the car.
Right, another dimension to these scenarios is abortability. At some point, we cross out of technically feasible abortability—we (humans) wouldn’t be able to abort the AI’s growth even if we tried. Whether things are abortable before then depends on how humans react over time / new information (e.g. heeding arguments, heeding warning shots, being credulous about apparent alignment, etc.).
Well, one can point concretely at the possibility, exemplified but not yet taken to its limit, of being better at putting ideas into words: https://www.lesswrong.com/posts/7RFC74otGcZifXpec/the-possible-shared-craft-of-deliberate-lexicogenesis
(Speaking for myself of course)
But what counts as a first try?
A given try is more firsty if it’s less like all previous tries put together. A one-shot problem is one where you try a pretty firsty try, and that try is likely to kill humanity.
How many previous tries are in the same class (e.g. small asteroid / big asteroid, or GPT2 / GPTN) is relevant in that a priori more such tries might suggest that future tries are less firsty. But it’s also perfectly plausible a priori to have lots of tries that you survive, and then a one-shot problem (lethal and very firsty).
You could even have a series of one-shot problems. Imagine for example that you have a lethal asteroid—but you saw it 10 years in advance, and it’s small enough that you can stop it with nukes. It’s one-shot (lethal and firsty), but maybe you survive. Then you have another similar asteroid, but you only saw it 6 months in advance—that might be another one-shot problem (do it all again but way faster). Then you have an asteroid that’s so big, all the nukes in the world wouldn’t stop it. One-shot again; there’s totally new, crucial challenges to solve.
I think that with AI, you very likely get a one-shot problem in the ballpark of superhuman AGI. It’s lethal, in that it would by default go on to drive humanity extinct, and very firsty, in that many core alignment difficulties first show up there.
I guess one thing I’m saying is that one can hold any given beliefs without taking the Slighted affiliations/attitudes too far. It’s not a mere choice; it’s a skill, potentially a big one (with lots of subskills, the way “programming” is a skill), that one could learn over time.
(Sorry, your post seems pretty interesting but I don’t have enough spare bandwidth; I’ll just note that the things you list here do of course sound related to Slight, but being Slighted is always a kind of choice you’re making, similar to being Consensus, and neither is that great: they both have advantages and disadvantages. They’re social and attitudinal attractors to some extent, but far from absolute.)
Yeah that’s fair.
My wild guess would be that there is a significant local persuasion overhang, in the sense that if someone set up an RL context (like, any scaled-up persuasive context, e.g. bots on various social platforms with clear feedback), there would be fast gains, maybe to the point of being really disruptive in certain classes of contexts. (There is another theory which states that this has already happened.) I think you’d then hit an asymptote below the level of being relevant to most important contexts.
(Because today’s systems would not be able to follow along with how humans change their stances in response to things like this. For example, image generation can easily fool people, but for most people there would only be a brief window during which they’d send money mainly on the strength of a realistic image that, if real, would make them want to send money. They’d just learn not to do that.)
A Slighted might view a Consensus as sanctimonious, cowardly, conformist, overconfident, boring, virtue signaling, envious, self-deceiving, power-thirsty (in the sense of wanting to be in control of arbitrary social consensus), wanting to cut down tall poppies, arbitrary / not truth seeking, getting sucked into a locked-in equilibrium, tribalistic.
A Consensus might view a Slighted as delusional, aggressive, contrarian, overconfident, wanting to feel special, power-thirsty (in the sense of wanting to avoid social or legal accountability), needing to be put in their place / needing to be socialized / needing moral correction / needing to have values transmitted to them, overly focused on the perceived Slight, having a big ego about their contrarian positions and not updating, getting sucked into spirals of networks of misinformation, reckless, uncaring, tribalistic, paranoid/conspiratorial.
Although there are a few people I’ve talked to who don’t feel Slighted but are maybe oddly ambivalent about AGI/ASI creation due to “climate change, constant new wars, loneliness and addictions, WW3 indicator” and not feeling like it would lead to doom.
Interesting. I guess feeling disenfranchised often leads one to being Slighted, but doesn’t necessarily, since one could instead just do some kind of giving up. (One could also do other things, such as maybe finding other hopeworthy long-term shared intentions with other people to invest in.)
Example: Joe Rogan is Slighted
Example: Graham Hancock (and generally “conspiracy” theorists in the sense of people who have a hard to change belief in a big hidden truth that is just barely hinted at) is Slighted, Flint Dibble is Consensus https://www.youtube.com/watch?v=-DL1_EMIw6w
Example: LessWrong tends Slighted I think
(I think the ontology here is probably complicated because “Slighted” is a predicate on (person, group of people they view as their “containing context” such as a country or religion or ideology or similar), so it can apply in crisscrossing ways. Slighted can even form large communities, which seems paradoxical, but there you go.)
IDK. Something like, Consensus is the default agreed-upon stance / the people who take that stance; Slighted is the people who don’t take that stance. (To be clear, I think I’m mostly saying something fairly obvious / not novel.)
For various reasons, the Slighted often end up viewing themselves as having been slighted by the Consensus. (E.g. they were actually slighted by the Consensus; or they were slighted by someone, and misattributed it to the Consensus; or they weren’t really slighted, but view themselves that way anyway.)
As an example, sometimes tech people (Slighted) are super dismissive of academia (Consensus) in general, describing them as cowards, liars, etc.
The ends of political horseshoes are very often Slighted. (Though not necessarily. Nick Fuentes is extremely Slighted; a far-right conservative guy who views himself as being old establishment may not be especially Slighted by default, though might later become so. Likewise far left.) Slightedness attracts and creates Slightedness. E.g. because you might actually be slighted some as punishment for association with other Slighted; and because you hang out with Slighted; etc. That’s also true for Consensus, though with different flavors.
There’s a massive divide between the Consensus and the Slighted that runs through many domains. I think healing this would in particular help decrease AGI X-risk by decreasing the motivation to make AGI and decreasing the motivated reasoning around X-risks. A conjecture is that both poles are attracting states in social space (let’s say the vector space of people represented as a vector v where v_i is how much you talk with / listen to person number i).
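(A minimal sketch of what “both poles are attracting states” could mean, under an assumed, made-up self-reinforcing dynamic; this is just my illustration of the attractor idea, not a model of the actual social dynamics. Project the vector v down to one coordinate x in [0, 1]: the fraction of someone’s talking/listening that goes to Slighted-leaning voices.)

```python
# Assumed dynamic (illustrative only): attention is self-reinforcing, so x drifts
# toward whichever pole (Consensus at 0, Slighted at 1) it is already closer to.
# x = 0 and x = 1 are stable fixed points; x = 0.5 is an unstable tipping point.

def step(x, rate=0.2):
    return x + rate * x * (1 - x) * (x - 0.5)

def settle(x, steps=500):
    for _ in range(steps):
        x = step(x)
    return x

# People starting even slightly on one side of the tipping point get pulled to that pole.
for start in [0.05, 0.45, 0.49, 0.51, 0.55, 0.95]:
    print(f"start={start:.2f} -> settles near {settle(start):.2f}")
```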
I think there’s a middle ground statement: