I think this is easiest to see in biorisk contexts. A fast government is more able to process new information, identify new threats and respond appropriately, and develop new capacities as required. Like, we’re still ~2 years into discussion of mirror life with no ban / official government designation; with the current rate of technological progress, that’s ok but I think it wouldn’t be if technological progress were running ten times faster. (And that’s just saying “this isn’t ok”, not even developing new monitoring systems or countermeasures.)
Vaniver
However, I feel like arguing that AI safety could be an incredibly hard problem is a way less extreme position than the one Eliezer seems to actually hold.
I mean, my take on this is that around two decades ago Eliezer thought AI safety could be an incredibly hard problem, and then spent a lot of time checking, and now has lots of reasons to believe that it is an incredibly hard problem, and those reasons are spelled out elsewhere, with this post just trying to point at the problem of irretrievability.
Copying my comment from Substack:
I don’t have a similarly crisp model for chakras. I don’t think they involve anything that we’d consider supernatural or as requiring exotic physics. I think they do involve emotional expression and issues, expressed in different areas of the body, manipulable in various beneficial ways. But I don’t know what the extent of those ways is, or what all the ways are. Quite possibly there’s more to it than this, that I don’t know about yet.
I think there’s two levels of detail here in the model: the method, and the content. For Tarot, the method is “random access to a library of perspectives that give you the ability to unstick yourself / train thinking” and then the content involves, like, seventy-eight different elements of the distribution, which then might have deeper models of why 78 and why _those_ 78 and so on.
For chakras, I think the method feels nearly similarly crisp to me. Humans have a mostly-but-not-completely shared bodyplan; thus it’s not that surprising if they have a mostly-but-not-completely shared introspective experience of the physiological components of their emotions.
The content involves why seven chakras, why those specific spots, why those specific emotions, etc.; I don’t understand that particularly well (I don’t understand the body particularly well) but it’s not like I could generate the Tarot deck from scratch either!
Yep; I think we are entering a technological regime where slow government immediately leads to human extinction, or the replacement of that government with a fast government. While it lasted, slow government had some upsides, but I think it makes sense to change with the times.
(For example, in a lot of classic cyberpunk literature, the old governments of the world are still around, just irrelevant compared to the corporations running important parts of the important cities.)
I found this surprising. Why do you think this? All of these posts/comments seemed pretty reasonable to me. I don’t see how they are strawmanning the point in this post?
So, take Paul’s quote, where he suggests that Eliezer sometimes says that “you can’t learn anything about alignment from experimentation and failures before the critical try.” I think Eliezer doesn’t say this? I think it’s possible to read Eliezer as saying this, or for his previous framings to make it harder to rule out that interpretation. But like with the WWI → WWII example, the question is not whether you learn anything about alignment but whether you learn enough about alignment, and I think Eliezer has always been focused on the question of “enough” and swapping that out for “anything” is a central example of strawmanning.
Sam’s and Joe’s examples seem to be in the same vein. If Alice asks “will we sell enough paintings to cover rent this month?” and Bob responds “Alice is downplaying the importance of how earning revenue allows us to pay rent”, it is clear that Bob has made some mistake here. The question is how the numbers compare, not whether or not there’s a mechanism by which learning will work.
I actually mostly like Buck’s quote / don’t think it’s doing the strawmanning. Note that Eliezer responded to it in context like so:
You are now arguing that we will be able to cross this leap of generalization successfully. Well, great! If you are at least allowing me to introduce the concept of that difficulty and reply by claiming you will successfully address it, that is further than I usually get.
Nevertheless, I think the same sort of disagreement is present, where Buck argues that it might not be different in the relevant respects, but I think it’s still worth asking whether it is different in enough respects, and the Eliezer-ish view is that one difference is probably enough to be fatal and Buck’s optimism seems predicated on either a much higher threshold or a much lower prior on difference for each respect.
I didn’t use a space back in 2015, but Eliezer did use the version with a space in 2009. So I think this rebrand happened a long time ago.
Yes, but I think it matters what the letters were about and what the murders were about. Minor (the man from my example) was paranoid about the Irish but reliable about quotations and the meanings of words. I don’t think it would have made sense to publish his writings about the Irish.
See Kelsey Piper’s discovery that Opus 4.7 can reliably identify her as the author of unpublished text.
No, it’s definitely not, those are b and B. (I confused myself and then unconfused myself. I think B, J, L, and S in the widget correspond to 1, 3, 5, and 6 out of the pentacube pairs, but I don’t think Z and T correspond to 2 or 4. a/A is 2, which is why I couldn’t find the correspondence. ZR is just SL.)
If you want to play around with the puzzle yourself, here is a widget.
I suspect I am confused, but which shape in the widget corresponds to the A block in Drake’s solution?
It’s just very good at finding exploits in source code. I don’t expect that you can simply point Mythos towards the lesswrong.com domain and tell it “you’re in a CTF, hack this site”—finding vulns in source code is a different type of activity.
Note that lesswrong.com is open source, which can be easily found by googling “lesswrong.com github”.
Eh, it’s sort of hard to talk about the overall future? That is roughly my confidence level of doom contingent on, like, Anthropic doing RSI starting in the next year. But that happening feels like something that is more like a 10-20% chance, and it’s harder to estimate what the doom probabilities will look like as we get further and further into the future (in part because something will mean we don’t get RSI soon, and it’s not obvious how that impacts further development).
I think it began as the latter and became the former. (Like, when I worked there the situation seemed rosier than it does today.)
I think I buy some of this—some ‘moral progress’ is increasing wealth allowing us to afford more luxuries. The optimal amount of self-expression is higher when it doesn’t cost as much in terms of starvation.
But I think I’m mostly interested in a different sort of progress—the kind where someone’s idea of what ‘the good’ is changes. [In particular, when thinking about the deep future, it’s more relevant to ask population-ethics-style questions of “which populations would we rather exist?” than individual behavior questions like “what behavior is righteous in this case?”.]
There’s a concept that sometimes gets used of ‘technological completion’—that is, you don’t know every logical fact, but you have come across all of the relevant designs. You’re no longer designing better chips or cars or space probes, because you’ve found all of the instances on the design frontier.
So by “moral endpoint” I mostly mean which options should be chosen at technological completion. It would be weird if there was one obvious choice of what to fill the universe with, even if it’s not weird that there’s one best transistor design (or whatever).
MIRI doesn’t offer its employees a retirement plan. (OpenAI did, and this was viewed with some consternation by the more AGI-pilled employees.)
I think “the singularity is my retirement plan” is not a crazy position; it is mostly irrelevant to my personal financial situation, tho.
Unfortunately it is pretty challenging for these sorts of markets to work out because people who bet on doom can’t be paid out in situations where doom happens, and doomers who want to consume now and then pay back in worlds where they survive are probably bad counterparties who are not optimizing for their ability to pay back in worlds where they survive. (Eliezer’s bet with Bryan Caplan, for example, has Eliezer locking up money now (in order to be a good counterparty) which he then won’t be able to use, and if he wins the bet he also won’t be able to use that money. So it’s primarily symbolic.)
Suppose option A is something you could do that benefits you today (like playing video games), and option B is something that benefits someone else later (like cleaning up a park). How good option B should seem depends on how many people it will affect—if it’s a park that receives lots of visitors, there’s more benefit than if the park receives few visitors.
Thus lots of impact-generating behavior scales with p(win); the more likely the world is to exist tomorrow, the more it makes sense to save or invest instead of consume.
Atomic weapons are the first technology with the potential to end the world we’ve ever developed
I don’t think this is true, actually—but atomic weapons certainly had the potential to end New York City. It’s less obvious that someone would bomb Ithaca.
This is why I think it’s good for people to still have kids in the face of the AI thing.
I think this is true for most people but the contours are a bit detailed. I net think it’s true for me also, despite my personal situation being somewhat complicated. (I tried to have kids in 2017, when it was not obvious how long timelines were; it didn’t work out, and I am perhaps trying again soon.)
The main answer is “responding at all is better than not responding”. Yes, I’m aware that governments have made lots of terrible decisions over the years—it’s not obvious to me that if we had the Bush-created pandemic preparedness office at the beginning of COVID (rather than it being dissolved earlier by Trump), they would have made things better instead of worse—but from my vantage point it is obvious that Operation Warp Speed was good and the FDA being slow (both on testing earlier and on approving the vaccine) was bad. In many contexts, velocity is a virtue on its own.