Justis has been helpful as a copy-editor for some AIsafety.info content recently. Will be using his services more.
Algon
You convinced me to pre-order it. In particular, these lines:
> It wasn’t until I read an early draft of this book a year ago that I felt like I could trace a continuous, solid line from “superintelligence grown by a blind process...” to “...develops weird internal drives we could not have anticipated”. Before, I was like, “We don’t have justifiable confidence that we can make something that reflects our values, especially over the long haul,” and now I’m like, “Oh, you can’t get there from here. Clear as day.”
Note: I only read the top-level text. Also, thanks for linking to aisafety.info!
“AI War” is a kind of illegible term, and none of the sections seem like they’d give me a definition. Also, the first couple of top-level texts don’t form a clean narrative; in particular, the jumps from box 1 to 2 and from 2 to 3 feel abrupt. From 3 to 4 onwards, they do form a clear narrative/argument.

One thing I did like is that you placed the most important info, “AI War seems unlikely to prevent AI Doom”, right at the top in big, bold letters. That does a decent job of conveying who this essay is for and what it’s about. But I feel like you could do better on that account, somehow? My gut is telling me an abstract of 2–3 sentences at the top would be good. I know that’s weird, given how compact your argument is by default.
There’s also AISafety.info, which I’m a part of. We’ve just released a new intro section, and are requesting feedback. Here’s the LW announcement post.
Community Feedback Request: AI Safety Intro for General Public
SAEs falling out of mechanistic interpretability research
That doesn’t sound promising.
Thank you for this. The analogies are quite helpful in forcing me to consider whether my argument is valid. (Admittedly, this post was written in haste and probably errs somehow. But realistically, I wouldn’t have polished this rant any further, so publishing as-is it is.) It feels like the “good/bad for alignment” and “p(doom) changed” discussions are not useful in the way that analyzing winning probabilities in a chess game is useful. I’m not sure exactly what the difference is.
Perhaps thinking through an analogy to go, with which I’ve got more experience, would help. When I play go, I rarely think about updating my “probability of victory” directly. Usually, I look at the strength of my groups, their solidity, etc., and those of my enemy’s, and, of course, at whether they move as I wish. Usually, I wish them to move in such a way that I can accomplish some tactical objective, say killing a group in the top right so I can form a solid band of territory there and make some immortal groups. When my opponent moves, I update my plans/estimates regarding my local objectives, which propagates to my “chances of victory”.

“Wait, the opponent moved there!? Crap, now my group is under threat. Are they trying to threaten me? Oh, wait, this bugger wants to surround me? I see. Can I circumvent that? Hmm… Yep, if I place this stone at C4, it will push the field of battle to the lower left, where I’m stronger and can threaten more pieces than right now, and connect to the middle left.”
In other words, most of my time is spent focused on robust bottlenecks to victory, as they mostly determine whether I win. My thoughts are not shaped like “ah, my odds of victory went down because my enemy placed a stone at H12”. The thoughts of victory come after the details. The updates to P(victory), likewise, are computed after computing P(details).
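To put that a bit more formally (just a sketch of the decomposition I have in mind, with $d$ ranging over the local details: life-and-death of groups, territory, thickness, and so on):

$$P(\text{victory}) = \sum_{d} P(\text{victory} \mid d)\, P(d)$$

so when my opponent moves, I first update the $P(d)$ terms, and the update to $P(\text{victory})$ just falls out of that.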
Don't focus on updating P(doom)
EDIT 2: Did you mean that there are advantages to having both courage and caution, so you can’t have a machine that has maximal courage and maximal caution? That’s true, but you can probably still make Pareto improvements over humans in terms of courage and caution.
Would changing “increase” to “optimize” fix your objection? Also, I don’t see how your first paragraph contradicts the first quoted sentence.

> Mathematically impossible. If X matters then so does -X, but any increase in X corresponds to a decrease in -X.
I don’t know how the second sentence leads to the first. Why should a decrease in -X lead to less success? Moreover, claims of mathematical impossibility are often overstated.
As for the paragraph after, it seems like it assumes current traits are on some sort of Pareto frontier of economic fitness (and, perhaps, an assumption of adequate equilibria). But I don’t see why that’d be true. Like, I know of people who are more diligent than me, more intelligent, have lower discount rates, etc. And they are indeed successful. EDIT: AFAICT, there’s a tonne of frictions and barriers which weaken the force of the economic argument I think you’re making here.
The road from human-level to superintelligent AI may be short
Human-level is not the limit
AI may attain human-level soon
AI is advancing fast
That said, “nice to most people but terrible to a few” is an archetype that exists.
Honestly, this is close to my default expectation. I don’t expect everyone to be terrible to a few people, but I do expect there to be some class of people I’d be nice to that they’d be pretty nasty towards.
What are Responsible Scaling Policies (RSPs)?
What are the differences between a singularity, an intelligence explosion, and a hard takeoff?
Why not Valve?
What is scaffolding?
> It’s kind of like there is this thing, ‘intelligence.’ It’s basically fungible, as it asymptotes quickly at close to human level, so it won’t be a differentiator.
I don’t think he ever suggests this. Though he does suggest we’ll be in a pretty slow takeoff world.
I keep seeing the first clause as “I don’t believe in your work”.