You convinced me to pre-order it. In particular, these lines:
> It wasn’t until I read an early draft of this book a year ago that I felt like I could trace a continuous, solid line from “superintelligence grown by a blind process...” to “...develops weird internal drives we could not have anticipated”. Before, I was like, “We don’t have justifiable confidence that we can make something that reflects our values, especially over the long haul,” and now I’m like, “Oh, you can’t get there from here. Clear as day.”
Note: I only read the top-level text. Also, thanks for linking to aisafety.info!
“AI War” is a kind of illegible term, and none of the sections seem like they’d give me a definition. Also, the first couple of top-level texts don’t form a clean narrative. In particular, the jump from boxes 1->2 and 2->3. From 3->4 onwards, they do form a narrative/clear argument.

One thing I did like is that you placed the most important info, “AI War seems unlikely to prevent AI Doom”, right at the top in big, bold letters. That does a decent job of conveying who this essay is for/what it’s about. But I feel like you could do better on that account, somehow? My gut is telling me an abstract/2-3 sentences at the top would be good. I know that’s weird, given how compact your argument is by default.
There’s also AISafety.info, which I’m a part of. We’ve just released a new intro section, and are requesting feedback. Here’s the LW announcement post.
> SAEs falling out of mechanistic interpretability research
That doesn’t sound promising.
Thank you for this. The analogies are quite helpful in forcing me to consider whether my argument is valid. (Admittedly, this post was written in haste, and probably errs somehow. But realistically, I wouldn’t have polished this rant any further. So publishing as is, it is.) It feels like the “good/bad for alignment”, “P(doom) changed” discussions are not useful in the way that analyzing winning probabilities in a chess game is useful. I’m not sure what the difference is, exactly.
Perhaps thinking through an analogy to go, with which I’ve got more experience, would help. When I play go, I rarely think about updating my “probability of victory” directly. Usually, I look at the strength of my groups, their solidity, etc., and those of my enemy. And, of course, whether they move as I wish. Usually, I wish them to move in such a way that I can accomplish some tactical objective, say killing a group in the top right so I can form a solid band of territory there and make some immortal groups. When my opponent moves, I update my plans/estimates regarding my local objectives, which propagates to my “chances of victory”.

“Wait, the opponent moved there!? Crap, now my group is under threat. Are they trying to threaten me? Oh, wait, this bugger wants to surround me? I see. Can I circumvent that? Hmm… Yep, if I place this stone at C4, it will push the field of battle to the lower left, where I’m stronger and can threaten more pieces than right now, and connect to the middle left.”
In other words, most of my time is spent focused on robust bottlenecks to victory, as they mostly determine my victory. My thoughts are not shaped like “ah, my odds of victory went down because my enemy placed a stone at H12”. The thoughts of victory come after the details. The updates to P(victory), likewise, are computed after computing P(details).
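In probability terms (my gloss of the above, not a claim about how anyone literally computes this): the victory estimate is downstream of the detail-level estimates,

$$P(\text{victory}) = \sum_i P(\text{victory} \mid d_i)\,P(d_i),$$

where the $d_i$ are the local details (is this group alive, can that cut be defended, and so on). A move changes the $P(d_i)$ first, and only through them does $P(\text{victory})$ move.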
EDIT 2: Did you mean that there are advantages to having both courage and caution, so you can’t have a machine that has maximal courage and maximal caution? That’s true, but you can probably still make Pareto improvements over humans in terms of courage and caution.
Would changing “increase” to “optimize” fix your objection? Also, I don’t see how your first paragraph contradicts the first quoted sentence.

> Mathematically impossible. If X matters then so does -X, but any increase in X corresponds to a decrease in -X.
I don’t know how the second sentence leads to the first. Why should a decrease in -X lead to less success? Moreover, claims of mathematical impossibility are often over-stated.
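To spell out where I get stuck (my notation, not yours): if success $S$ is an increasing function of a single trait $X$, then a decrease in $-X$ is definitionally the same event as an increase in $X$,

$$\Delta(-X) = -\Delta X \quad\Longrightarrow\quad \frac{\partial S}{\partial (-X)} = -\frac{\partial S}{\partial X},$$

so this reads to me as a tautology about one scalar, not an argument that two distinct traits (e.g. courage and caution) must trade off against each other.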
As for the paragraph after, it seems to assume that current traits are on some sort of Pareto frontier of economic fitness. (And, perhaps, an assumption of adequate equilibria.) But I don’t see why that’d be true. Like, I know of people who are more diligent than me, more intelligent, have lower discount rates, etc. And they are indeed successful. EDIT: AFAICT, there’s a tonne of frictions and barriers which weaken the force of the economic argument I think you’re making here.
That said, “nice to most people but terrible to a few” is an archetype that exists.
Honestly, this is close to my default expectation. I don’t expect everyone to be terrible to a few people, but I do expect there to be some class of people I’d be nice to that they’d be pretty nasty towards.
Why not Valve?
> It’s kind of like there is this thing, ‘intelligence.’ It’s basically fungible, as it asymptotes quickly at close to human level, so it won’t be a differentiator.
I don’t think he ever suggests this. Though he does suggest we’ll be in a pretty slow takeoff world.
Consistently give terrible strategic takes, so people learn not to defer to you.
Yeah! It’s much more in-depth than our article. We were thinking we should rewrite ours to give the quick rundown of EY’s and then link to it.
: ) You probably meant to direct your thanks to the authors, like @JanB.
A lot of the ideas you mention here remind me of stuff I’ve learnt from the blog commoncog, albeit in a business expertise context. I think you’d enjoy reading it, which is why I mentioned it.
Presumably, you have this self-image for a reason. What load-bearing work is it doing? What are you protecting against? What forces are making this the equilibrium strategy? Once you understand that, you’ll have a better shot at changing the equilibrium to something you prefer. If you don’t know how to get answers to those questions, perhaps focus on the felt sense of being special.
Gently hold a stance of curiosity as to why you believe these things; give your subconscious room and it will float up answers by itself. Do this for perhaps a minute or so. It can feel like nothing is coming for a while, and that nothing ever will, and then all of a sudden a thought floats into view. Don’t rush to close your stance, or protest against the answers you’re getting.
Yep, that sounds sensible. I sometimes use Consumer Reports in my usual method for buying something in product class X. My usual method is:
1) Check what’s recommended on forums/subreddits that care about the quality of X.
2) Compare the rating distribution of an instance of X to other members of X.
3) Check high-quality reviews. This either requires finding someone you trust to do this, or looking at things like Consumer Reports.
Asa’s story started fairly strong, and I enjoyed the first 10 or so chapters. But as Asa was phased out of the story and it focused more on Denji, I felt it got worse. There were still a few good moments, but it kinda spoilt the rest of the story, and even Chainsaw Man, for me. Denji feels like a caricature of himself. Hm, writing this, I realize that it isn’t that I dislike most of the components of the story. It’s really just Denji.
EDIT: Anyway, thanks for prompting me to reflect on my current opinion of Asa Mitaka’s story, or CSM 2 as I think of it. I don’t think I ever intended that to wind up as my cached-opinion. So it goes.
The Asa Mitaka manga.
You can also just wear a blazer if you don’t want to go full Makima. A friend of mine did that and I liked it. So I copied it. But alas, I’ve grown bigger-boned since I stopped cycling for a while after my car accident. Soon I’ll crush my skeleton down to a reasonable size, and my blazer will fit once more.
Side note, but what do you make of Chainsaw Man 2? I’m pretty disappointed by it all round, but you notice unusual features of the world relative to me, so maybe you see something good in it that I don’t.
I think I heard of proving too much from the sequences, but honestly, I probably saw it in some philosophy book before that. It’s an old idea.
If automatic consistency checks and examples are your baseline for sanity, then you must find 99%+ of the world positively mad. I think most people have never even considered making such things automatic, like many have not considered making dimensional analysis automatic. So it goes. Which is why I recommended them.

Also, I think you can almost always be more concrete when considering examples and use more of your native architecture. Roll around on the ground to feel how an object rotates; spend hours finding just the right analogy to use as an intuition pump. For most people, the marginal returns to concrete examples are not diminishing.
“Prove it another way” is pretty expensive in my experience, sure. But maybe this is just a skill issue? IDK.
Justis has been helpful as a copy-editor for some AISafety.info content recently. Will be ~~abusing~~ using his services more.