The concept you call “intent to save the world” here might be more accurately described as security mindset. Harry in this story doesn’t think about all the good, neutral, or mildly bad things an AI could do with its understanding of human psychology; he thinks specifically about the really bad things it could do. That’s security mindset.
I furthermore disagree somewhat with your overall point. I think most of the really bad things an AI could do would not constitute existential catastrophe, so if we really are focused on saving the world, we need to train ourselves to focus on the things an AI might do that would cause existential catastrophe. That would NOT look like hiding bombs in plushies; it wouldn’t look like killing people at all. Instead it would look like accumulating power, status/prestige/respect/followers/allies, money, knowledge, etc. until it has a decisive strategic advantage (DSA), and prior to acquiring a DSA it would probably do things that look nice and benevolent to most people, or at least to most of the people who matter, since it wouldn’t want to risk an open conflict with such people before it is ready.
[TL;DR: I really like your story, and I’m impressed by how well it works while sticking to the format. I just think “intent to save the world” is a misnomer; “security mindset” would be more accurate. Actual intent to save the world would lead to a different set of answers than the ones Harry gave.]
That would NOT look like hiding bombs in plushies; it wouldn’t look like killing people at all. Instead it would look like accumulating power, status/prestige/respect/followers/allies, money, knowledge, etc. until it has a decisive strategic advantage (DSA), and prior to acquiring a DSA it would probably do things that look nice and benevolent to most people, or at least to most of the people who matter, since it wouldn’t want to risk an open conflict with such people before it is ready.
I agree! Harry isn’t supposed to be correct here. He isn’t good at saving the world, he is just trying (badly).
In the original story, Quirrell talks about “intent to kill”, and Harry gives a lot of terrible ideas, rather than just knocking his enemy out with brute force and then killing them at his leisure.
In fact, the bad ideas are why the story works so well. Good ideas look like something you might simply have learned from someone else; it is the truly novel and terrible ideas that show Harry is actually trying.
The concept you call “intent to save the world” here might be more accurately described as security mindset. Harry in this story doesn’t think about all the good, neutral, or mildly bad things an AI could do with its understanding of human psychology; he thinks specifically about the really bad things it could do. That’s security mindset.
But the reason he is using security mindset is that it is what saving the world requires. That’s the driving force behind his thinking. If he found out that security mindset wasn’t useful for saving the world, he would stop using it.