John_Maxwell 10 Apr 2019 1:58 UTC
6 points

Any agent that seeks X as an instrumental goal, with, say, Y as a terminal goal, can easily be outcompeted by an agent that seeks X as a terminal goal.

You offered a lot of arguments for why this is true for humans, but I’m less certain this is true for AIs.

Suppose the first AI devotes 100% of its computation to achieving X, and the second AI devotes 90% of its computation to achieving X and 10% of its computation to monitoring that achieving X is still helpful for achieving Y. All else equal, the first AI is more likely to win. But it’s not necessarily true that all else is equal. For example, if the second AI possessed 20% more computational resources than the first AI, I’d expect the second AI to win even though it only seeks X as an instrumental goal.
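To make the comparison concrete, here is a rough worked version. It assumes (an assumption for illustration, not something stated above) that progress on X scales linearly with the computation devoted to it, and writes C for the first AI's total compute:

```latex
\begin{aligned}
\text{AI}_1 &: \quad 1.00 \times C = 1.00\,C \\
\text{AI}_2 &: \quad 0.90 \times (1.20\,C) = 1.08\,C
\end{aligned}
```

Under that linear assumption, the second AI still devotes about 8% more computation to X than the first, even after spending 10% of its budget on monitoring.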
- Natália 12 Apr 2019 18:08 UTC
  1 point
  Thank you for the correction. Thinking about it, I think that is true even of humans, in a certain sense. I would guess that the ability to hold several goal-nodes in one’s mind would scale with g and/or working memory capacity. Someone who is very smart and tolerant of ambiguity would be able to aim for a very complex goal while still performing well at the mundane day-to-day tasks they need to accomplish, even when those tasks seem to bear no resemblance to the original goal.
  
  It seems to be a skill that requires “buckets” https://www.lesswrong.com/posts/EEv9JeuY5xfuDDSgF/flinching-away-from-truth-is-often-about-protecting-the
  
  So, both in humans and computers, I would guess this is an ability that requires certain cognitive or computational resources. So I maintain my original claim, provided that those resources are controlled for.
- Pattern 10 Apr 2019 4:32 UTC
  0 points
  Additionally, there may exist sets of goals that if pursued together, one is more likely to achieve all of them, than if any one (or any subset less than the whole) were pursued alone. (To put it a different way, it is possible to work on different things that give you ideas for each other, that you wouldn’t have had if you had been working on only one/a subset of them.)
Matthew Barnett 8 Apr 2019 16:50 UTC
3 points
The AI alignment problem seems to be a problem inherent to seeking out sufficiently complex goals.
I would add that the alignment problem can still occur for simple goals. In fact, I don’t think I can come up with a “goal” simple enough that I could specify it to an advanced artificial intelligence without error, even in principle. Of course, this might just be a limitation of my imagination.
The alignment problem really occurs whenever one agent can’t see all possible consequences of their actions. Given our extremely limited minds in this universe, the problem ends up popping up everywhere.
- Natália 12 Apr 2019 18:23 UTC
  2 points
  I agree. I used the modifier “sufficiently” in order to avoid making claims about where a hard line between complex goals and simple goals would lie. Should have made that clearer.
Slider 8 Apr 2019 16:02 UTC
2 points
The line between executable actions and plans is presented as quite clear-cut, and the statement that “nobody *does* getting rich” is presented as a claim of fact. Similar logic could be used to argue that “nobody ever grabs a glass; they only apply pressure with their fingers.” Or, in reverse: “there is nobody on the planet whose mind perceives ‘getting rich’ as a single executable action, nor could there (reasonably) ever be.” I could imagine that there are people for whom “hedge funding” is a basic action that doesn’t require esoteric aiming, even if that isn’t typical. And “getting rich” is not that far from that.
Just as magic, understood as “a mechanism you know how to use but don’t know how it works” (i.e. “sufficiently unexplained technology”), is a concept in the eye of the beholder, so too is the division between plans and actions. That is, “sufficiently non-practical actions” are plans. Was this part of what Yoda was getting at with “don’t try, do”?
I think sex is still a functional part of human thriving. The fact that it doesn’t result in insemination doesn’t stop it from doing all of its social and community bonding. If you use a hammer as a stepping stone to reach high places, you are not failing to use the hammer to drive nails; you are successfully using a hammer to build a house. “Having sex doesn’t lead to spreading genes” also seems like a claim of fact. What about keeping your marriage intact with the biological mother of your test-tube children? I could see how celibacy could throw a serious wrench into that. And if we also keep in mind that worker ants further their genes’ agenda without having direct offspring, can we truly rule out similar effects for, say, homosexual couples?
In a way, evolution only cares about the goal of “thrive”, and pushing for it really can’t go wrong. But in pushing for it, it is often important to be extremely flexible, possibly to the point of anti-alignment with the sub-goals. Repurpose limbs? Repurpose molecules? Alignment would be dangerous. Also, in the conflict between asexual and sexual reproduction, strong alignment is the *downside* of asexual reproduction.
I also read this as a Magic color analogy that argues for white over at least black and green: “in order to get anywhere, we need to erect a system to get there.” Green can answer: “There is power in diversity; having all your eggs in one basket is a recipe for stagnation.” Red answers: “Only by moving your brush against the canvas of life will you ever see a picture, and even if you could see it, why would you then bother painting it?” Black would complain about unnecessary commitments and “leaving money on the table.” Blue can answer: “If you commit now to optimising candy, you will never come to appreciate the possibility of lobster dinners.”
Ruby 10 Apr 2019 20:13 UTC
1 point
Glad to learn my post was helpful! I don’t have time to engage more at the moment, but this post seems relevant to the topic: Dark Arts of Rationality.
Consider, for example, a young woman who wants to be a rockstar. She wants the fame, the money, and the lifestyle: these are her “terminal goals”. She lives in some strange world where rockstardom is wholly dependent upon merit (rather than social luck and network effects), and decides that in order to become a rockstar she has to produce really good music.
But here’s the problem: She’s a human. Her conscious decisions don’t directly affect her motivation.
In her case, it turns out that she can make better music when “Make Good Music” is a terminal goal as opposed to an instrumental goal.
When “Make Good Music” is an instrumental goal, she schedules practice time on a sitar and grinds out the hours. But she doesn’t really like it, so she cuts corners whenever akrasia comes knocking. She lacks inspiration and spends her spare hours dreaming of stardom. Her songs are shallow and trite.
When “Make Good Music” is a terminal goal, music pours forth, and she spends every spare hour playing her sitar: not because she knows that she “should” practice, but because you couldn’t pry her sitar from her cold dead fingers. She’s not “practicing”, she’s pouring out her soul, and no power in the ’verse can stop her. Her songs are emotional, deep, and moving.
It’s obvious that she should adopt a new terminal goal.