Test
Perhaps
In terms of utility functions, the most basic is: do what you want. “Want” here refers to whatever the agent values. But for the “do what you want” utility function to be satisfied effectively, there’s a lower-level goal that matters: be able to do what you want.
For humans, that usually means getting a job, planning for retirement, buying insurance, planning for the long term, and doing things you don’t like for a future payoff. Sometimes humans even go to war in order to “be able to do what you want”, which shows how important preserving the ability to satisfy a utility function is.
For an AI that most likely has a straightforward utility function, and that has all the capabilities needed to execute it (assuming you believe that superintelligent AGI could develop nanotech, get root access to the datacenter, etc.), humans are in the way of “being able to do what you want”. Humans would probably not like an unaligned AI, and would try to shut it down, or at least try not to be killed by it. Most likely the AI’s utility function has no use for humans, so they are just resources standing in the way. The AI therefore wages all-out war on humans to maximize its expected reward, and all the humans die.
The first type of AI is a regular narrow AI, the kind we’ve been building for a while. The second type is an agentic AI, a strong AI, which we have yet to build. The problem is that AIs are trained using gradient descent, which searches the space of possible models by repeatedly nudging parameters toward whatever maximizes reward, rather than by anyone choosing a design directly. Gradient descent will produce whichever model maximizes the reward best. As a result, agentic AIs become more likely, because they are better at complex tasks. We can modify the reward scheme, but as tasks get more and more complex, agentic AIs are pretty much the way to go, so we can’t avoid building one, and we have no real way of knowing whether we’ve created one until it displays behaviour that indicates it.
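To make the “searches the space of possible models” point concrete, here is a minimal, purely illustrative sketch (the toy objective and parameters are my own invention, not anything from the post): gradient descent never enumerates designs, it just keeps nudging parameters in whichever direction scores higher, so whatever kind of policy happens to score best on the task is what the process drifts toward.

```python
import numpy as np

# Toy stand-in for "reward on a task": higher is better. The best-scoring
# parameters for this made-up objective happen to be theta = [2.0, -1.0].
TARGET = np.array([2.0, -1.0])

def reward(theta):
    return -np.sum((theta - TARGET) ** 2)

def reward_gradient(theta):
    # Analytic gradient of the toy reward above.
    return -2.0 * (theta - TARGET)

theta = np.zeros(2)   # start from an arbitrary parameter setting ("design")
learning_rate = 0.1

for step in range(200):
    # Gradient ascent on reward (equivalently, gradient descent on -reward):
    # nudge the parameters toward whatever improves the score, rather than
    # enumerating and testing every possible design.
    theta = theta + learning_rate * reward_gradient(theta)

print(reward(theta), theta)   # converges near the reward-maximizing parameters
```

The same dynamic is what the comment is gesturing at: the training process only cares about what scores well, not about whether the resulting model is agentic.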
Awesome post; it puts into words the intuitions I had for which dimensions the alignment problem lives in. You’ve basically meta-bounded the alignment problem, which is exactly what we need when dealing with problems like this.
China, overrated probably—I’m worried about signs that Chinese research is going stealth in an arms race. On the other hand, all of the samples from things like CogView2 or Pangu or Wudao have generally been underwhelming, and further, Xi seems to be doing his level best to wreck the Chinese high-tech economy and funnel research into shortsighted national-security considerations like better Uighur oppression, so even though they’ve started concealing exascale-class systems, it may not matter. This will be especially true if Xi really is insane enough to invade Taiwan.
Gwern has some insights in this post. Probably more stuff to be found on his website or twitter feed.
Well, it depends on your priors for how an AGI would act, but as I understand it, all AGIs will be power-seeking. If an AGI is power-seeking and has access to some amount of compute, then it will probably bootstrap itself to superintelligence and start optimizing for its utility function everywhere it can reach. Different utility functions produce different results, but even relatively mundane ones like “prevent another superintelligence from being created” could result in the AGI killing all humans and taking over the galaxy to make sure no other superintelligence gets made. I think it’s actually really, really hard to specify the future we actually want for an AGI, so much so that evolutionarily training an AGI in an Earth-like environment, so that it develops human-ish morals, will be necessary.
I’d say building an AGI that self-destructs would be pretty good. As long as a minimum breeding population of humans still exists, and life is not totally impossible (i.e. the AI hasn’t already disassembled the Earth or completely poisoned the water and atmosphere), humans could still survive. Making an AGI that doesn’t die would probably not be in our best interests until almost exactly the end.
Thanks for the answer! As you suspected, I don’t think wireheading is a good thing, but after reading about infinite ethics and the repugnant conclusion I’m not entirely sure there exists a stable, mathematically expressible form of ethics we could give to an AGI. It’s obviously possible if you specify exactly what you want and tell the AGI not to extrapolate. Realistically, though, I think it’s going to take our ethics to its logical end, and there is no ethical theory that really expresses how utility should be valued without running into paradoxes or problems we can’t solve. Unless we manage to build AGI using an evolutionary method that mimics human evolution, I believe any training or theory we give it would subtly fail.
[Question] What would the creation of aligned AGI look like for us?
Would the appropriate analogy to agents be that humans are a qualitatively different type of agent compared to animals and basic RL agents, and thus we should expect that there will be a fundamental discontinuity between what we have so far, and conscious agents?
You may also want to consider opportunities on the EA Volunteer Job Board. Some of them are similarly low-effort wiki building.
https://airtable.com/embed/shrQvU9DMl0GRvdIN/tbll2swvTylFIaEHP
I think in general, the most innovative candies have been the ones that break the norm. I remember a lot of buzz when some gum company made gum wrappers that you could eat with the gum (Cinnaburst?). Nowadays, though, it seems like companies don’t need to go that far for people to buy their new chocolate or candy; there are so many flavours and textures they can slap on if people get tired.
Hi, I really like this series and how it explains some of the lower-level results we can expect from high-level future scenarios. However, I’d like to know how you expect digital people to interact with an economy that has already been using powerful, high-level AI models or bureaucracies for a couple of decades or longer (approximately my timeline for mind uploading, assuming no singularity). I’ve mostly read LessWrong posts and haven’t done anything technical, but I suspect a lot of the areas in which digital people are expected to shine might already be covered by narrow-ish AI.
I think Wanda was in front of her, so she got hit, and Luna pretended to die.
Well, first of all, the most important are skills that keep you alive, like sourcing water, sourcing food, knowing which foods to eat, cooking (debatable), etc. Next are skills that let you accomplish goals, like motivating yourself, recognizing a good idea, rationality, etc. And finally there are skills that apply directly to your goals, like programming or using a computer.
But that is in a world where you have no access to anything else. In most places, you can circumvent all the survival stuff by getting a stable source of enough money. The skills that let you accomplish goals, and that in general clarify what your goals are, still apply, although some of that work can be offloaded if you can get advisors or the like. Then there are the skills that apply directly to your goals: for some goals you can offload even these by paying people to accomplish them, but for others you need the skills yourself.
Thus, obtaining a good source of money, and being able to manage it and grow it, seem pretty important. So do the meta-skills that help you figure out your goals and accomplish them faster and with less effort.
Is there any way to buy a select number of books from the set, or only 1?
The last sentence of the Avoiding Losses from Zero-Sum Games section trails off… is there more that got removed?
I’m interested in the series and would love more theory, since I’ve been meaning to read more on genetic engineering anyway. More references, and the introductory material that got you started on the subject, would also be awesome.
Usually more apparent when it’s a sudden increase in resource-generating power (e.g. winning the lottery).
In addition to what Jay Bailey said, the benefits of an aligned AGI are incredibly high: if we successfully solved the alignment problem, we could solve pretty much any other problem in the world (assuming you believe the “intelligence and nanotech can solve anything” argument). The danger of AGI is high, but the payoff is also very large.