Thoughts for and against an ASI figuring out ethics for itself

If artificial super intelligence (ASI) becomes a reality, it’ll be able to figure out pretty much anything better than humans, right? So why not ethics? As long as we have it under some level of control, we can just say, “Hey, act ethically, OK?” and we should be good, right? Hmmm, I dunno… maybe?

Below I share some thoughts I have for and against an ASI figuring out ethics for itself. But first, what will an ASI likely have available to it to figure out a system of ethics it thinks is optimal?

Info an ASI will likely have

  1. The entire philosophical literature with all its arguments and counterarguments for things like deontology, utilitarianism, virtue ethics, etc.

  2. The entire psychology and physiology literatures, for understanding how humans think and how they feel, both emotionally and physically

  3. Many, many hours of videos displaying human behavior, such as from YouTube

  4. Huge amounts of text data from the internet, with plenty of examples of how people interact with each other there

  5. Works of fiction in which ethical dilemmas arise (e.g., Star Trek)

  6. A collection of philosophers/ethicists to ask questions of and get opinions from (these opinions are all but guaranteed to be in conflict with each other on some significant issues)

  7. The ability to send out surveys to people to help determine their values/preferences

  8. Perhaps the ability to run experiments on people and animals (either ethically or not)? [Edit: added “and animals” on 2-22-24 after reading mishka’s comment]

  9. Perhaps huge amounts of surveillance data on people (text messages, phone calls, security camera footage, credit histories, online shopping habits, criminal histories, etc.)

  10. Data on how well “ethics modules” in lesser AI systems worked up to that point

  11. The ASI may also figure out how to upload the mind of a really ethical human and use that as a starting point (although it seems like this would take significant time and experimentation to figure out, if it’s possible at all)

For an ASI figuring out ethics

OK, so with all that information available to an ASI, here’s my main thought in favor of an ASI figuring out ethics:

  1. An ASI will likely be able to figure out a lot of complex things that humans can’t or haven’t yet, and a consistent ethical system that works in the real world just seems, intuitively, like it could be one of these

Against an ASI figuring out ethics

And here are some thoughts against an ASI figuring out ethics:

  1. An ASI likely won’t have a human body or direct experiences of pain, pleasure, and emotion, so it won’t be able to “try things on” to verify whether its reasoning about ethics is “correct”

  2. There are a lot of conflicting arguments and unanswered questions in the ethics literature—is there really enough there for an ASI to build a consistent system from?

  3. People don’t act ethically, or even seemingly rationally, all the time (or even much of the time). How will an ASI make sense of all our apparent inconsistencies? Basically, it seems like there’s a lot of noise to sort through.

  4. Most people don’t really know what they want (what their true values are), so how will an ASI?

  5. How/​why will an ASI care about getting ethics right?

  6. Is it even possible to get ethics “right”?

  7. How will we know if the ASI got its ethical system right? Will we “know” it by things seeming fine until they’re not? Will it then be too late to help correct the ASI where it was wrong?

Maybe we should try our best to help an ASI get there

Given the thoughts above, my gut feeling is that an ASI will be smart enough to figure out a reasonable system of ethics for itself, if it’s so inclined. My biggest reservation, though, is an ASI’s inability to “try humans on,” since it most likely won’t have a body to help it do that. (If it is able to experience pain that it can’t completely control, like humans do, I think we could be in for a world of trouble!) Also, if an ASI decides it needs to run experiments, send out surveys, or gather something else from humans to hone an optimal system of ethics, this could take significant time. Minimizing this honing time would likely be beneficial, to reduce the risk of an ASI (or competing ASIs that might come online) doing unethical things in the meantime. Therefore, I think it’d be useful to give an ASI our best-effort version of an “ethics module” that it can start its honing from, even if we think the ASI could ultimately figure out ethics for itself from “scratch” given enough time.
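To make that concrete, here’s a minimal toy sketch in Python of the shape such a starter module might take. Everything in it is hypothetical (the names `StarterEthicsModule`, `hone`, and the rule features are mine, purely for illustration); the point is just the two ingredients argued for above: a best-effort seed rule set, and an explicit path for honing it with new human evidence.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    permitted: bool
    confidence: float  # low confidence flags cases that still need honing
    rationale: str

class StarterEthicsModule:
    """A best-effort human seed, meant to be honed by the ASI, not treated as final."""

    def __init__(self):
        # Hypothetical seed rules distilled from sources like those listed earlier
        # (the philosophy literature, surveys, expert opinions). Each maps an
        # action feature to whether actions with that feature are permitted.
        self.rules = {"causes_serious_harm": False}
        self.unresolved = []  # cases the seed rules couldn't judge confidently

    def evaluate(self, action_features: set) -> Verdict:
        for feature, permitted in self.rules.items():
            if feature in action_features and not permitted:
                return Verdict(False, 0.9, f"Seed rule forbids '{feature}'")
        # Nothing fired: permit, but flag for honing rather than feign certainty.
        self.unresolved.append(action_features)
        return Verdict(True, 0.3, "No seed rule applies; queued for refinement")

    def hone(self, feature: str, permitted: bool) -> None:
        """Fold in new evidence (survey results, expert judgments, experiments)."""
        self.rules[feature] = permitted

module = StarterEthicsModule()
print(module.evaluate({"causes_serious_harm"}))  # permitted=False, confidence=0.9
module.hone("deceives_user", False)              # new human evidence tightens the seed
```

The low-confidence “queued for refinement” path is the part of the toy doing the real work: it’s what turns the module from a fixed rulebook into a starting point the ASI can hone.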

What about pre-ASI AIs?

I see even more reason to put our best effort into coming up with a viable “ethics module” when I think about what could happen between now and when the first ASI might arrive: the coming onslaught of agentic “weak” AIs that’ll likely need ethical guardrails of some sort as they’re given more power, and the potential for multiple AGIs to be under the control of multiple humans/groups, some of whom won’t care about acting ethically.

This last scenario brings up another question: is there a way to make an AGI that can only be run on hardware with a built-in, “hard-coded” ethics module, one that either can’t be altered without destroying the AGI, or can only be altered by the AGI itself in consultation with a human user it deems to be an uncoerced “ethicist” (a person with a track record of acting ethically plus a demonstrated understanding of ethics)?
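I don’t know if that’s feasible, but here’s a toy Python sketch of the second option, with software standing in for the tamper-proof hardware and every name hypothetical: the rule set only changes if a vetted ethicist cryptographically co-signs the AGI’s proposed update.

```python
import hashlib
import hmac

class HardwareBoundEthicsModule:
    """Toy sketch of an ethics module that can't be altered unilaterally.

    In a real system the rules and the update check would live in
    tamper-evident hardware; here a tuple of forbidden action features
    and an HMAC co-signature stand in for that.
    """

    def __init__(self, forbidden_features: tuple, ethicist_key: bytes):
        self._forbidden = forbidden_features  # e.g., ("causes_serious_harm",)
        self._ethicist_key = ethicist_key     # stands in for a vetted ethicist's credential

    def permits(self, action_features: set) -> bool:
        # Every proposed action must clear the current rule set before execution.
        return not (action_features & set(self._forbidden))

    def propose_update(self, new_forbidden: tuple, rationale: str,
                       ethicist_signature: bytes) -> bool:
        # The AGI may propose a change, but it only takes effect if an
        # uncoerced, vetted ethicist has co-signed exactly this proposal.
        message = repr(new_forbidden).encode() + rationale.encode()
        expected = hmac.new(self._ethicist_key, message, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, ethicist_signature):
            return False  # refuse; real hardware might brick itself instead
        self._forbidden = new_forbidden
        return True

module = HardwareBoundEthicsModule(("causes_serious_harm",), ethicist_key=b"demo-key")
print(module.permits({"sends_email"}))          # True
print(module.permits({"causes_serious_harm"}))  # False
```

The co-signature is the load-bearing piece of the toy: the AGI alone can’t loosen its own rules, and neither can someone without the ethicist’s credential, though in real hardware the refusal path would presumably need to be much harsher than returning False.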

Hmmm, I dunno… sounds like something it’d be nice to have an ethical ASI around to help us figure out.