Ceramic engineering researcher by training. Been interested in ethics for several years. More recently have gotten into data science.
sweenesm’s Shortform
Nice write-up on this (even if it was AI-assisted), thanks for sharing! I believe another benefit is Raising One’s Self-Esteem: if high self-esteem can be thought of as consistently feeling good about oneself, then someone who takes responsibility for their emotions, recognizing that they can change them at will, can consistently choose to feel good about and love themselves as long as their conscience is clear.
This is in line with “The Six Pillars of Self-Esteem” by Nathaniel Branden: living consciously, self-acceptance, self-responsibility, self-assertiveness, living purposefully, and personal integrity.
Thanks for the post. I don’t know the answer to whether a self-consistent ethical framework can be constructed, but I’m working on it (without funding). My current best framework is a utilitarian one that incorporates the effects of rights, self-esteem (personal responsibility), and conscience. It doesn’t “fix” the Repugnant Conclusion or the Very Repugnant Conclusion, but it does say that how you transition from one world to another could matter, in terms of the conscience(s) of the person/people who bring the transition about.
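To give a rough, concrete sense of the shape of this, here’s a minimal toy sketch in Python. The function name, the penalty structure, the weights, and the numbers are all illustrative assumptions of mine for this comment, not the actual framework:

```python
# Toy sketch only: the penalty structure, weights, and numbers below are
# illustrative assumptions, not the actual framework.

def net_value(hedonic_gain: float,
              rights_violations: float,
              conscience_cost: float,
              rights_weight: float = 10.0,
              conscience_weight: float = 5.0) -> float:
    """Score an action: raw utilitarian gain, minus penalty terms for
    rights violations and for the conscience cost borne by whoever
    brings the outcome about."""
    return (hedonic_gain
            - rights_weight * rights_violations
            - conscience_weight * conscience_cost)

# Two transitions to the same end state can score differently if one
# of them weighs on the conscience of the people who carry it out.
print(net_value(hedonic_gain=100, rights_violations=0, conscience_cost=0))  # 100.0
print(net_value(hedonic_gain=100, rights_violations=2, conscience_cost=4))  # 60.0
```

The only point of the toy is that the transition path enters the score, not just the end state.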
It’s an interesting question as to what the implications are if it’s impossible to make a self-consistent ethical framework. If we can’t convey ethics to an AI in a self-consistent form, then we’ll likely rely in part on giving it lots of example situations (which not all humans/ethicists will agree on) to learn from, hope it augments this by learning from human behavior, and hope it generalizes well beyond all this not-perfectly-consistent training data. (Sounds a bit sketchy, doesn’t it—at least for the first AGIs, though perhaps ASIs could fare better?) Generalizing “well” could be taken to mean that an AI won’t do anything that most people would strongly disapprove of if they understood the true implications of the action.
[This paragraph I’m less sure of, so take it with a grain of salt:] An AI that was trying to act ethically, taking the approval of relatively wise humans as some kind of signal of this, might try to hide or avoid ethical inconsistencies that humans would pick up on. It would probably develop a long list of situations where inconsistencies seemed to arise, and of actions it thought it could “get away with” versus not. I’m not talking about deception with malice, just sneakiness aimed at keeping most humans more or less happy, which, I assume, would be part of what its ethics system would deem good/valuable. It seems to me that problems may surface if/when an “ethical” AI is defending against a bad AI and can no longer hide inconsistencies in all the situations that could rapidly come up.
If it is possible to construct a self-consistent ethical framework and we haven’t done it in time, or laid the groundwork for it to be done quickly by the first “transformative” AIs, then in my opinion we’ll have basically dug our own grave in terms of the consequences we get. Work on coming up with a self-consistent ethical framework seems to me to be a very underexplored area of AI safety.
Thanks for the interesting post! I basically agree with what you’re saying, and it’s mostly in line with the version of utilitarianism I’m working on refining. Check out a write-up on it here.
Thanks for the post. I don’t know if you saw this one: “Thank you for triggering me”, but it might be of interest. Cheers!
Thank you for sharing this. I’m sorry that anxiety and depression continue to haunt you. I’ve had my own, less extreme, struggles, so I can relate to some of what you wrote. In my case, I was lucky enough to find some good personal development resources that helped me a lot. One I might suggest for you to check out is: https://www.udemy.com/course/set-yourself-free-from-anger/. You can often get this course on sale for <$20. From what you’ve described, I think the “Mini Me” section might be most useful to you. Hope this helps you in some way.
Thanks for the interesting post! I agree that understanding ourselves better through therapy or personal development is a great way to gain insights that could be applicable to AI safety. My personal development path got started mostly due to stress from not living up to my unrealistic expectations of how much I “should” have been succeeding as an engineer. It got me focused on self-esteem, and that’s a key feature of the AI safety path I’m pursuing.
If other AI safety researchers are interested in a relatively easy way to get started on their own path, I suggest this online course which can be purchased for <$20 when on sale: https://www.udemy.com/course/set-yourself-free-from-anger
Good luck on your boundaries work!
Thanks for the feedback! I’m not exactly sure what you mean by “no pattern-matching to actually glue those variables to reality.” Are you suggesting that an AGI won’t be able to adequately apply the ethics calculator unless it’s able to re-derive the system for itself based on its own observations of reality? The way I envision things happening is that the first AGIs won’t be able to derive a mathematically consistent system of ethics over all situations (which is what the ethics calculator is supposed to be); no human has done it yet, as far as I know, but an ASI likely will, if it’s possible.
If a human can figure it out before the first AGI comes online, I think this could (potentially) save us a lot of headaches, and the AGI could then go about figuring out how to tie the ethics calculator to its reality-based worldview—and even re-derive the calculator—as its knowledge/cognitive abilities expand with time. Like I said in the post, I may fail at my goal, but I think it’s worth pursuing, while at the same time I’d be happy for others to pursue what you suggest, and hope they do! Thanks again for the comment!
Update on Developing an Ethics Calculator to Align an AGI to
I don’t know if you saw this post from yesterday, but you may find it useful: https://www.lesswrong.com/posts/ELbGqXiLbRe6zSkTu/a-review-of-weak-to-strong-generalization-ai-safety-camp
Thanks for adding the headings and TL;DR.
I wouldn’t say my own posts have been particularly well-received on LW so far, but I try to look at this as a learning experience—perhaps you can, too, for your posts?
When I was in grad school, my advisor took the red pen to anything I wrote and tore it apart—it made me a better writer. Perhaps consider taking a course on clear technical writing (such as on udemy.com), or finding tips on YouTube or elsewhere on the web, and then practicing them, perhaps with ChatGPT’s help? Becoming a clearer and more concise writer can be useful both for getting one’s views across and for crystallizing one’s own thinking.
Thanks for the comment. I agree that context and specifics are key. This is what I was trying to get at with “If you’d like to change or add to these assumptions for your answer, please spell out how.”
By “controlled,” I basically mean it does what I actually want it to do, filling in the unspecified blanks at least as well as a human would in order to follow my true meaning/desire as closely as it can.
Thanks for your “more interesting framing” version. Part of the point of this post was to give AGI developers food for thought about what they might want to prioritize for their first AGI to do.
Thank you for the comment. I think all of what you said is reasonable. I see now that I probably should’ve been more precise in defining my assumptions, as I would put much of what you said under “…done significant sandbox testing before you let it loose.”
[Question] If you controlled the first agentic AGI, what would you set as its first task(s)?
Thanks for the post. I’d like to propose another possible type of (or really, way of measuring) subjective welfare: self-esteem-influenced experience states. I believe having higher self-esteem generally translates to classifying more of our experiences as “positive.” For instance, someone with low self-esteem may hate exercise and deem the pain of it a highly negative experience. Someone with high self-esteem, on the other hand, may consider a particularly hard (painful) workout a “positive” experience, focusing on how it will build their fitness to the next level and make them stronger.
Further, I believe that our self-esteem depends on the degree to which we take responsibility for our emotions and actions: more responsibility translates to higher self-esteem (see “The Six Pillars of Self-Esteem” by Nathaniel Branden for thoughts along these lines). At low self-esteem levels, “experience states” basically translate directly to hedonic states, in that only pleasure and pain seem to register as “positive experiences” and “negative experiences” for a person with low self-esteem (the exception may be if someone’s depressed, when not much at all seems to matter). At high self-esteem levels, hedonic states still play a role in experience states, but they’re effectively seen through a lens of responsibility: the pain of exercise, say, is viewed in light of one’s own responsibility for getting in shape, and one decides to feel good emotionally about pushing through the physical pain (here we could perhaps be considered to be getting closer to belief-like preferences).
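As a toy illustration of what I mean (this is just my own illustrative sketch, not a formal model; the function, the numbers, and the idea of a simple linear blend are all assumptions), experience valence could be modeled as a blend of raw hedonic valence and a responsibility-based reframing, with self-esteem controlling the blend:

```python
# Illustrative toy model, not a formal proposal: experience valence as a
# blend of raw hedonic valence and a responsibility-based "reframed"
# valence, with self-esteem (0 to 1) controlling the blend.

def experience_valence(hedonic: float, reframed: float, self_esteem: float) -> float:
    """At self_esteem = 0, experience tracks raw pleasure/pain;
    at self_esteem = 1, it tracks the responsibility-based reframing."""
    s = min(max(self_esteem, 0.0), 1.0)  # clamp to [0, 1]
    return (1 - s) * hedonic + s * reframed

# A hard workout: hedonically negative, but reframed as earned progress.
print(experience_valence(hedonic=-5, reframed=4, self_esteem=0.1))  # ≈ -4.1
print(experience_valence(hedonic=-5, reframed=4, self_esteem=0.9))  # ≈ +3.1
```

At low self-esteem the score just tracks pleasure/pain; at high self-esteem the responsibility-based reframing dominates, which is the intuition I’m trying to capture.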
I only skimmed the work—I think it’s hard to expect people to read this much without knowing if the “payoff” will be worth it. For adding headings, you can select the text of a heading and a little tool bar should pop up that says “Paragraph” on the left—if you click on the down arrow next to it, you can select Heading 1, Heading 2, etc. The text editor will automatically make a table of contents off to the left of your post based on this.
For summing up your post, maybe you could try popping it into ChatGPT and asking it to summarize it for you? Personally, in a summary I’d want to know quickly what “changing our currency type” entails (changing to what, exactly?), why you think it’s critical (how is it going to “empower the greater good” while other things won’t), and what you mean by “greater good.”
Hope this helps!
Thanks for the post. It might be helpful to add some headings/subheadings throughout, plus a summary at the top, so people can quickly extract from it what they might be most interested in.
Thanks for the comment. I do find that a helpful way to think about other people’s behavior is that they’re innocent, like you said, and they’re just trying to feel good. I fully expect that the majority of people are going to hate at least some aspect of the ethics calculator I’m putting together, in large part because they’ll see it as a threat to them feeling good in some way. But I think it’s necessary to have something consistent to align AI to, i.e., it’s better than the alternative.
Thanks for the comment! Yeah, I guess I was having a bit too much fun in writing my post to explicitly define all the terms I used. You say you “don’t think ethics is something you can discover.” But perhaps I should’ve been clearer about what I meant by “figuring out ethics.” According to merriam-webster.com, ethics is “a set of moral principles : a theory or system of moral values.” So I take “figuring out ethics” to basically be figuring out a system by which to make decisions based on a minimum agreeable set of moral values of humans. Whether such a “minimum agreeable set” exists or not is of course debatable, but that’s what I’m currently trying to “discover.”
Towards that end, I’m working on a system for calculating the ethics of a decision in a given situation. The system recommends that we maximize net “positive experiences.” In my view, what we consider “positive” is highly dependent on our self-esteem level, which in turn depends on how much personal responsibility we take and how much we follow our conscience. In this way, the system effectively takes into account “no pain, no gain” (following one’s conscience can be painful, and so can building responsibility).
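For a rough sense of the decision step, here’s a minimal sketch under my own assumptions; the option names, the numbers, and the use of plain summation are illustrative stand-ins, not the actual calculator:

```python
# Hedged sketch: one way "maximize net positive experiences" could be
# operationalized. Everything here is a hypothetical stand-in.

def decision_score(experience_valences: list[float]) -> float:
    """Net positivity of a candidate decision: the sum of the
    (self-esteem-adjusted) experience valences of everyone affected."""
    return sum(experience_valences)

def best_decision(options: dict[str, list[float]]) -> str:
    """Pick the option with the highest net positive experience."""
    return max(options, key=lambda name: decision_score(options[name]))

options = {
    "honest_but_hard": [-2.0, 3.5, 1.0],   # short-term pain, conscience clear
    "easy_but_evasive": [2.0, 1.0, -4.0],  # pleasant now, conscience cost later
}
print(best_decision(options))  # honest_but_hard (net 2.5 vs -1.0)
```

A real version would need the valences themselves to come from something like the self-esteem-influenced experience states described above, plus the rights and conscience considerations I’ve mentioned; plain summation is just the simplest placeholder.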
I agree that I’d like us to retain our humanity.
Regarding AI promoting certain political values, I don’t know if there’s any way around that happening. People pretty much always want to push their views on others, so if they have control of an AI, they’ll likely use it as a tool for this purpose. Personally, I’m a Libertarian, although not an absolutist about it. I’m trying to design my ethics calculator to leave room for people to have as many options as they can without infringing unnecessarily on others’ rights. Having options, including the option to make mistakes and even to not always “naively” maximize value, is necessary for raising one’s self-esteem, at least the way I see it. Thanks again for the comment!
American Philosophical Association (APA) announces two $10,000 AI2050 Prizes for philosophical work related to AI, with June 23, 2024 deadline: https://dailynous.com/2024/04/25/apa-creates-new-prizes-for-philosophical-research-on-ai/
https://www.apaonline.org/page/ai2050
https://ai2050.schmidtsciences.org/hard-problems/