Ceramic engineering researcher by training. Been interested in ethics for several years. More recently have gotten into data science.
I appreciate the comment; you clued me in to a bunch of things I wasn’t aware of (The Guild of the Rose, NYC Megameetup, and more). I definitely agree that setting a good example in one’s own life is a great place to start. And yes, several established power structures do stand to lose if people become less easy to manipulate.
I’m still hopeful that there’s some way to make progress if we get enough good minds churning out ideas on how to enroll people in their own personal development. This makes me wonder, though: which is more difficult, human alignment or AI alignment?
I look forward to your post. One thing I’ll add at this point is that The Dignity Index group is working on rating politicians’ speech using machine learning, in hopes that this could help shift political dialogue. I’ve done something similar with a somewhat more complicated rating system I developed independently. If you’re interested, check out some ratings of politicians’ tweets here: twitter.com/DishonorP. I don’t think rating systems by themselves will have a large impact on shifting behaviors, but seeing that some people put out genuinely non-partisan ratings may give others a tiny bit more hope in humanity.
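To give a flavor of what machine rating of speech can look like in practice, here’s a minimal sketch using an off-the-shelf sentiment model. The model choice, scoring function, and the mapping onto an eight-point contempt-to-dignity scale (like the one The Dignity Index publishes) are illustrative assumptions on my part; this is not the pipeline either project actually uses.

```python
# Minimal sketch: machine-rating the tone of a tweet with an
# off-the-shelf sentiment model. Illustrative only; not the actual
# pipeline The Dignity Index or I use.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

def rough_tone_score(text: str) -> float:
    """Map model sentiment onto a 1 (contempt) to 8 (dignity) scale."""
    result = classifier(text)[0]
    signed = result["score"] if result["label"] == "POSITIVE" else -result["score"]
    return round(4.5 + 3.5 * signed, 1)  # [-1, 1] -> [1, 8]

print(rough_tone_score("We can disagree on policy and still treat each other with respect."))
```

A real rating system needs far more than sentiment, of course (contempt, dehumanizing language, and bad-faith attributions aren’t the same thing as mere negativity), but even a crude score like this makes the rating process auditable.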
It’s an interesting question what’s meant by “productive” dialogue. I like the “less…arguments-as-soldiers” characterization. I asked ChatGPT-4 what productive dialogue is and part of its answer was: “The aim is not necessarily to reach an agreement but to understand different perspectives and possibly learn from them.” For me, productive dialogue basically means the same thing as “honorable discourse,” which I define as discourse, or conversation, that ultimately supports love and value building over hate and value destruction. For more, see here: dishonorablespeechinpolitics.com/blog2/#CivilVsHonorable
Thanks for the comment. If an AGI+ answered all my questions “correctly,” we still wouldn’t know if it were actually aligned, so I certainly wouldn’t endorse giving it power. But if it answered any of my questions “incorrectly,” I’d want to “send it back to the drawing board” before even considering using it as you suggest (as an “obedient tool-like AGI”). It seems to me there’d be too much room for abuse, or for falling into the wrong hands, with a tool that didn’t have its own ethical guardrails on board. But maybe I’m wrong (part of me certainly hopes so, because if AGI/AGI+ is ever developed, it’ll more than likely fall into the “wrong hands” at some point, and I’m not at all sure that everyone having one would make the situation better).
Thanks for the comment. You bring up an interesting point. The abortion question is a particularly difficult one that I don’t profess to know the “correct” answer to, if there even is a “correct” answer (see https://fakenous.substack.com/p/abortion-is-difficult for an interesting discussion). But asking an AGI+ about abortion, and asking it to explain its reasoning, should provide some insight into either its actual ethical reasoning process or the one it “wants” to present to us as having.
These questions are in part an attempt to set some kind of bar for an AGI+ to pass, towards at least showing it’s not obviously misaligned. The result will be either that it obviously failed, or that it gave us sufficiently reasonable answers plus explanations that it “might have passed.”
The other reason for these questions is that I plan to use them to test an “ethics calculator” I’m working on that I believe could help with the development of aligned AGI+.
(By the way, I’m not sure that we’ll ever get nearly all humans to agree on what “aligned” actually looks like/means. “What do you mean it won’t do what I want?!? How is that ‘aligned’?! Aligned with what?!”)
Thanks. Yup, agreed.
Thanks for the post. I wish more people looked at things the way you describe, i.e., being thankful for being triggered because it points to something unresolved within them that they can now work on setting themselves free from. Btw, here’s an online course that can help with removing anger triggers: https://www.udemy.com/course/set-yourself-free-from-anger
Thank you for the comment! You bring up some interesting things. To your first point, I guess this could be added to the “For an ASI figuring out ethics” list, i.e., that an ASI would likely be motivated to figure out some system of ethics based on the existential risks it itself faces. However, by “figuring out ethics,” I really mean figuring out a system of ethics agreeable to humans, or “aligned” with humans (I probably should’ve made this explicit in my post). Further, I’d really like it if the ASI(s) “lived” by that system. It’s not clear to me that an ASI being worried about existential risks to itself would translate into that (which I think is your third point). The way I see it, humans only care about ethics because of the possibility of pain (and death). I put “and death” in parentheses because I don’t think we actually care directly about death; we care about the emotional pain that comes when thinking about our own death and the deaths of others (and whether death will involve significant physical pain leading up to it).
This leads to your second point: what you mention would seem to fall under “Info an ASI will likely have” number 8, “…the ability to run experiments on people,” with the useful addition of “and animals, too.” I hadn’t thought about an ASI having hybrid consciousness in the way you mention (more on this below). I have two concerns with this. One is that it’d likely take some time, during which the ASI might unknowingly do unethical things. The second concern is more important, I think: being able to get the experience of pain when you want to is significantly different from not being able to control the pain. I’m not sure that a “curious” ASI getting an experience of pain (and other human/animal things) would translate into an empathic ASI that would want our lives to “go well.” But these are interesting things to think about, thanks for bringing them up!
One thing that makes it difficult for me personally to imagine what an ASI (in particular, the first one or few) might do is the question of what hardware it might be built on (classical computers, quantum computers, biology-based computers, some combination of systems, etc.). I’m also very unsure about what might motivate an ASI, which is related to the hardware question, since our human biological “hardware” is ultimately where human motivations come from. It’s difficult for me to see beyond an ASI just following some goal(s) we effectively give it to start with, like any old computer program, but way more complicated, of course. This leads to thoughts of goal misspecification and emergent properties, but I won’t get into those.
If, to give it its own motivation, an ASI is built from the start as a human hybrid, we’d better all hope they pick the right human for the job!
Yes, I sure hope ASI has stronger human-like ethics than humans do! In the meantime, it’d be nice if we could figure out how to raise human ethics as well.
Thank you for the feedback! I haven’t yet figured out the “secret sauce” of what people seem to appreciate on LW, so this is helpful. And, admittedly, although I’ve read a bunch, I haven’t read everything on this site, so I don’t know all of what has come before. After I posted, I thought about changing the title to something like: “Why we should have an ‘ethics module’ ready to go before AGI/ASI comes online.” In a sense, that was the real point of the post: I’m developing an “ethics calculator” (a logic-based machine ethics system), and sometimes I ask myself whether an ASI won’t just figure out ethics for itself far better than I ever could. Btw, if you have any thoughts on why my initial ethics calculator post was received so poorly, I’d greatly appreciate them, as I’m planning an update in the next few weeks. Thanks!
Thanks for the comment! Yeah, I guess I was having a bit too much fun writing my post to explicitly define all the terms I used. You say you “don’t think ethics is something you can discover.” But perhaps I should’ve been clearer about what I meant by “figuring out ethics.” According to merriam-webster.com, ethics is “a set of moral principles : a theory or system of moral values.” So I take “figuring out ethics” to basically be figuring out a system by which to make decisions based on a minimum agreeable set of moral values of humans. Whether such a “minimum agreeable set” exists or not is of course debatable, but that’s what I’m currently trying to “discover.”
Towards that end, I’m working on a system by which to calculate the ethics of a decision in a given situation. The system recommends that we maximize net “positive experiences.” In my view, what we consider to be “positive” is highly dependent on our self-esteem level, which in turn depends on how much personal responsibility we take and how much we follow our conscience. In this way, the system effectively takes into account “no pain, no gain” (following one’s conscience can be painful, and so can building responsibility). A toy sketch of the core calculation is below.
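To give a rough sense of the shape of the thing, here is that sketch in Python. The data structures, numbers, and the linear self-esteem adjustment are illustrative placeholders I’m using here, not the actual calculator:

```python
# Toy sketch: score a decision as the net "positive experience" summed
# over everyone it affects, with raw hedonic values re-weighted by each
# person's self-esteem level. Placeholder numbers and formulas only.
from dataclasses import dataclass

@dataclass
class AffectedPerson:
    hedonic_value: float  # raw pleasure (+) or pain (-) from the outcome
    self_esteem: float    # 0.0 (low) to 1.0 (high)

def experience_value(p: AffectedPerson) -> float:
    # Higher self-esteem softens how negatively pain is experienced
    # ("no pain, no gain"); pleasure is passed through unchanged.
    if p.hedonic_value >= 0:
        return p.hedonic_value
    return p.hedonic_value * (1.0 - p.self_esteem)

def decision_score(people: list[AffectedPerson]) -> float:
    """Net positive experience across everyone affected by the decision."""
    return sum(experience_value(p) for p in people)

# A hard workout: raw pain of -2.0, experienced through a high self-esteem lens.
print(decision_score([AffectedPerson(hedonic_value=-2.0, self_esteem=0.9)]))  # -0.2
```

The real system has to handle many affected parties, rights constraints, and uncertainty about outcomes, but the aggregation step is conceptually this simple.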
I agree that I’d like us to retain our humanity.
Regarding AI promoting certain political values, I don’t know if there’s any way around that happening. People pretty much always want to push their views on others, so if they have control of an AI, they’ll likely use it as a tool for this purpose. Personally, I’m a Libertarian, although not an absolutist about it. I’m trying to design my ethics calculator to leave room for people to have as many options as they can without infringing unnecessarily on others’ rights. Having options, including the option to make mistakes and even to not always “naively” maximize value, is necessary to raise one’s self-esteem, at least the way I see it. Thanks again for the comment!
Thanks for the comment. I do find that a helpful way to think about other people’s behavior is that they’re innocent, like you said, and they’re just trying to feel good. I fully expect that the majority of people are going to hate at least some aspect of the ethics calculator I’m putting together, in large part because they’ll see it as a threat to them feeling good in some way. But I think it’s necessary to have something consistent to align AI to, i.e., it’s better than the alternative.
Thanks for the post. It might be helpful to add some headings/subheadings throughout, plus a summary at the top, so people can quickly extract from it what they might be most interested in.
I only skimmed the work; I think it’s hard to expect people to read this much without knowing whether the “payoff” will be worth it. For adding headings, you can select the text of a heading and a little toolbar should pop up that says “Paragraph” on the left; if you click on the down arrow next to it, you can select Heading 1, Heading 2, etc. The text editor will automatically make a table of contents off to the left of your post based on this.
For summing up your post, maybe you could try popping it into ChatGPT and asking it to summarize it for you? Personally, in a summary I’d want to know quickly what “changing our currency type” entails (changing to what, exactly?), why you think it’s critical (how is it going to “empower the greater good” while other things won’t), and what you mean by “greater good.”
Hope this helps!
Thanks for the post. I’d like to propose another possible type of (or really, way of measuring) subjective welfare: self-esteem-influenced experience states. I believe having higher self-esteem generally translates to assigning more of our experiences as “positive.” For instance, someone with low self-esteem may hate exercise and deem the pain of it to be a highly negative experience. Someone with high self-esteem, on the other hand, may consider a particularly hard (painful) workout to be a “positive” experience as they focus on how it’s going to build their fitness to the next level and make them stronger.
Further, I believe that our self-esteem depends on the degree to which we take responsibility for our emotions and actions: more responsibility translates to higher self-esteem (see “The Six Pillars of Self-Esteem” by Nathaniel Branden for thoughts along these lines). At low self-esteem levels, “experience states” basically translate directly to hedonic states, in that only pleasure and pain can seem to matter as “positive experiences” and “negative experiences” to a person with low self-esteem (the exception may be if someone’s depressed, when not much at all seems to matter). At high self-esteem levels, hedonic states still play a role in experience states, but they’re effectively seen through a lens of responsibility, such as the pain of exercise seen through the lens of one’s own responsibility for getting oneself in shape, and deciding to feel good emotionally about pushing through the physical pain (here we could perhaps be considered to be getting closer to belief-like preferences).
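To make the “lens” idea concrete, here’s a toy numeric illustration. The linear formula is purely a placeholder assumption about the shape of the mapping, not a measured relationship:

```python
# Toy illustration: the same painful workout mapped to different
# "experience states" at different self-esteem levels. The linear
# lens below is a placeholder assumption, not a measured relationship.
def experience_state(hedonic: float, self_esteem: float) -> float:
    """Map a raw hedonic state onto an experience state.

    At self_esteem = 0 the experience just is the hedonic state; at
    high self-esteem, a painful-but-chosen experience can flip positive.
    """
    if hedonic >= 0:
        return hedonic
    # Pain reinterpreted through a lens of responsibility for one's growth.
    return hedonic + 2.0 * self_esteem * abs(hedonic)

hard_workout = -3.0  # raw physical pain of a hard workout
for se in (0.1, 0.5, 0.9):
    print(f"self-esteem {se}: experience state {experience_state(hard_workout, se):+.1f}")
# self-esteem 0.1: experience state -2.4
# self-esteem 0.5: experience state +0.0
# self-esteem 0.9: experience state +2.4
```

The point of the illustration is just the sign flip: the same hedonic input can register as a negative or a positive experience depending on the self-esteem lens it passes through.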
Thanks for the comment. I agree that context and specifics are key. This is what I was trying to get at with “If you’d like to change or add to these assumptions for your answer, please spell out how.”
By “controlled,” I basically mean it does what I actually want it to do, filling in the unspecified blanks at least as well as a human would, following my true meaning/desire as closely as it can.
Thanks for your “more interesting framing” version. Part of the point of this post was to give AGI developers food for thought about what they might want to prioritize for their first AGI to do.
Thanks for adding the headings and TL;DR.
I wouldn’t say my own posts have been particularly well-received on LW so far, but I try to look at this as a learning experience—perhaps you can, too, for your posts?
When I was in grad school, my advisor took the red pen to anything I wrote and tore it apart; it made me a better writer. Perhaps consider taking a course on clear technical writing (such as on udemy.com), or finding tips on YouTube or elsewhere on the web, and then practicing them, perhaps with ChatGPT’s help? Becoming a clearer, more concise writer can be useful both for getting one’s views across and for crystallizing one’s own thinking.
I don’t know if you saw this post from yesterday, but you may find it useful: https://www.lesswrong.com/posts/ELbGqXiLbRe6zSkTu/a-review-of-weak-to-strong-generalization-ai-safety-camp
Thanks for the feedback! I’m not exactly sure what you mean by “no pattern-matching to actually glue those variables to reality.” Are you suggesting that an AGI won’t be able to adequately apply the ethics calculator unless it’s able to re-derive the system for itself based on its own observations of reality? The way I envision things happening is that the first AGIs won’t be able to derive a mathematically consistent system of ethics over all situations (which is what the ethics calculator is supposed to be), since no human has done it yet, as far as I know, but an ASI likely will, if it’s possible.
If a human can figure it out before the first AGI comes online, I think this could (potentially) save us a lot of headaches, and the AGI could then go about figuring out how to tie the ethics calculator to its reality-based worldview—and even re-derive the calculator—as its knowledge/cognitive abilities expand with time. Like I said in the post, I may fail at my goal, but I think it’s worth pursuing, while at the same time I’d be happy for others to pursue what you suggest, and hope they do! Thanks again for the comment!
Thank you for the comment. Yes, I agree that “doing a good job of this is going to be extremely challenging.” I know it’s been challenging for me just to get to the point that I’ve gotten to so far (which is somewhat past my original post). I like to joke that I’m just smart enough to give this a decent try and just stupid enough to actually try it. And yes, I’m trying to find a rough approximation as a good starting point, in hopes that it’ll be useful.
Thanks for the suggestion about civil damages—I haven’t looked into that, only criminal “damages” (in terms of criminal sentences) thus far. I actually don’t expect that the first version of my calculations, based on my own ethics/values, will particularly agree with civil damages, but it may be interesting to see if the calculations can be modified to follow an alternate ethical framework (one less focused on self-esteem) that does give reasonable agreement.
Regarding masochistic and sadistic pleasure, it depends on how we define them. One might regard people who enjoy exercise as being into “masochistic pleasure,” but that’s not what I mean by it. By masochistic pleasure I basically mean pleasure that comes from one’s own pain, plus self-loathing. Sadistic pleasure would be pleasure that comes from the thought of others’ pain, plus self-loathing (even if it may appear as loathing of the other, the way I see it, it’s ultimately self-loathing). Self-loathing involves not taking responsibility for one’s emotions about oneself and is part of having low self-esteem. I appreciate you pointing to the need for clarification on this, and hope it’s now clarified a bit. Thanks again for the comment!