To me “ought” is unpacked as “I have assigned a certain moral weight or utility to various possible worlds that appear to depend on an agent’s actions, and I prefer the world with the higher utility”. Not sure if this matches any specific moral philosophy or whatever.
Is there a concept of a safe, partially aligned AI? One that recognizes the limits of its own understanding of humans [and humanity] and restricts its actions to what it knows, with high probability, lies within those limits?
I like the idea of multidimensional preferences, such as liking/wanting/approving, rather than maximizing a single utility function. I suspect there are more dimensions worth considering. For example, “deserving” is one often missed by those with a reasonably happy childhood. In those who faced emotional abuse growing up and eventually internalized it, the difference between wanting and deserving can be very considerable. It is quite common to hear “I want to be happy,” but when you ask something like “Do you feel that you deserve to be happy?” the answer is often either a pause or a negative, something like “I am not a good person, I do not deserve to be happy.” Not sure if this can be incorporated into your model, or what other axes are potentially worth considering.
we’d never see this Scientist AI reach a point where we could trust it to do what we mean.
Quite possibly. But I suspect that means that we will not be able to trust any AI to DWIM.
This topic touches a chord with me for various reasons, so I will try to comment as I read.
One of the most persistent arguments, often left implicit, that antifeminists have on their side is “Women want to be raped and abused.”
There is a problem with the definitions here. A significant fraction of women (and a smaller but not insignificant fraction of men, and let’s not forget all the other genders, too) have physical and/or sexual and/or emotional abuse fantasies to various degrees. Some want it to remain a fantasy; others want to experience it in actuality, but in a controlled and safe way; yet others want to feel helpless and out of control yet still be OK afterwards (as in consensual non-consent); and a small minority really want to be hurt badly or even killed, the consequences be damned. The “antifeminists” probably project this small minority onto the majority of women.
On one hand, this is an obvious logical contradiction if taken literally. Rape is unwanted sex — how can you want something unwanted?
Humans are not monolithic agents! We have many contradictory drives, needs and values. A part of us may want to be abused, while another part is horrified by the idea.
Men are more violent than women. This is a human universal, and true in many of our mammalian relatives as well.
If you mean physical violence, then yes. If you mean emotional violence, then women are just as adept at it, if not more so, than men. Some mother/daughter relationships are of the worst and most toxic kind; “Will I ever be good enough?” is a classic example.
you might start to wonder whether at least some women might have a thing for men who use force on them
Uh. As you said, and as is universally acknowledged, “bad boys” have more success than nice guys. And I mean genuinely nice guys, not the self-proclaimed ones. Scott Aaronson’s famous comment 171 is a great example of it. And the reaction to it is also a great example of vicious emotional abuse, mostly by some self-proclaimed feminists.
And it’s a description of a woman in an abusive relationship with a rapist.
Seriously? Have you read the book? Sure, Christian is not the model of a healthy BDSM relationship with Ana, though he certainly appears to be in his previous BDSM relationships with more experienced partners. Absolutely, he crosses the consent line here and there, mostly unintentionally and out of anger, which is generally a hallmark of an abuser. He even once or twice blames Ana for it (“you make me do this”), which is the excuse most abusers use. However, his overarching goal is a mutually happy and satisfying relationship. He doesn’t try to gaslight her, he takes responsibility for his actions as the older and more experienced partner, and he tries as hard as he can to make it work. This is very impressive given his background of growing up in an abusive situation and having his sexuality shaped by a non-consensual submissive BDSM relationship with an older woman. As far as D/s (or even vanilla) relationships go, Christian and Ana’s is definitely on the positive side of the Bell curve. They genuinely love and care for each other from the get-go, and overcome a lot of obstacles to be happy together.
The question Pervocracy asks is — why are people into this?
It’s a good question, and worth researching without prejudice. FetLife.com is a site devoted to all legal kinks, and you will find a huge variety of them there.
I wouldn’t be surprised if the author and many of the fans have been in abusive relationships or grew up in abusive households.
There is definitely a correlation between kink and childhood trauma. Sometimes it is about reenactment, sometimes about self-hate, and sometimes about overcoming the external manifestations of the trauma by engaging in ostensibly similar roles, but in a safe and consensual setting. That said, it’s just a correlation, by no means a certainty. Plenty of people enjoy dominant or submissive roles despite growing up in a healthy and nurturing environment; nurture has a rather limited effect on how one turns out.
A classic and maybe even defining feature of abuse is that the abused person is made to feel that it is normal or even right for them to be harmed. They’re told “You deserve it.” Or “this is just what relationships or families are like.” Or “you aren’t being harmed, you’re fine.” Over time, abused people may come to believe this.
Yep. During my years of emotionally supporting people online I have seen plenty of that. When done by parents or guardians during a child’s formative years, this form of abuse is extremely insidious and nearly impossible to overcome later in life. As an aside, I wish the EA movement spent some time focusing on this hidden source of suffering that is all around us.
And people who deeply believe that it is normal or right for them to be harmed may expose themselves to harm again, in order to confirm or validate their model of the world. This is what “self-harm” or “self-destructive behavior” is. It’s not that the harm makes them happy. It’s that it makes them right.
Sadly, that is indeed what happens. Once a part of you internalizes the abuser’s message, abuse-seeking becomes a pattern, and often a blind spot. The more one gets abused growing up, the more split their personality becomes, C-PTSD tilting into DID in especially severe cases.
Maybe abused people really do have a higher risk of seeking out a repetition of the harm they experienced and were taught to believe was normal.
That’s not even a maybe. Abuse seeking and reenactment is a well-documented pattern.
So I think the antifeminist account is confusing cause and effect. It’s not that women want men to hurt them. It’s that men hurt women a lot.
And men hurt men a lot. And women hurt men and women a lot, just in different ways. Reverse sexism is still sexism, just like reverse racism is still racism. No race, gender, or ethnicity has a monopoly on being good or bad; the Bell curves are wide and very much overlapping. And yes, people often confuse cause and effect. And, as you mentioned previously, the effect, once internalized, becomes and perpetuates the cause. Also, another definitional question: “hurt” may mean many things. What you probably mean is non-consensual hurt, while the “antifeminists” project the minority of women who want to get hurt in various ways onto all women, the majority of whom have no interest in being hurt in the way these particular men want to hurt them.
But even if there were agency on her part in seeking out abusers, people do not only optimize for their own well-being. People also, and in fact primarily, optimize for validation.
In other words, “Congratulations, asshole! Even if you’re right, you found someone who was hurting herself and decided you’d help her along.”
This is another real and common pattern. Predators are good at sniffing out their prey. Many childhood abuse survivors still give off this victim vibe years after the original abuse is over, often without realizing it. The two groups naturally gravitate toward each other, and, as a result, it is easy for an abuse victim to end up in another abusive relationship, and it is easy for an abuser to reenact the same role without ever realizing the harm they are doing to their partner.
In modern, colloquial language, I think the best word we have for the thing is healthy, as in, “make healthy choices,” and in close analogy with the concept of physical health.
Yes, that’s the commonly used term.
If by validation you mean acting to reconcile your view of self with your view of the world. Or the views of some part of you, since a human rarely has uniform and unchanging values.
You shouldn’t be (intentionally, avoidably) making people less healthy. You shouldn’t fuck people up.
Yes, that’s the idea. In reality, few people intend to make others less healthy. More often than not, we make someone miserable, “less healthy” while thinking that we are doing what is good for them. Sometimes because we think we know better than they do, sometimes to justify our own actions, sometimes just because we are careless. Ask an abuser, and most of them, save for true psychopaths, would find a perfectly good excuse for what they do, and why, in their opinion, they were actually doing what the other person needed or even wanted.
You can’t always fix one of those tangles from the outside, but it’s a good idea to stop it from spreading, interrupt it, get people out of it when they have a decent chance of recovery, and so on.
Uh. Sadly, the situation tends to be far worse than that. Notice the “post-traumatic” part of PTSD and C-PTSD. Paradoxically, the removal of trauma can have negative effects! I know quite a few people who ostensibly overcame a shitty childhood and were on their way to living a healthy life, only to see it all come crashing down a few years later, as the post-traumatic effects got out of control and became disabling. Like a kettle under pressure, it is far easier to crack once the external pressure is removed. Constantly fighting for survival is actually the healthiest way to live for some people.
And, likewise, if you see someone who appears to be optimizing for the opposite of health and happiness, you shouldn’t help them with that goal on the grounds of “revealed preference.” It’s probably not going to make you healthy and happy either.
Generally, that’s good advice. Unless you know what you are doing, hurting others because they seem to crave it is a slippery slope. And hurting those who don’t want to get hurt, because you think that deep inside they do, as those “antifeminists” claim, is definitely a bad idea.
I don’t think there would be anything as clearcut as a separation between objective and non-objective collapse. A more likely candidate experiment against objective collapse would be a quantum computer consisting of 10^2+ qubits.
we adopt in the following the hypothesis held by the majority of physicists, namely that the collapse process is real in the sense that it occurs independently from the presence of an observer or a measurement apparatus
Uh, no. Only a small minority of physicists think of the apparent collapse as an independent physical process. Most physicists follow the “shut up and calculate probabilities” approach, and most of those who venture into the nature of apparent collapse agree that it is very much observer-related, as in, the entanglement of the observer with the observed is essential for the Born rule to manifest.
Regardless, it seems like a good idea to target “the phenomena taking place in the little-explored realm where the quantum world interfaces classical physics”, since this is where the mystery likely lies.
The different outcomes predicted for this experiment by quantum physics and by thermodynamics are distinguishable by measurements of the distribution of radiation in A and B.
That indeed seems doable, and both authors are condensed-matter experts (one an experimentalist); I wonder why the paper does not hint at a potential experimental implementation.
I have my reservations about their interpretation of the proposed experiment, whatever the outcome, but I’d rather wait for the experts in the field to chime in. Someone like Scott Aaronson, for example.
I have never done this kind of formal debating, but it feels like the main skill you learn is the exact opposite of rationality: giving up caring about an accurate description of the world in favor of extracting the most personal benefit from a preset position.
Really enjoying reading every post in this sequence! Strangely, handwritten text does not detract from it, if anything, it makes each post more readable.
I wonder why most humans are not fans of wireheading, at least not until addicted to it. Do we naturally think in terms of an “unhacked box” when evaluating usefulness of something?
Humans are only in small part pliable reasoning machines. Most of what makes us us is genetic, subconscious, and unavailable to introspection. We have more blind spots than sighted ones, and we actively resist correcting them. LW-style rationality tends to appeal to people who are, on average, at or below the mean in interpersonal skills, so you start with a huge handicap, and learning about biases and how to deal with them only gives you a marginal advantage over people like you, not a magic bullet for achieving your goals. Speaking of goals, humans are confused about what goals and values they have, and a person is better represented not as a single optimizer but as a multitude of competing agents, some of which are unaware of the others’ presence, and some of which never bubble up to conscious awareness at all. The lucky few of us with only one or two rationality-shaped blind spots benefit the most; the rest, well, here we are, discussing why we are not winning instead of actually winning.
Not sure why the above comment was downvoted to −15. It’s a fair question, even if the person asking seems to misinterpret both quantum mechanics and mathematical logic. Quantum mechanics seems to be an accurate description of the “lower levels” of an agent’s model of the universe, and mathematical logic is a useful meta-model that helps us construct better-quality models of the universe. They are not, as far as I know, interrelated, and there is no “hence”. Additionally, while quantum mechanics is a good description of the microscopic world, it is much less useful at the level of living organisms (though ion-channel opening and closing reflects underlying quantum-mechanical tunneling), so there is no indication that human thinking is inherently quantum-mechanical and could not some day be implemented on a classical computer without a huge complexity penalty.
You can, for a certain value of “can”. It won’t have happened, of course, but you may still decide to act contrary to how you act, two different outcomes of the same algorithm.
This confuses me even more. You can imagine acting contrary to your own algorithm, but imagining different possible outcomes is a side effect of running the main algorithm that takes the $10. It is never the outcome of it. Or even an outcome. Since you know you will end up taking the $10, I also don’t understand the idea of playing chicken with the universe. Are there any references for it?
You don’t know that it’s inaccurate, you’ve just run the computation and it said $5.
Wait, what? We started with the assumption that examining the algorithm, or running it, shows that you will take $10, no? I guess I still don’t understand how
What if you see that your algorithm leads to taking the $10 and instead of stopping there, you take the $5?
is even possible, or worth considering.
This map from predictions to decisions could be anything.
Hmm, maybe this is where I miss some of the logic. If the predictions are accurate, the map is bijective. If the predictions are inaccurate, you need a better algorithm analysis tool.
The map doesn’t have to be identity, decision doesn’t have to reflect prediction, because you may write an algorithm where it’s not identity.
To me this screams “get a better algorithm analyzer!” and has nothing to do with whether it’s your own algorithm, or someone else’s. Can you maybe give an example where one ends up in a situation where there is no obvious algorithm analyzer one can apply?
Sure, one can imagine hypothetically taking the $5, even if in reality they would take the $10. That’s a spurious output from a different algorithm altogether: it assumes a world where you are not the same person who takes the $10. So it would make sense to examine which of the two you are if you don’t yet know that you will take the $10, but not if you already know it. Which of the two is it?
Thank you for your explanation! Still trying to understand it. I understand that there is no point examining one’s algorithm if you already execute it and see what it does.
I don’t understand that point. You say “nothing stops you”, but that is only possible if you could act contrary to your own algorithm, no? Which makes no sense to me, unless the same algorithm gives different outcomes for different inputs, e.g. “if I simply run the algorithm, I take the $10, but if I examine the algorithm before running it and then run it, I take the $5”. But that doesn’t seem to be what you mean, so I am confused.
What if you examine your algorithm and find that it takes the $5 instead?
How can that be possible? If your examination of your algorithm is accurate, it gives the same outcome as mindlessly running it, which is taking the $10, no?
It could be the same algorithm that takes the $10, but you don’t know that, instead you arrive at the $5 conclusion using reasoning that could be impossible, but that you don’t know to be impossible, that you haven’t decided yet to make impossible.
So your reasoning is inaccurate, in that you arrive at a wrong conclusion about the algorithm’s output, right? You just don’t know where the error lies, or even that there is an error to begin with. But in that case you would arrive at the same wrong conclusion about the same algorithm run by a different agent, right? So there is nothing special about it being your own algorithm rather than someone else’s. If so, the issue reduces to finding an accurate algorithm-analysis tool, for an algorithm that demonstrably halts in a very short time, producing one of two possible outcomes. This seems to have little to do with decision theory, so I am lost as to how it is relevant to the situation.
I am clearly missing some of your logic here, but I still have no idea what the missing piece is, unless it’s the libertarian free will thing, where one can act contrary to one’s programming. Any further help would be greatly appreciated.
Notice (well, you already know this) that accepting that identical agents make identical decisions (superrationality, as it were), and that to make different decisions in identical circumstances the agents must necessarily be different, gets you out of many pickles. For example, in the 5-and-10 game an agent would examine its own algorithm, see that it leads to taking the $10, and stop there. There is no “what would happen if you took a different action”, because an agent taking a different action would not be you, not exactly. So, no Löbian obstacle. In return, you give up something a lot more emotionally valuable: the delusion of making conscious decisions. Pick your poison.
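A toy sketch of that view in Java (everything here, including the class and method names, is illustrative, not from the post): for a deterministic, halting algorithm, any fully accurate “examination” is equivalent to running it, so the examined answer and the actual answer cannot diverge.

```java
import java.util.function.Supplier;

// Toy model of the 5-and-10 game: the agent is a deterministic decision
// procedure, and an accurate "examination" of it just means running it.
public class FiveAndTen {
    // The agent's entire algorithm: take the larger amount.
    static int agent() {
        return Math.max(5, 10);
    }

    // An accurate analysis of a halting deterministic algorithm agrees
    // with execution by definition; here we model it as execution itself.
    static int examine(Supplier<Integer> algorithm) {
        return algorithm.get();
    }

    public static void main(String[] args) {
        int predicted = examine(FiveAndTen::agent);
        int actual = agent();
        // Identical algorithms make identical decisions; an agent whose
        // examination yields 5 is running a different algorithm.
        System.out.println(predicted == actual); // prints "true"
    }
}
```

In this picture a “spurious” $5 prediction can only come from examining something other than the agent that is actually playing.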
An example of gaming the Goodhart’s law: https://www.smbc-comics.com/comic/median
If you know your own actions, why would you reason about taking different actions? Wouldn’t you reason about someone who is almost like you, but just different enough to make a different choice?
If you are a genie who thinks it has created a new mango, check with a sample of humans if they think it is one. Treat humans like you treat the rest of the world: an object of research and non-invasive hypothesis testing. You are not a super-genie until you understand humans better than they understand themselves, including what makes them tick, what would delight or horrify them. So, a genie who ends up tiling the universe with smiley faces is not a super-genie at all. It failed to understand the basics of the most important part of its universe.
Thank you! This post has been very illuminating for me. I have read the link on structured concurrency, and it made perfect sense, to the degree that I thought, “How come I did not think of this on my own sooner?”
The idea of a thread bundle with no main thread, but with a staggered launcher thread for Happy Eyeballs and a growing number of connection threads, makes sense to me. A potential implementation could look as follows (in Java-like pseudocode):
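Here is a minimal, simplified Java sketch of that bundle shape; the class name, the 250 ms stagger (a value suggested by RFC 8305), and the exit-funnel details are my own assumptions, not the OP’s code:

```java
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of a Happy Eyeballs "thread bundle": one launcher thread starts
// staggered connection attempts; the first attempt to succeed claims the
// winner slot, and every later success closes its own socket.
public class HappyEyeballs {
    static final int STAGGER_MS = 250; // RFC 8305's suggested attempt delay

    public static Socket connectAny(List<InetSocketAddress> addrs, int timeoutMs)
            throws InterruptedException {
        AtomicReference<Socket> winner = new AtomicReference<>();
        List<Thread> bundle = new ArrayList<>();

        // Launcher thread: one connection attempt per address, staggered,
        // instead of launching all attempts at once.
        Thread launcher = new Thread(() -> {
            for (InetSocketAddress addr : addrs) {
                if (winner.get() != null) return; // a previous attempt won
                Thread attempt = new Thread(() -> {
                    Socket s = new Socket();
                    try {
                        s.connect(addr, timeoutMs);
                        // Only the first success survives; later winners
                        // find the slot taken and close themselves.
                        if (!winner.compareAndSet(null, s)) s.close();
                    } catch (Exception e) {
                        try { s.close(); } catch (Exception ignored) { }
                    }
                });
                bundle.add(attempt); // only the launcher mutates the bundle
                attempt.start();
                try { Thread.sleep(STAGGER_MS); } catch (InterruptedException e) { return; }
            }
        });
        launcher.start();

        // Exit funnel: no thread outlives the bundle. (Simplification: this
        // waits for the losers to time out instead of interrupting them.)
        launcher.join();
        for (Thread t : bundle) t.join();
        return winner.get(); // null if every attempt failed
    }
}
```

Reading `bundle` on the main thread is safe here only because `launcher.join()` establishes a happens-before edge; a fuller version would also cancel the losing attempts as soon as a winner appears.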
Not sure if your implementation is similar.
Can more than one succeed at once? (I am not familiar with TCP.)
More than one thread can get a successful connection because of the race condition between connecting and canceling. This is a problem when two or more threads connect and send cancel signals before receiving a cancel signal themselves. In this situation it is not immediately obvious which successful connection should persist and which must die. This can be resolved by having a single synchronized connection slot, so only one thread can stuff the connection in the slot, and all others will find it taken and promptly die. This is one potential implementation of the exit funnel in the structured threading model OP is describing.
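A minimal Java sketch of such a slot (the class and method names are illustrative, not from anyone’s actual implementation): at most one successful connection is accepted, and every later winner learns it must close its socket and die.

```java
import java.net.Socket;

// One synchronized connection slot: the first successful thread stuffs its
// socket in; every other successful thread finds the slot taken and must
// close its own socket and exit.
public class ConnectionSlot {
    private Socket connection; // null until the first success claims it

    // Returns true if this thread's socket was accepted as the winner;
    // false means the slot was already taken and the caller must clean up.
    public synchronized boolean offer(Socket s) {
        if (connection == null) {
            connection = s;
            return true;
        }
        return false;
    }

    public synchronized Socket get() {
        return connection;
    }
}
```

A connection thread would then end with something like `if (!slot.offer(socket)) socket.close();`, which makes the question of which successful connection persists deterministic regardless of how the cancel signals race.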