hendrycks recently published a paper introducing a new moral theory. the paper contains this insane table, which claims that you should value a foreign stranger at 3e-12 times the value you assign to yourself. even setting aside the fact that this is apparently supposed to be a prescriptive theory, even as a descriptive theory, i think this is utter madness.
the core problem is that it assumes that if x% of your total caring is assigned to people other than yourself, then you must give away x% of your wealth to be consistent.
the argument goes that since most people don't give away more than, say, 50% of their wealth, then if there are 1e10 people then each one can only get a tiny sliver of your caring.
but this is wrong, because there is no simple relationship between the % of your caring assigned to other people and the % of your money you should give away. i think you should care about random strangers closer to 1e-3 than 1e-12. if you care about each stranger x times as much as yourself, you should keep giving away money to the person who is most in need for as long as each marginal $ helps them more than 1/x times as much as it helps you.
if x = 1e-12, then you're saying you won't give a single dollar to charity until you have so much money that a dollar helps the stranger more than a trillion times as much as it helps you; or, because a trillion dollars is always worth less than a trillion times the value of a dollar due to concavity (diminishing marginal utility), you won't give a single dollar to charity until it would help them more than giving you a trillion dollars would help you, which probably won't be true until you are absurdly wealthy.
this is batshit insane!
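to make the arithmetic concrete, here's a minimal sketch of that giving rule, assuming log utility of money (my assumption for illustration, not anything from the paper; the helper name and wealth figures are invented):

```python
def optimal_donation(my_wealth: float, their_wealth: float, x: float) -> float:
    """Donation to a single recipient under the rule above, assuming
    log utility u(w) = log(w), so marginal utility u'(w) = 1/w.
    You keep giving while x * u'(their_wealth + d) > u'(my_wealth - d),
    i.e. while x * (my_wealth - d) > their_wealth + d; equalizing gives
    d = (x * my_wealth - their_wealth) / (1 + x), floored at zero.
    """
    return max(0.0, (x * my_wealth - their_wealth) / (1 + x))

# a donor with $1,000,000 facing a recipient with $500 (numbers made up):
for x in (1e-3, 1e-12):
    print(f"x = {x:g}: give ${optimal_donation(1_000_000, 500, x):,.0f}")
# x = 1e-3 already implies giving ~$500 here; with x = 1e-12 you give
# nothing at all until your wealth exceeds ~1e12 times the recipient's.
```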
I might just be missing something here, but as presented it does seem a bit like Hendrycks' argument, in addition to being philosophically weak, also misunderstands some concept from basic calculus.
That said, while I think your model is (much) better, and closer to my own, it's also not descriptively amazing. I think in practice when people (including myself!) learn that charities are 10x more or less effective than they previously thought, they rarely adjust their giving substantially to bring their new donations in line with marginal utility.
I also think people's actual decisions are often driven by framing effects, reciprocity norms, etc., rather than by pure utility. For example, I think if you're a tourist, your willingness to potentially sacrifice your life for a large group of random foreign strangers near you[1] is probably much higher than while you're sitting at home an ocean away[2].
Ex 1: you're driving a rental SUV, your brakes don't work, and you have the option of careening either into a group of schoolchildren or off a cliff. Compare that to willingness to donate money or a kidney. Ex 2: rushing into a burning building.
Assuming these are real situations, ignoring bravado/cheap talk, etc. Obviously, in non-serious hypotheticals people may say things they won't actually do.
i broadly agree that most people aren't thinking about this at all. my model is mostly prescriptive, under the constraint that it prescribes actions that are vaguely close to what people do in practice (as opposed to utilitarianism, which gives away every penny you have to charity, or hendrycksism, which lets billions die if it means you get a really nice apartment).
Having now skimmed the paper/read some parts a bit more carefully, one thing I do appreciate about it is the attempt to modus tollens Parfit's ideas about personal identity. I think that's a worthwhile angle, and more practically useful these days than using Reasons and Persons to dissuade people from egocentricity, which afaict is closer to Parfit's original goals.
That said, I don’t think this particular implementation makes a lot of sense.
The Shapley mutual information also does way more "heavy lifting," to quote a favored AI phrase, than the paper wants to imply.
Finally, the whole idea is pretty crazy if you think about it. Is it actually rational to value yourself more than the rest of humanity combined? Is this actually consistent with most people's endorsed preferences? This seems implausible!
People on average do seem to value themselves more than the entire rest of the world, perhaps with an exception for their closest friends and family.
I think a lot of people would sacrifice their lives for the rest of humanity, and a lot wouldn't. By revealed preferences, I think the distribution is centered somewhere around self = 8 billion strangers, and it's pretty wide. Other measures of revealed preference, like donations, seem to roughly agree.
Whether that's rational in the sense of being logically consistent is debatable. Arguments like Parfit's typically don't succeed in making people (including me) a lot more utilitarian, but that could be caused by motivated reasoning.
I think it's partially motivated reasoning, but a lot of it is a defense mechanism. Like, if somebody you've never met before tells you to donate half your money to charity, you might (correctly!) infer that they do not have your best interests at heart. Regardless of whether they're a fancy-sounding Oxford academic, a carpenter's stepson from the Middle East, or your local internet philosophy & rationality blogger.
So I’m not at all surprised that people aren’t convinced by these arguments, nor do I (at a sufficiently high level of abstraction) believe that they ought to be.
When I say motivated reasoning, do you take that to mean it's conscious and strategic? I worry the term is used that way more often than in the academic and, IMO, more important sense.
Can you explain a bit more what you mean by "the attempt to modus tollens Parfit's ideas about personal identity"?
Aren’t you dividing twice there, since you:
1) single out a stranger (thus dividing the amount you care about the average stranger by their number)
2) then apply Hendrycks' central number to that stranger (where you should now be applying the pooled number, since you're already ignoring all the other group members)
So I think this is in fact pretty close to your intuition if interpreted correctly (you say 1e-3, Hendrycks says 1e-2).
i don’t understand what you mean. the central column is saying i should care about myself 0.576 much, and Bob from Randomland 1.6e-12 much. where am I dividing twice?
my version of this table would say 1e-7 for self and 1e-10 for random person; the crux of my argument is that the ratio between the two is vastly more important than the absolute fraction of your caring a stranger occupies.
If I understand Hendrycks' logic here, then caring 1/1000 as much about a random stranger as about yourself means you care several million times more about all random strangers combined (7.8e9 × 1e-3 ≈ 8e6) than about yourself, which you don't seem to be saying?
not OP, but that seems like a pretty reasonable conclusion. if i had to sacrifice my own life to save every person i didn't personally know (i.e. 8.1 billion people), i would absolutely do it in a heartbeat. i would also do it to just save a fraction of those people (8M people). only once it gets down to much smaller fractions (saving, say, 3-100 random people) does it start to seem like a hard tradeoff.
Sorry to be edgy, but there are situations with the option to sacrifice more than your life. I bet you have a limit there too. It's just higher than your life.
Another point here is that people are not that unified over time. Like, you can press some button that all subsequent yous would sincerely curse you for. "Noooo, the infinite torture dimension turned out to be a bit much!" (quote from a pretty selfless human).
i agree that i would rather die instantly than live for 100 years of torture. i don’t think that proves as much as you think. i also think it’s fine for some people to make morbid utility calculations like these, and for others to say “i don’t want to think about that and i’m not going to answer”
Well, sure. You are fine with thinking about sacrificing your life and proudly announcing that, but anything more is too much to even talk about? Morbid calculations for me but not for thee.
I just think you are wrong about your self-model here. Like, I'm doubtful you would even be able to saw your hand off without anesthesia, and that's not years of torture, it's like 10 minutes of mild torture. A lot of people would bail on this, including me. And you are claiming what, to be unusually willing to sacrifice stuff, up from the prior?
Be less wrong etc
not a lot of people (maybe literally 0) have had sufficient reason to saw off their own hand for altruistic reasons. i’ve donated a kidney, donate blood often, and gave more than the GWWC pledge when my income was high. any falsifiable claims you’d like to check while we’re speculating about my values?
Wow, okay, you are right. You are way up, yeah.
But I still think there are limits, just higher for you.
my point is that % of caring is not a coherent concept, or at least not one that maps onto the intuitive notion of what % of your wealth you should donate.
specifically, suppose instead of there being 1e10 people, there were 1e100 people. i claim the % of your money you should donate should basically not change at all, even though the % of caring assigned to yourself has plummeted by a huge amount
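continuing the earlier log-utility sketch (same invented donor and hypothetical helper), the invariance can be checked directly:

```python
def total_donation(my_wealth: float, their_wealth: float,
                   x: float, n: float) -> float:
    """Total donated across n identical strangers (wealth `their_wealth`
    each), each cared about x times as much as yourself, under the same
    log-utility assumption. Splitting D evenly, each stranger gets D/n;
    marginal utilities equalize when x / (their_wealth + D/n)
    = 1 / (my_wealth - D), giving
        D = (x * my_wealth - their_wealth) / (x + 1/n), floored at zero.
    """
    return max(0.0, (x * my_wealth - their_wealth) / (x + 1.0 / n))

# same invented donor as before ($1M, recipients at $500), x = 1e-3:
for n in (1e10, 1e100):
    print(f"n = {n:.0e}: give ${total_donation(1_000_000, 500, 1e-3, n):,.0f}")
# both print ~$500,000: once n is huge, 1/n is negligible next to x, so
# the donated fraction depends on x and the utility curve, not on n.
```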
once you write down a table with that first row and that column, the result will be batshit insane no matter what numbers you put in.
eh, i think there's a wide range of reasonablish values that i wouldn't have objected to. like i think it is vaguely defensible to care anywhere between 1e-1 and 1e-6-ish times as much about a stranger as yourself. 1e-12 is way outside that window.
I agree in absolute terms, but some forms of insanity are greater than others.
It is amazing that a paper that is essentially just a vaguer form of Hamilton’s Rule only cites him once.
As it stands, I think the table is incorrect but "right" in the sense that it really depends on which random constants you assign to these calculations, and I can't find any evidence of a careful selection in the paper or in his code.
Fwiw: Reading all comments here (though still w/o a working link to Hendrycks' article), I'm rather convinced we're mainly circling around questions of definition.
OP leogao = "caring" in the actual natural sense.
Rest: some special definition that I don't fully understand, which so far doesn't seem super interesting (to me) and indeed leads to a "batshit insane" result when read with the natural meaning of "degree of caring" instead of Hendrycks' seemingly different concept.
Re "i think you should care about random strangers closer to 1e-3 than 1e-12": in the table you reproduce, see the entry in the second column ("Count") for "Foreign Stranger": 7.8 x 10^9. For Hendrycks, Foreign Stranger doesn't mean a particular individual that you encounter; it refers to every human being outside your nation or culture, whether or not you've ever heard of them. So you can't assign each of them a significance of 1/1000th of your caring; that exceeds your resources by a factor of millions.
i don't think you understood my argument. i didn't say you assign each of them 1/1000th of your total caring. i said you should assign each of them 1/1000th as much caring as you assign yourself. so you should occupy 1000/(7 billion) of your caring, and Bob from Randomland occupies 1/(7 billion) of your caring.
the entire point of my argument is it actually doesn’t matter what % of your own caring you take up. that’s not the relevant thing. the relevant thing is how much you care about each stranger relative to yourself, and the shape of your money utility curve.
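for concreteness, here's that normalization worked out (the population count is my rough figure, not the paper's):

```python
# normalization arithmetic behind the 1000/(7 billion) figures above:
n_strangers = 7.8e9
x = 1e-3                       # caring for each stranger relative to self
total = 1 + n_strangers * x    # self weight 1, plus all strangers
print(f"self: {1 / total:.1e}")          # ~1.3e-07
print(f"per stranger: {x / total:.1e}")  # ~1.3e-10
# both shares shrink as n_strangers grows, but their ratio stays 1000:1,
# which is the only quantity the donation rule above actually uses.
```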
I think it's wrong that humans empirically value themselves at 1/10,000,000 of the rest of humanity. I guess I see your point, that you have some budget of caring, and on occasion you are willing to dispense quite a lot of it to a single stranger. But you would not dispense 99.9999% of your caring to all the strangers combined.
% of your caring is a flawed metric that doesn’t mean anything though! even if there are 1e100 strangers out there, as long as your caring about each individual relative to yourself is still 1/1000, the fraction of your money you’re willing to donate remains constant!
I don’t get it.
You have some pool of caring you are willing to donate; then, in the case where all other humans need a donation, they will each receive pool/total_pop. So you care about each of them as pool/total_pop.
Like, if one encounters an opportunity to donate to a single stranger who needs it, people go above that pool/total_pop, but that doesn't mean they would give more than the total pool in the previous case. The scaling is weird.
Your previous statements are unclear.
i don't understand what in my original post you disagree with. there is no such thing as a fixed pool of caring; i don't even know what that means. the actual constraint is that you have some finite amount of money. money is not the same as caring, because each dollar is worth a different amount depending on how much money the recipient has. caring is just a multiplier on how much other people's happiness is worth to you compared to your own happiness. if some of your dollars will bring so much more happiness to someone else (e.g. by saving their life) than to yourself (e.g. by buying a slightly larger apartment) that it outweighs the fact that you don't care about them as much as yourself, then you should give that dollar away. otherwise, you shouldn't.
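a toy rendering of that rule, with every happiness number (and the helper itself) invented purely for illustration:

```python
def should_give(x: float, their_gain: float, my_gain: float) -> bool:
    """Give the dollar iff the recipient's happiness gain, scaled by the
    caring multiplier x, beats your own gain from keeping it."""
    return x * their_gain > my_gain

# invented magnitudes: a life-saving dollar helps a destitute stranger
# ~1e5 times as much as an apartment-upgrade dollar helps you.
print(should_give(x=1e-3, their_gain=1e5, my_gain=1.0))   # True: 100 > 1
print(should_give(x=1e-12, their_gain=1e5, my_gain=1.0))  # False: 1e-7 < 1
```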
when I read "caring" + the table, I assume something roughly equal to "percentage of attention/money/other resources spent"; otherwise, how would you normalize caring to 1 (as is done in the table)?
Can you give the link? If what you're saying is implied by his theory, then yes, it would be wholly insane, but I have a hard time believing it.
Paper here: https://eigenism.org/paper.pdf
hendrycks doubles down on the claim in this thread: https://x.com/hendrycks/status/2052422910133104670?s=20
Thx, though I cannot read it (I don’t have X and I don’t want to make an X account).
You can always use xcancel.com as a mirror for X: https://xcancel.com/hendrycks/status/2052422910133104670