robot altruist powered by love
nowl
It’s actually 150%:
it’s fair to say that I wrote 300% of this book! Nate then wrote the other 150%! The combined material was ruthlessly cut down, by Nate, and either rewritten or replaced by Nate. I couldn’t possibly write anything this short, and I don’t expect it to read like standard eliezerfare. (Except maybe in the parables that open most chapters.)
(So it sounds like they both wrote new text, then Nate cut it down)
I thought it was good because it made me ruminate on how close the real world is to being driven by these sorts of characters.
I don’t know what updates to make from these studies, because:
Idk if the negative effects they found would be prevented by supplements/blood tests.
Idk if there were selection effects on which studies end up here. I know one could list studies for either conclusion (“eating animals is more/less healthy than not”), as is true of many topics.
What process determined the study list?
Most people would consider sacrificing their health for others to be too demanding an ethical framework.
(This comment is local to the quote, not about the post’s main arguments.) Most people implicitly care about the action/inaction distinction. They think “sacrificing to help others” is good but in most cases non-obligatory. They think “proactively hurting others for one’s own benefit” is bad, even if it’d be easier.
Killing someone for their body is a case of harming another for one’s own gain. The quote treats it as merely not making a sacrifice.
I think not-killing animals does feel to many like proactive helping, and not-not-killing animals like inaction, because the default is to kill them (and it’s abstracted away: really one is only paying someone else to kill them, it’s never presented that way, and so on). That’s part of why animal-eating is commonly accepted (though the core reason is usually thinking animals are not all that morally relevant).
But in the end, “proactively helping others is non-obligatory” wouldn’t imply “not-killing animals is non-obligatory”.
Have they tested that?
They say in a reply that they think the difference they notice is caused by folate (aka vitamin B9) deficiency, but many vegans take supplements and get blood tests to avoid having any deficiencies in the main nutrients. So maybe they’re only noticing vegans who don’t take supplements (that contain B9).
So—was it worth backing a candidate?
Yes. That path ultimately led me to a place of greater agency and purpose.
Do you know how the others were affected? Have you kept in contact?
Sounds like some of them were hurt? You say you forced them to take action, and that afterwards they “shifted to ‘self-care.’ ”
If you newly preorder MIRI’s book because of this comment, I’ll donate the cost of the book (plus taxes/shipping) to your preferred charity.[1]
(I’ll do this for up to 50 people for now. 1⁄50 so far.)
- ^ The cost varies with book format, so reply or message me with: total cost, your preferred charity.
That observation makes a narrow window valuable: whatever we want preserved must be radiated now, redundantly, and in a form that minds unlike ours can decode with high probability. The useful direction is not “build a grand vault” but “broadcast a self-bootstrapping curriculum until the last switch flips.”
A way to do this without needing to create a formal (“universal language”) alignment curriculum would be to just broadcast a lot of internet data, somehow emphasizing dictionaries (easier to first interpret) and LessWrong text. One way to emphasize would be to send them more times. Maybe include some formalisms that try to indicate the concept of language.
If we’re already able to radiate arbitrary bit sequences, there might not be any large hurdles to doing this.
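For concreteness, here’s a toy sketch in Python (entirely my own framing, not from the post; the sources, repeat counts, and separator are made-up assumptions) of the “send some texts more times” idea: build one broadcast stream in which dictionaries repeat more often than the rest of the corpus.

```python
# Toy sketch of "emphasize some texts by repeating them in the broadcast".
# All names, weights, and the separator here are made up for illustration.
from itertools import chain


def build_broadcast(sources, repeats):
    """Concatenate each source `repeats[name]` times into one byte stream.

    A plain ASCII separator marks chunk boundaries so a decoder can notice
    the repetition structure (repeated chunks = emphasized material).
    """
    separator = b"\n=====\n"
    chunks = chain.from_iterable(
        [sources[name]] * repeats.get(name, 1) for name in sources
    )
    return separator.join(chunks)


stream = build_broadcast(
    sources={
        "dictionary": b"cat: a small domesticated feline ...",
        "lesswrong": b"<a large dump of post text> ...",
    },
    repeats={"dictionary": 5, "lesswrong": 1},
)
print(len(stream), "bytes to radiate")
```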
I’m curious if others have similar experiences or clarifying thoughts.
I have a similar experience. I’m not sure what to write, but here are some thoughts.
The healthy kid probably wouldn’t be excited by a back-flip robot arbitrarily instantiated in a hell. Its awesomeness is context-dependent.
What kinds of motivation do you have access to?
In fiction, the protagonists have motivation even when the world seems really bad. They want to change it. Most real humans instead keep the torture out of mind, but not you.
Separately, a theorist may devote so much energy to their object of study that it becomes intrinsically important to their mind. This is myopic in a way (it’s about the thing itself). But maybe it can be non-myopic if thinking about it is instrumental, so contextualizing doesn’t diminish it. Maybe passion and heroic motivation can then fuse together.
I read your prior comment as saying:
(1) Vegans make moral judgments
(2) Therefore all vegans are judgemental
In my comment, (1) was ~”most vegans believe others are acting immorally.” The same isn’t true of OP’s other examples, like polygamists.
To elaborate more, I’d be confused in the same way by this labeling of, e.g., someone who opposed historical slavery but wasn’t loud about their beliefs every time they encountered it. Like, to me the central case of “A judges B” is “A thinks B is not living up to moral standards to the degree it’s reasonable to expect/hope others to”.
I think vegans are quiet about their beliefs for reasons that are usually not “they don’t actually think it’s that bad for others to eat animals.” I think the reasons are usually things like, “it wouldn’t help to speak up right now,” “they’d just mock me,” etc.
The below is a sort of reductio ad absurdum of dictionary definitions being helpful here.
This seems like one of those definitions that says little because it refers back to the base word (in this case “judge”). What does this actually mean? The link defines “judge” as “to form a negative opinion about”. I’m not sure what “characterized by a tendency (to form a negative opinion about) harshly” would mean. Replacing “harshly” with its dictionary definitions only makes things worse: whether a belief is “excessively critical or negative” is relative to one’s beliefs; some would label the belief “buying animal products is morally similar to buying things produced with slave labor” as “excessively critical or negative”, while others wouldn’t. “unduly severe in making demands”—same thing with “unduly severe”, also an “opinion” itself doesn’t make demands.
(Also, there’s the question of whether “characterized by a tendency” means “as an intrinsic personal quality”, or whether “consistently, as conclusions of one moral idea” counts.)
most vegans are not judgemental
I don’t understand. Most vegans believe buying animal products is immoral, which implies they believe people who do are acting immorally. I’m not sure what else judgement could mean. (Maybe “expresses this”?)
Edit: This is copied from my comment in the thread, I should have written it at the start.
To elaborate more, I’d be confused in the same way by this labeling of, e.g., someone who opposed historical slavery but wasn’t loud about their beliefs every time they encountered it. Like, to me the central case of “A judges B” is “A thinks B is not living up to moral standards to the degree it’s reasonable to expect/hope others to”.
I think vegans are quiet about their beliefs for reasons that are usually not “they don’t actually think it’s that bad for others to eat animals.” I think the reasons are usually things like, “it wouldn’t help to speak up right now,” “they’d just mock me,” etc.
To avoid double-counting of evidence, I’m guessing that the first person controls the @ENERGY twitter account. Its bio says “Led by @SecretaryWright”.
I independently noticed this too, and I think it’s true of mathematical universes. I also think this universe’s primitives include something (qualia) that is not expressible in our formal systems (i.e. ‘math’ as we know it), even in principle. (I don’t think any current ontology is close to being able to resolve core questions about this universe’s primitives.)
There’s a history here of discussion of how to make good air purifiers (like this). Today I learned about ULPA filters and found someone’s DIY video using one of them.
A ULPA filter can remove from the air at least 99.999% of dust, pollen, mold, bacteria and any airborne particles with a minimum particle penetration size of 120 nanometres.
I recently moved to a place with worse air quality. The fatiguing effect is noticeable to me (though I suspect I might have vulnerable physiology). It makes me want to try to update far in the other direction: maybe any level of impurity causes bad effects, but I didn’t notice them (or associate them with air quality) because some impurity was constant even with normal filters.
My current idea is to try to make a system like in the video, plus a tube so I can get the filtered air directly to my face. I don’t know how feasible it is to filter a whole room to that level.
Do you believe that actors can not protect themself from blackmail with pre-commitments?
I don’t believe that. If I could prove that, I could also prove the opposite (i.e. replace ‘cannot’ with ‘can always’), because what a decision problem is about is arbitrary. The arbitrariness means any abstract solution has to be symmetric. In example 1, an actor protects themself from blackmail. We can also imagine an inverted example 1, where the more sophisticated conditioner instead represents the blackmailer.
I think that what happens when both agents are advanced enough to fully understand this kind of problem is most similar to example 5. But in reality, they wouldn’t recursively simulate each other forever, because they’d think that would be a waste of resources. They’d have to make some choice eventually. They’d recognize that there is no asymmetric solution to the abstract problem, before making that choice. I don’t know what their choice would be.
I can give a guess, with much less confidence than what I wrote about the logic. Given they’re both maximally advanced, they’d know they’ll perform similar reasoning; it’s similar to the prisoner’s-dilemma-with-a-clone situation. They could converge to a compromise policy-about-blackmail-in-general for their values in their universe, if any such compromises are available. I’m finding it hard to predict what such a ‘compromise’ could be when they’re not on relatively equal footing, though, e.g. when one can blackmail the other and the other can’t do it back. When they are on equal footing, e.g. have equal incentive to blackmail each other, maybe they would do this: “give each other the things the other wants, in cases where this increases our average value” (which is like normal acausal trade).
After thinking about it more (38 minutes more, compared to when I first posted this comment. I’ve been heavily editing/expanding it), it does feel like a game of ‘mutually’ choosing where-they-end-up-in-the-logical-space, and not one of ‘committing’. Of course, to the extent the decisions are symmetric, they could choose to lock in “I commit to not give in to blackmail, you commit to make and follow through on blackmail”; they just both wouldn’t want that.
I don’t quite know what else there is to do in that situation other than “symmetrically converge to the mid-point”. Even though I dislike where that leads in “unequal” cases like I described two paragraphs up (<the better-situated superintelligence makes half the blackmail, and the worse-situated superintelligence gives in every time>). Logic doesn’t care what I dislike. If this is true, I’ll just have to hope the side of good wins situationally and can prevent this from manifesting in cases it cares about.
Disclaimer: the above is about two superintelligences in isolation, not humans.
I’ve thought about this before too, and I no longer feel confused about it. It helps to reduce this into a decision problem. The decision problem could ‘be about’ programs deciding anything, in principle; it doesn’t need to be ‘agents deciding whether to blackmail’.
I’ll show decision structures symmetric to your examples, then give some more examples that might help.
Language of post’s examples → Language for decision problems
Crime boss → Program C
Mayor → Program M
Crime boss does not blackmail mayor → C outputs 0
Crime boss blackmails mayor → C outputs 1
Mayor does not give in to blackmail → M outputs 0
Mayor gives in to blackmail → M outputs 1

1. Your first example: M is a more advanced conditioner
C runs: if [M outputs 1 if C outputs 1], output 1; else, output 0
M runs: if C runs "If [M outputs 1 if C outputs 1], output 1; else, output 0", output 0; else, <doesn’t occur, unspecified>
Outcome: both output 0

2. Your second example
C runs: output 1[1]
M runs: <unspecified>
Outcome: unspecified

When put like this, it seems clear to me that there’s no paradox here.
Below are examples not from the post. The last one where both try to condition is most interesting.
3. C is commit-rock[2], M is conditioner
C runs: output 1
M runs: if C runs "If [M outputs 1 if C outputs 1], output 1; else, output 0", output 0; else, output 1
Outcome: both output 1
4. Both are commit-rocks
C runs: output 1
M runs: output 0
Outcome: C outputs 1, M outputs 0
5. Both condition
C runs: run M. if M outputs 1 when C outputs 1, output 1; else, output 0
M runs: run C. if C outputs 0 when M outputs 0, output 0; else, output 1
Outcome: The programs run the other recursively and never halt, as coded.

Again, there is no paradox here.
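As a loose illustration of example 5 (my own sketch, not from the post: “run M”/“run C” is modeled as a direct function call and the counterfactual conditioning is elided), the mutual simulation in Python just recurses until the interpreter gives up, which stands in for “never halts”:

```python
# Example 5, roughly: each program conditions on the other by simulating it.

def C():
    # "run M. if M outputs 1 when C outputs 1, output 1; else, output 0"
    # (simplified: the "when C outputs 1" counterfactual is elided)
    return 1 if M() == 1 else 0

def M():
    # "run C. if C outputs 0 when M outputs 0, output 0; else, output 1"
    return 0 if C() == 0 else 1

try:
    print("C outputs:", C())
except RecursionError:
    # Python's stack limit stands in for "they simulate each other forever".
    print("C and M never produce an output.")
```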
To directly answer the question in the title, I think a commitment “to not give in to blackmail” and a commitment “to blackmail” are logically symmetric, because what a decision problem is about (what the 0s and 1s correspond to in real life) is arbitrary. (Also, separately, there is no “commitment” primitive.)

- ^ I know in your second example you want the Crime boss’s decision to be conditional on the Mayor in some way, but it’s not specified how, so I’m going to just leave it like this with this footnote.
- ^ In some posts about decision dilemmas, the example of “a rock with the word ‘defect’ written on it” is used to make it clear that the decision to defect was not conditional on the other player.
I didn’t have cached what IABIED stands for before clicking this post. Maybe it would be seen more if it referred to the book directly, because the book seems to have lots of support behind it.
Because of the is-ought gap, value (how one wants the world to be) doesn’t inherently change in response to evidence/beliefs (how the world is).[1]
So a hypothetical competent AI designer[2] doesn’t have to go out of their way to make the value not update on evidence. Nor to make any beliefs not update on evidence.
(If an AI is more like a human, then [what it acts like it values] could change in response to evidence, yes. I think most of the historical alignment theory texts aren’t about aligning human-like AIs (but rather hypothetical competently designed ones).)
Someone once kept disagreeing with this, so I’ll add: a value is not a statement about the world, so how would the Bayes equation update it?
A hypothetical competently designed AI could separately have a belief about “what I value”, or more specifically, about “the world contains something here running code for the decision process that is me, so its behavior correlates with my decision”. But regardless of how that belief gets manipulated by the hypothetical evidence-presenting demon (maybe it’s manipulated into “with high probability, the thing runs code that values y instead, and its actions don’t correlate with my decision”), the next step in the AI goes: “given all these beliefs, what output of the-decision-process-that-is-me best fulfills <hardcoded value function>?”
(If it believes there is nothing in the world whose behavior correlates with the decision, all decisions would do nothing and score equally in that case; it’d default to acting under world-possibilities it assigns lower probability but where it has machines to control.)
One might ask: okay, but could the hypothetical demon manipulate its platonic beliefs about what “the decision process that is me” is? Well, maybe not, because that’s (separately from the above) also not the kind of thing that inherently updates on evidence about a world.
But if it were manipulated somehow (I’m not quite sure what to even imagine being manipulated; maybe parts of the process rely on ‘expectations’ about other parts, so it’s those expectations, though only if they’re not hardcoded in, i.e. some sort of AI designed to discover some parts of ‘what it is’ by observing its own behavior?), there’d still be code at some point saying to [score considered decisions on how much they fulfill <hardcoded value function>, and output the highest-scoring one]. It’s just that parts of the process could be confused(?)/hijacked, in this hypothetical.
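To make the belief/value separation concrete, here’s a toy sketch (my own illustration, not from any alignment text; the world model, value function, and all names are made-up) in which the evidence-update step only ever touches beliefs, while the value function is fixed code that nothing in the update path rewrites:

```python
# Toy sketch: beliefs update on evidence via Bayes; the value function is
# hardcoded and never modified by any update step.

WORLDS = {
    "w1": {"machinery": True,  "paperclips": 0},
    "w2": {"machinery": False, "paperclips": 0},
}

def likelihood(evidence, world):
    # Toy likelihood: evidence is the name of a feature the world may have.
    return 0.9 if world.get(evidence) else 0.1

def update_beliefs(beliefs, evidence):
    # Bayes: only the probabilities over world-hypotheses change here.
    posterior = {w: p * likelihood(evidence, WORLDS[w]) for w, p in beliefs.items()}
    total = sum(posterior.values())
    return {w: p / total for w, p in posterior.items()}

def hardcoded_value(world):
    # The value function: nothing in the update path rewrites this.
    return world["paperclips"]

def predict(world, action):
    # Toy world model: "make" adds a paperclip if the machinery works.
    made = 1 if (action == "make" and world["machinery"]) else 0
    return {**world, "paperclips": world["paperclips"] + made}

def choose_action(beliefs, actions):
    # "Given all these beliefs, which output best fulfills <hardcoded value>?"
    def expected_value(action):
        return sum(p * hardcoded_value(predict(WORLDS[w], action))
                   for w, p in beliefs.items())
    return max(actions, key=expected_value)

beliefs = {"w1": 0.5, "w2": 0.5}
beliefs = update_beliefs(beliefs, "machinery")   # evidence moves beliefs...
print(choose_action(beliefs, ["make", "idle"]))  # ...but never hardcoded_value
```

Running it prints “make”: the evidence shifted the probabilities over worlds, but the thing being maximized never changed.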
(not grower)