Not sure why you replied in three different places. I will (try to) reply to all of them here.
I did this so that you could easily reply to them separately, since they were separate responses.
I do not consider linking to those Aella and Duncan posts a literature review, nor do I consider them central examples of work on this topic.
I did not link them for that reason. I linked them to ask whether my understanding of the general problem you’re pointing to is correct: “Especially bad consequences relative to other instances of this mistake because the topic relates to people’s relationship with their experience of suffering and potentially unfair dismissals of suffering, which can very easily cause damage to readers or encourage readers to cause damage to others.”
I am not going to do a literature review on your behalf.
Fair. I was simply wondering whether you had something to back up your claim that this topic has been covered “quite extensively”.
Your explanation of how you will be careful gave me no confidence; the cases I’m worried about are related to people modeling others as undergoing ‘fake’ suffering, and ignoring their suffering on that basis. This is one of the major nexuses of abuse stumbled into by people interested in cognition. You have to take extreme care not to be misread and wielded in this way, and it just really looks like you have no interest in exercising that care. You’re just not going to anticipate all of the different ways this kind of frame can be damaging to someone and forbid them one by one.
I would like to be clear that I do not intend to claim that Newcomblike suffering is fake in any way. Suffering is a subjective experience. It is equally real whether it comes from physical pain, emotional pain, or an initially false belief that quickly becomes true. Hopefully posting it in a place like LessWrong will keep it mostly away from the eyes of those who will fail to see this point.
I ask again, though: how would a literature review help at all?
I’d look at Buddhist accounts of suffering as a starting point.
This does vibe as possibly relevant.
If you’re going to invite people to sink hundreds of cumulative person hours into reading your thing, you really should actually try to make it good, and part of that is having any familiarity at all with relevant background material.
I’m not sure how to feel about this general attitude towards posting. With most things I would rather err on the side of posting something bad; I think a lot of great stuff goes unwritten because people’s standards for themselves are too high (of course, Scott’s law of advice reversal applies here, but given that I’ve only posted a handful of times, I’m on the “doesn’t post enough” end of the spectrum). I try to start all of my posts with a TLDR, so that people who aren’t interested or who think they might be harmed by my post can steer clear. Beyond this, I think it’s the readers’ responsibility to avoid content that will harm them or others.
Humans do substantial work on AI R&D, but we haven’t been very effective at alignment research. (At least, according to the view that says alignment is very hard, which typically also says that basically all of our current “alignment” techniques will not scale at all.)
Yup, this is very possible.