yams
fwiw I think piano pedagogy is exactly the kind of thing where an entrenched regime has propagated a suboptimal approach relative to most people’s goals on the instrument (and that there’s maybe only some single-digit number of people in the country teaching outside of the small handful of dominant, not-especially-useful-to-most-people paradigms).
E.g., if what you want to do is play pop songs, a combination of ear training and a ‘simon says’ style app that reads midi off your keyboard and instructs you to play a ~random triad will basically get you there in ~100 hours of practice (assuming daily practice not to exceed ~3 hours/day). There are similarly straightforward training setups that I expect to be effective for other goals one may have on the instrument. I built ~all of my physical facility on the instrument in about three months of focused practice, and have similarly ‘cheated’ my way into my other capacities (I’m definitely missing things that other people 5 years into the instrument would have, but I also have a lot of things those folks don’t, and I prefer and deliberately pursued my skill profile over theirs).
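(To make the ‘simon says’ setup concrete, here’s a minimal sketch of the loop I have in mind, assuming a Python environment with the mido library and a connected MIDI keyboard; the chord vocabulary and the pass condition are illustrative, not a description of any app I actually used.)

```python
# Minimal 'simon says' triad-trainer sketch (assumes the mido library and a
# connected MIDI keyboard; the chord list and matching rule are illustrative).
import random
import mido

NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def random_triad():
    """Pick a random root and quality; return a name and the target pitch classes."""
    root = random.randrange(12)
    quality, third = random.choice([('maj', 4), ('min', 3)])
    return f"{NOTE_NAMES[root]} {quality}", {root, (root + third) % 12, (root + 7) % 12}

def drill(rounds=20):
    with mido.open_input() as port:  # default MIDI input
        for _ in range(rounds):
            name, target = random_triad()
            print(f"Play: {name}")
            held = set()
            for msg in port:
                if msg.type == 'note_on' and msg.velocity > 0:
                    held.add(msg.note % 12)
                elif msg.type in ('note_off', 'note_on'):  # note_on with velocity 0 = release
                    held.discard(msg.note % 12)
                if held == target:
                    print("Correct!")
                    break

if __name__ == '__main__':
    drill()
```

(Matching on pitch classes keeps it octave-agnostic; a real app would obviously want feedback, timing, and spaced repetition on top of this.)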
I agree that few people who start playing piano as adults will ever play Rachmaninoff at competition level (but I think very few people who enter into the pedagogical system designed to meet that end actually have that goal in practice).
I also (I think, although you don’t say so outright) agree that some tasks require developing wholly other senses — new channels for phenomenal sensation or new ways of comparing phenomenal sensation in an existing channel (e.g. audiation and relative pitch) — and that those aren’t well-captured in Oli’s ontology.
Wow that is surprising! Even after considering the suite of caveats one applies to benchmarks as evidence, I am very surprised.
All things considered I think I still lean harder on self-reports from lab and non-lab technical staff regarding the elicitation delta, but I’m much less confident than before.
[I suspect we may have other less interesting disagreement about how economically useful current systems could be if more effort were put toward juicing them, but happy to talk about that some other time; just mentioning this for completeness or something.]
Really? I would expect valuations to briefly stall out and then continue to grow when it became clear that the labs have a big lead when it comes to elicitation, scaffolding, etc.
I would also expect existing big-tech valuations to grow in this scenario—just not startups (although maybe they get a bump in the short term).
Can you say more about why you expect this? Trying to see if the answer is [real disagreement] or [Oli has superior knowledge of economics] (and also learn something, in the latter case).
‘Major AI labs can only justify their high valuations by developing very powerful, very general AI systems’ is a claim I sometimes hear. That is, many seem to expect ‘if no AGI in n years, then the bubble pops’.
However, I think just revolutionizing tech is likely enough to justify current valuation levels (and maybe as much as 4x current valuations; maybe even more?), given the market caps of other large tech companies (even if we exclude NVIDIA, which we may want to do because their current market cap is more heavily tied to the AI boom than others). After all, these are still ‘only’ 12-figure valuations in a sector where many of the major players have broken a trillion. A 12-figure valuation is consistent with ‘future major player of the kind that already exists’ and not ‘potential god emperor of the solar system’.
Using these big tech companies as a reference point and assuming very limited further capabilities progress (e.g., no TAI, AGI, [your favorite way to talk about very general, ~human-level systems]), the major LLM companies still don’t obviously look overvalued to me. The tech pie by itself is big enough (and the current state of the tech looks, to me, sufficient to massively disrupt that entire sector) that if they capture enough of the sector (even if it takes a while), we won’t see a significant correction.
(there’s a nearby but distinct point about whether the fundamentals of these companies look sound in a conventional sense, which I don’t mean to weigh in on here)
[recovering from a concussion so apologies if this is especially poorly written or unclear]
I don’t dispute that some postmodernists would consider cultural relativism central to their worldview, and I think that instead of ‘mostly by its detractors’ I should have said ‘often by its detractors’.
I’m glad you’re open to using different language.
I think it’s a mistake to call the above postmodernism, and I’d be disappointed if your long-form treatment of the above point were framed that way.
I agree this position is part of a bundle that’s associated with postmodernism (mostly by its detractors!), but the use here feels conflationary, adversarial, mind-killing.
I would find this future post much more readable, enjoyable, and easier to fit in my model of the world (and of Oli) if you didn’t use this piece of language.
A reason might be that the composition of the investor pool and relationships with shareholders will be very different. I think this is kind of a motte though, and would require the OP to be making a more nuanced sort of argument.
(I really just mean ‘might be’; I haven’t thought enough about this to have much of a take, but this is something that occurred to me that I’m slightly surprised didn’t also occur to you.)
I think a lot about dice and cards, especially because I have the most trouble with probabilities that are <5 percent.
‘Number of consecutive perfect draws’ in Magic is very useful for me. E.g., the probability of x consecutive one-of-one draws in a draft is roughly 1/(20-30)^x. Imprecise, but it gives me any handle at all on pretty slim odds.
Similarly, dice rolls: number of consecutive critical successes, or number of critical successes among n attempts, or ‘critical success times rolling n on an additional n-sided die’.
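(If the arithmetic behind these handles is useful to anyone else, here’s a minimal sketch; the pack size and die sizes are illustrative assumptions, not rules from any particular game.)

```python
# Back-of-envelope versions of the handles above (pack size and die sizes are
# illustrative assumptions, not rules text from any particular game).
from math import comb

def consecutive_one_of_one_draws(x, pack_size=25):
    """P(x consecutive 'perfect' one-of-one draws), roughly 1/(20-30)^x."""
    return (1 / pack_size) ** x

def consecutive_crits(x, die_size=20):
    """P(x critical successes in a row on a d20)."""
    return (1 / die_size) ** x

def crits_among_n(k, n, die_size=20):
    """P(exactly k critical successes among n attempts) -- binomial."""
    p = 1 / die_size
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def crit_times_extra(die_size=20, extra=6):
    """P(critical success AND a specific face on an additional n-sided die)."""
    return (1 / die_size) * (1 / extra)

print(consecutive_one_of_one_draws(2))  # ~0.0016, i.e. ~1 in 625
print(consecutive_crits(3))             # 0.000125, i.e. 1 in 8,000
print(crits_among_n(2, 10))             # ~0.075, i.e. roughly 1 in 13
print(crit_times_extra())               # ~0.0083, i.e. 1 in 120
```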
I’ve played a lot of dice-heavy games and used dice to help resolve indecision since I was ~11, but I’ve only started treating putting non-bullshit probabilities on things as a serious skill very recently (maybe a year ago).
The post is meant to be somewhat agnostic on the question: conditional on having a map, here’s a common failure mode. It’s also meant to point in the direction of ‘reconsider the value of your map’.
Separately, I think I ~endorse your first comment, but I also think there are cases in which you should definitely have a map (e.g., you are attempting to achieve political ends). So I think your second comment is somewhat overstated.
You wrote this comment in an adversarial tone but I Just Agree With You.
Indeed, this is an alternate formulation of the thesis of my post, and even uses language I used when characterizing the post itself to someone in the office ~2 hours ago.
most of the things you said seemed like on average it would increase the amount I expected some kind of adversarial posture to make sense
I don’t understand this. Can you say more?
Meanings of political identities shift dramatically based on context, and you can’t manually confirm the beliefs of everyone present at your ‘gathering of people with x political identity’. To the extent that your political identity is based on Real Beliefs with Real Consequences, you should expect not to have much in common with many other people who declare the same identity when you move to a new place (or corner of the internet).
Example: In rural Southeast Texas, Confederate flags are a common sight, and my geometry teacher once told us about a cross burning he witnessed (which a few students murmured we really ought to bring back).
The majority of people genuinely hold at least one belief that, to many of my coastal-elite-descended friends, would seem comical. E.g., women should never have jobs and should rarely speak (especially in public), men with long hair are wanton or gay or trans or both, beating children (not like ‘spanking’ but like ‘anything short of broken bones’) is not only fine but your duty as a father, weed overdose not only can but will definitely kill you, megadoses of zinc can cure cancer, the covid vaccine is the mark of the beast from the Book of Revelation, high school football ought to be the most important thing in your life and, if it isn’t, you are not just odd but untrustworthy, and abortion doctors force-feed fetuses to geese to make foie gras for gay New Yorkers (of which the force feeding is the only ethical component).
Okay, I made up the last one, but the rest are actual positions I’ve heard espoused hundreds or thousands of times by people I met between the ages of 14 and 18.
Also many people talk like this, and everyone’s a ‘libertarian’.
My mom’s from a conservative California family with environmentalist sympathies, and we had something like 60 percent overlap in our views prior to the Texas move. However, I soon found that everyone around who wasn’t liable to drop one of these devastating truth bombs on me thought of themselves as somewhere to the left of Bernie Sanders and read 20th century Marxist writings in their free time. Often these people would think leftist voices at the national scale were somewhat silly or focused on the wrong things (e.g. identity politics), but they nonetheless considered themselves closer to those views than to the other views present in their environs. (There was a Democratic Party around, but it was very different from the national Democratic Party for reasons I won’t address here.)
I assume most readers are in an environment more similar to my current environment (Berkeley, CA) than to Lumberton, TX, and so I won’t explicate the delta.
I think there’s a lot of mind-killing that happens as a result of relying on a presumed shared vocabulary for political identities that does not exist. When I say something left-coded, my Rationalist Libertarian Interlocutor often reproaches me, and then as we talk more about it, they often conclude that I’m a ‘boring centrist like everybody else’ who uses the language of the left owing to some biographical quirk.
I submit instead that everyone’s sense of the political map is hopelessly warped due to biographical quirks, and that assuming an adversarial posture on the basis of someone’s declared political identity is often, and maybe even ~always, a mistake.
New reacts available only to paid users of LessWrong Premium (not you freeloaders) facilitate frictionless, borderline-telepathic communication.
‘I will NEVER change my mind’: Use this react to assert that you’re content with exactly how wrong you are (which is not at all), and that the case is permanently closed on this matter, so far as you’re concerned.1
‘EY Stamp of Approval’: Use this react to assert that, on your personal authority, Eliezer Yudkowsky agrees with the contents of the comment, rendering it beyond reproach.
‘NOT EY Approved’: Use this react to assert that, on your personal authority, Eliezer Yudkowsky disagrees with the contents of the comment, rendering it immensely reproachful. Users who accrue too many ‘NOT EY Approved’ reacts will have their accounts suspended (although actual thresholds here have yet to be set).2
‘May as well be AI’: Use this react when you’re indifferent to whether or not a statement was generated by AI because shit, it may as well be. You’re ignoring it either way.
‘Have you even read plane crash?’: Use this react when your interlocutor’s unfamiliarity with prior literature is clearly on display.
‘China Hawk’: Use this react to assert American supremacy and insist that its recipient is derelict in their duty to ensure the preeminence of the greatest country in the history of conscious life from here to the other side of the singularity.
‘Toilet’3
‘Sure, buddy’: We all know that the optimal amount of mental masturbation is non-zero. But some reach far beyond the zone of optimality and into the depths of their pants to produce truly monstrous works of self-gratification. Previously, one was compelled to express such an opinion obliquely on LessWrong, or else shatter the decorum of the space and open oneself up to similar critiques. In Beta, we now have the power to quietly acknowledge the reality of the situation, without derailing the gratification itself.
———————————————————————————————
1 This has replaced the ‘I beseech thee’ emoji, which never worked anyway.
2 Note that both EY-invoking reacts are invisible to anyone logged into the @-EliezerYudkowsky LessWrong account, or from any IP address that has ever been logged into that account. No point in having the man himself weigh in when so many LessWrong users are so well practiced at speaking on his behalf!
3 Beta users haven’t settled on how this react ought to be deployed. I’ve seen utilizations ranging from ‘This post belongs in the toilet’ to ‘I enjoyed reading this on the toilet.’
I don’t have a real explanation, but I’ve been interested in this, since it feels like the LLM is doing something like the opposite of what writers intend to do (at least in effect). As if there’s some portion of language space that invites engagement, or trips an alarm in the reader that says ‘there’s something in this!’ Human writers swim toward that portion of the space; LLMs swim away from it.
[I would be unsurprised to find I have not expressed this well.]
I found this post pretty disappointing in its argumentation, for the reasons you describe, and I fairly strongly support its conclusion.
high potential upside for alignment
I like AE Studio!
Can you give an at-all-more-concrete operationalization of this?
Who’s evaluating the proposals and where’s the best public-facing analog of their views on alignment / the criteria applied to evaluate research?
Same question as above, but this time for ‘whoever decides whether this program gets scaled’.
Can you give an example of prior work (e.g., a paper from Anthropic’s safety teams) that would have been competitive for this program?
For instance, if the higher-ups are thinking ‘the problem is that we need more reliable LLMs on roughly the current instantiation of LLMs’, that’s very different from ‘the problem is we need to align superhuman coders that will build the aligned AI, plausibly on some other substrate’, or ‘the problem is we need to identify an alternative architecture that’s fundamentally easier to align than LLMs’.
The people writing these proposals are likely fitting whatever piece you work on for them into a larger picture, and I’m wondering which of the competing larger pictures are advantaged in the application phase.
I was comparing to other video posts by Sanders.
I was comparing to a broader activation of Eliezer’s audience vs any given tweet.
Outperforming the ‘average’ is the wrong standard for ‘blowing up’. ‘Blowing up’ would be ‘outperforming all the recent similar artifacts’, at least as I intended it in my original post.
Meta: it feels pretty strange to have used an underspecified colloquial term, to have walked back the applicability of that term as I intended to use it, and then to be told I’m wrong for walking it back. The point I cared about capturing in that edit is ‘this tweet didn’t do as well as I expected when it first dropped.’ That’s a claim about my own expectations.
Depends on the reference class. As of the past year or so, 1m eyes on a piece of AI safety content isn’t crazy, especially for a video on Twitter, where my impression is the criteria for what counts as a ‘view’ are pretty liberal. Like, plausibly the video has been viewed in full (much) less than half that many times.
Separately, videos posted by that account seem to routinely get ~1m views, so not outperforming that baseline with content from an external collaborator is a little disappointing from a raw metrics perspective! Naively you’d hope to get the combined weight of your respective audiences, which seems to have only somewhat happened here.
When I posted this I think I expected we’d get to 2m views in the first couple days (weakly outperforming other Sanders Twitter content). I think with a different video, that could have happened.
Still an exciting crossover episode.
How would you respond to the counterarguments?