Excited to see this! I'd be most interested in case studies of standards in fields where people didn't already have clear ideas about how to verify safety.
In some areas, it's pretty clear what you're supposed to do to verify safety: everyone (more or less) agrees on what counts as safe.
One of the biggest challenges with AI safety standards will be that no one really knows how to verify that a (sufficiently powerful) system is safe, and many experts disagree about what kind of evidence would be sufficient.
Are there examples of standards in other industries where people were quite confused about what “safety” would require? Are there examples of standards that are specific enough to be useful but flexible enough to deal with unexpected failure modes or threats? Are there examples where the standards-setters acknowledged that they wouldn’t be able to make a simple checklist, so they requested that companies provide proactive evidence of safety?
Congratulations on launching!
On the governance side, one question I’d be excited to see Apollo (and ARC evals & any other similar groups) think/write about is: what happens after a dangerous capability eval goes off?
Of course, the actual answer will be shaped by the particular climate, culture, zeitgeist, policy window, and lab-specific factors that are impossible to fully predict in advance.
But my impression is that this question is relatively neglected, and I wouldn’t be surprised if sharp newcomers were able to meaningfully improve the community’s thinking on this.