Researching donation opportunities. Previously: ailabwatch.org.
Zach Stein-Perlman
I wrote this in a conflict-theory way but the mistake-theory version might be equally important. Ruthlessness/malice aren’t necessary. Sometimes an agent doesn’t appreciate the costs here; improper credit assignment often naively makes sense. Sometimes a bureaucracy misallocates credit by default despite not even being agent-y enough to be ruthlessly goal-directed. And sometimes people just incorrectly opt not to take credit.
To be clear, the epistemic status of all of this is: a priori musings inspired by some real events, which might be helpful but which you shouldn’t defer to. It is not observed fact.
Something I was wrong about: credit assignment.
I used to think: I’m an altruist; it doesn’t matter whether I get credit for my contributions. Now I think getting credit is often important. In some contexts, when you do a good thing, a lot of the value comes indirectly via you getting empowered. And if others systematically steal your credit or block you from taking credit, you should be scared of the prospect that (1) you’re giving them power which they will use poorly or (2) you become dependent on them — if you’d gotten credit, then you’d be empowered and people would listen to you, but since they took credit, you’re stuck only able to exert influence by advising them, and they can stop listening to you.
There are often reasonable-sounding arguments that it’s better if someone else takes credit, but you should (1) be suspicious and (2) just pay some costs to preserve proper credit assignment.
Great post. It’s too bad it didn’t get frontpage visibility.
Proposition: some kinds of agents don’t need to make object-level deals in advance. Especially for evidential/acausal reasons, or possibly because they made a meta-deal in the past. In the future, they do a bunch of thinking and then an insurance company pays everyone whose houses burned down (and everyone else pays the insurance company a little).
Proposition (perhaps based on controversial intuitions that various attitudes are objective/convergent): the system above is preferable to locking in low-level deals on e.g. power-sharing. But it’s fine to make such deals as long as we clarify that if the galaxy-brained stuff works out, we’ll follow that instead. (And if you feel confident that the galaxy-brained stuff will work out without needing to coordinate in advance, you should still endorse deals to e.g. reduce conflict between agents who don’t share your view.)
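As a toy illustration of the ex-post insurance logic above (not a claim about how real acausal coordination would work; the wealth levels, loss size, and burn probability are made up): with concave utility, everyone prefers the scheme in expectation, so sufficiently similar agents can each endorse paying in without an advance object-level deal.

```python
import math

# Toy version of the ex-post insurance story above. With concave utility,
# redistributing from agents whose houses survived to those whose houses
# burned raises each agent's expected utility, so sufficiently similar
# agents can endorse the scheme without an advance object-level deal.
# All numbers are hypothetical placeholders.

wealth = 100.0
loss = 90.0               # wealth lost if your house burns
p_burn = 0.1
u = math.log              # a concave utility function

premium = p_burn * loss   # actuarially fair premium each agent pays ex post

eu_without = (1 - p_burn) * u(wealth) + p_burn * u(wealth - loss)
eu_with = u(wealth - premium)   # full insurance: everyone ends at the mean

print(f"expected utility, no insurance:      {eu_without:.3f}")
print(f"expected utility, ex-post insurance: {eu_with:.3f}")  # higher, by Jensen's inequality
```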
some deals require uncertainty
Skill issue? (But no guarantee that all agents will be high-skill by the first crucial time.)
Surprised you didn’t bring this up. Curious what you think.
My reaction to the announcement is:
I really like the “Updating our Responsible Scaling Policy” section. Focusing on unilateral if-then commitments turned out to be a bad approach, but this new one is great. (I haven’t read the new RSP, safety roadmap, or risk report; I don’t have takes on object-level goodness.)
Apparently there’s now a sixth person on Anthropic’s board. Previously their certificate of incorporation said the board was Dario’s seat, Yasmin’s seat, and 3 LTBT-controlled seats. I assume they’ve updated the COI to add more seats. You can pay a Delaware registered agent to get you the latest copy of the COI; I don’t really have capacity to engage in this discourse now.
Regardless, my impression is that the LTBT isn’t providing a check on Anthropic; the number of board seats isn’t a crux.
I mostly agree. And you can do better with better investments. I observe that this does not imply that scope-sensitive altruists should accumulate capital in order to buy galaxies: buying one millionth of the lightcone costs around $20M invested well now (my independent number; Thomas says $5-500M iiuc). And I think there are donation opportunities 100x that good for scope-sensitive altruists on the margin.
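For concreteness, here is the shape of the arithmetic behind a number like that, as a minimal sketch; the return rate, horizon, and total lightcone price below are hypothetical placeholders, not my actual inputs.

```python
# Back-of-envelope shape of the "buy a millionth of the lightcone" estimate.
# Every parameter here is an illustrative placeholder.

initial_capital = 20e6         # dollars invested today
annual_real_return = 0.15      # hypothetical "invested well" return
years = 30                     # hypothetical horizon until "purchase"
lightcone_price = 1.3e15       # hypothetical dollar price of the whole lightcone then

future_capital = initial_capital * (1 + annual_real_return) ** years
fraction = future_capital / lightcone_price
print(f"future capital: ${future_capital:.3g}")   # ~$1.3e9 with these placeholders
print(f"fraction of lightcone: {fraction:.2g}")   # ~1e-6
```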
I would bet that lawyers, scholars, and chatbots would basically disagree with you. Your literal reading of the Constitution is less important than norms/practice.
Importantly, as far as I can tell, from a purely constitutional perspective the supreme court has no more authority to direct any members of the executive branch to do anything than I do. Their only constitutional power is to call for federal officers to refuse to do something. Asking them to do anything proactive would go relatively clearly against their mandate.
In case nobody else has mentioned it: this is false, courts order actions all the time, e.g. ordering agencies to do stuff that they’re legally required to do. Ask your favorite chatbot.
This ~assumes that your primary projects have diminishing returns; the third copy of Tsvi is less valuable than the first. But note that increasing returns to effort are common.
No. (But people should still feel free to reach out if they want.)
Anthropic: Sharing our compliance framework for California’s Transparency in Frontier AI Act.
2.5 weeks ago Anthropic published a framework for SB 53. I haven’t read it; it might have interesting details. And it might be noteworthy that compliance is somewhat separate from RSP, possibly due to CA law. I’m pretty ignorant on state law stuff like SB 53 but I wasn’t expecting a new framework.
Framework here (you can’t easily download it or copy from it, which is annoying). Anthropic’s full Trust Portal here. Anthropic’s blogpost below.
On January 1, California’s Transparency in Frontier AI Act (SB 53) will go into effect. It establishes the nation’s first frontier AI safety and transparency requirements for catastrophic risks.
While we have long advocated for a federal framework, Anthropic endorsed SB 53 because we believe frontier AI developers like ourselves should be transparent about how they assess and manage these risks. Importantly, the law balances the need for strong safety practices, incident reporting, and whistleblower protections with flexibility in how developers implement their safety measures, while exempting smaller companies from unnecessary regulatory burdens.
One of the law’s key requirements is that frontier AI developers publish a framework describing how they assess and manage catastrophic risks. Our Frontier Compliance Framework (FCF) is now available to the public, here. Below, we discuss what’s included within it, and highlight what we think should come next for frontier AI transparency.
What’s in our Frontier Compliance Framework
Our FCF describes how we assess and mitigate cyber offense, chemical, biological, radiological, and nuclear threats, as well as the risks of AI sabotage and loss of control, for our frontier models. The framework also lays out our tiered system for evaluating model capabilities against these risk categories and explains our approach to mitigations. It also covers how we protect model weights and respond to safety incidents.
Much of what’s in the FCF reflects an evolution of practices we’ve followed for years. Since 2023, our Responsible Scaling Policy (RSP) has outlined our approach to managing extreme risks from advanced AI systems and informed our decisions about AI development and deployment. We also release detailed system cards when we launch new models, which describe capabilities, safety evaluations, and risk assessments. Other labs have voluntarily adopted similar approaches. Under the new law going into effect on January 1, those types of transparency practices are mandatory for those building the most powerful AI systems in California.
Moving forward, the FCF will serve as our compliance framework for SB 53 and other regulatory requirements. The RSP will remain our voluntary safety policy, reflecting what we believe best practices should be as the AI landscape evolves, even when that goes beyond or otherwise differs from current regulatory requirements.
The need for a federal standard
The implementation of SB 53 is an important moment. By formalizing achievable transparency practices that responsible labs already voluntarily follow, the law ensures these commitments can’t be abandoned quietly later once models get more capable, or as competition intensifies. Now, a federal AI transparency framework enshrining these practices is needed to ensure consistency across the country.
Earlier this year, we proposed a framework for federal legislation. It emphasizes public visibility into safety practices, without trying to lock in specific technical approaches that may not make sense over time. The core tenets of our framework include:
Requiring a public secure development framework: Covered developers should publish a framework laying out how they assess and mitigate serious risks, including chemical, biological, radiological, and nuclear harms, as well as harms from misaligned model autonomy.
Publishing system cards at deployment: Documentation summarizing testing, evaluation procedures, results, and mitigations should be publicly disclosed when models are deployed and updated if models are substantially modified.
Protecting whistleblowers: It should be an explicit violation of law for a lab to lie about compliance with its framework or punish employees who raise concerns about violations.
Flexible transparency standards: A workable AI transparency framework should have a minimum set of standards so that it can enhance security and public safety while accommodating the evolving nature of AI development. Standards should be flexible, lightweight requirements that can adapt as consensus best practices emerge.
Limiting application to the largest model developers: To avoid burdening the startup ecosystem and smaller developers with models at low risk for causing catastrophic harm, requirements should apply only to established frontier developers building the most capable models.
As AI systems grow more powerful, the public deserves visibility into how they’re being developed and what safeguards are in place. We look forward to working with Congress and the administration to develop a national transparency framework that ensures safety while preserving America’s AI leadership.
The move I was gesturing at—which I haven’t really seen other people do—is saying
This sequence of reasoning [is/isn’t] cruxy for me
as opposed to the usual use of “cruxiness” which is like
I think P and you think ¬P. We agree that Q would imply P and ¬Q would imply ¬P; the crux is Q.
or
I believe P. This is based on a bunch of stuff but Q is the step/assumption/parameter/whatever that I expect is most interesting to interlocutors or you’re most likely to change my mind on or something.
If [your job is automated] . . . the S&P 500 will probably at least double (assuming no AI takeover)
Is this true?
Prior discussion: Tail SP 500 Call Options.
I currently have 7% of my portfolio in such calls.
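To illustrate why such calls are attractive under the view above, here is a toy payoff calculation; the strike, premium, and index levels are hypothetical placeholders, not the actual position.

```python
# Toy payoff arithmetic for a far-out-of-the-money S&P 500 call under the
# "index at least doubles" scenario above. Strike, premium, and index
# levels are hypothetical placeholders, not an actual position.

index_now = 5000
strike = 8000                 # hypothetical far-out-of-the-money strike
premium = 50                  # hypothetical cost per unit of index exposure

index_doubled = 2 * index_now
payoff = max(index_doubled - strike, 0)    # call payoff at expiry
print(f"payoff per unit: {payoff}, multiple on premium: {payoff / premium:.0f}x")
# With these placeholders, a doubling turns each premium dollar into ~40x,
# which is how even a small allocation can dominate the portfolio in that world.
```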
Hmm, yeah, oops. I forgot about “cruxiness.”
Concept: epistemic load-bearingness.
I write an explanation of what I believe on a topic. If the explanation is very load-bearing for my belief, that means that if someone convinces me that a parameter is wrong or points out a methodology/math error, I’ll change my mind to whatever the corrected result is. In other words, my belief is very sensitive to the reasoning in the explanation. If the explanation is not load-bearing, my belief is really determined by a bunch of other stuff; the explanation might be helpful for showing one way I think about the question, but quibbling with it wouldn’t change my mind.
This is related to Is That Your True Rejection?; I find my version of the concept more helpful for working with collaborators who share almost all background assumptions with me.
Yeah, I think the big contribution of the bioanchors work was the basic framework: roughly, have a probability distribution over effective compute for TAI, and have a distribution for effective compute over time, and combine them. Not the biological anchors.
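A minimal sketch of that framework with made-up distributions (the means, spreads, and growth rates below are illustrative placeholders, not the report’s estimates): sample the effective compute required for TAI, sample a compute-over-time trajectory, and read off a TAI-year distribution.

```python
import random
import statistics

# Minimal sketch of the framework described above: combine (a) a distribution
# over effective compute required for TAI with (b) a distribution over
# effective compute available over time. All numbers are made-up placeholders.

random.seed(0)

def sample_tai_year() -> float:
    required = random.gauss(32, 3)               # (a) log10 FLOP required for TAI
    level_now = 26                               # (b) log10 effective FLOP today
    growth = max(random.gauss(0.7, 0.2), 0.05)   # orders of magnitude per year
    return 2025 + max(required - level_now, 0) / growth

samples = [sample_tai_year() for _ in range(100_000)]
print("median TAI year:", round(statistics.median(samples)))
print("P(TAI by 2040):", sum(y <= 2040 for y in samples) / len(samples))
```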
(Indeed, I am not following up with users. Nor maintaining the sheet. And I don’t remember where I stored the message I hashed four years ago; uh oh...)
Context: Bay Area Secular Solstice 2025
Relatedly, I fear that outside of Redwood and Forethought, the “AI Consciousness and Welfare” field is focused on the stuff in this post plus advocacy rather than stuff I like: (1) making deals with early schemers to reduce P(AI takeover) and (2) prioritizing by backchaining from considerations about computronium and von Neumann probes.
Edit: here’s a central example of a proposition in an important class of propositions/considerations that I expect the field basically just isn’t thinking about and lacks generators to notice:
In the long run, when we’re colonizing the galaxies, the crucial thing is that we fill the universe with axiologically-good minds. In the short run, what matters is more about being cooperative with the AIs (and maybe the small scale means deontology is more relevant); the AIs’ preferences, not scope-sensitive direct axiological considerations, are what matters.
I agree in part. But redistributing money/power from funders to workers is good for the world only insofar as the workers are better at turning money/power into goodness-for-the-world than the funders. Is that true for the AI safety ecosystem? There are certainly some cases where it’s true (e.g. your principle would have produced good results in the case of Lightcone), but I think it’s mostly false; I think the funders are better at turning money into goodness than almost all of the workers. (Plus, insofar as the workers aren’t altruists, they’ll waste money on consumption.)