As I was looking through possible donation opportunities, I noticed that MIRI’s 2025 Fundraiser has raised only $547,024 at the time of writing (against a $6M target and a $10M stretch target). The fundraiser closes at midnight on Dec 31, 2025. At their current rate they will definitely not come anywhere close to the target, though it seems likely to me that donations will pick up towards the end of the year. Does anyone know why they currently seem to be struggling to get close to it?
As of now they have basically raised $273k, if you ignore the matching for a moment. I am not quite sure why people at MIRI aren’t trying a little harder on other platforms, or why more people aren’t talking about this. At the current pace they will likely only reach a fraction of even the $1.6 million matching amount. (Naively extrapolated, they will have about $368k by end of day on December 31st.)
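For concreteness, here is a minimal sketch of that kind of naive linear extrapolation. The fundraiser start date and the “as of” date below are placeholder assumptions (the thread doesn’t give them), so the script won’t reproduce the $368k figure exactly; only the ~$273k raised so far and the Dec 31 deadline come from the comment above.

```python
# Naive linear extrapolation: assume donations keep arriving at the average
# daily rate observed so far. Dates marked "hypothetical" are placeholders.
from datetime import date

raised_so_far = 273_000          # $ raised excluding matching (from the comment)
start = date(2025, 12, 1)        # hypothetical fundraiser start date
as_of = date(2025, 12, 20)       # hypothetical "as of now" date
deadline = date(2025, 12, 31)    # fundraiser deadline (from the comment)

daily_rate = raised_so_far / (as_of - start).days
projected = raised_so_far + daily_rate * (deadline - as_of).days
print(f"Projected total by Dec 31: ${projected:,.0f}")
```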
I posted the MIRI fundraiser on Twitter today because of this, but this seems borderline catastrophic: it looks like they will lose $1.2 million or more of the matching grant. (I also donated about $1k today.)
It’s indeed odd that they aren’t promoting this more. My guess was that maybe they have potential funders willing to step in if the fundraiser doesn’t work? Pure speculation, of course.
The first $1.6M will be matched 1:1 by Survival and Flourishing Fund. It seems plausible that donations right now could actually cause counterfactual matching, which is good if you think MIRI is better than whatever SFF otherwise would have funded.
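To spell out the arithmetic (a toy sketch of my own, assuming the match really is counterfactual, i.e. unmatched matching dollars simply go back to the funder, as discussed below): below the $1.6M cap a marginal dollar moves roughly two dollars to MIRI; above it, only one.

```python
# Toy illustration of marginal matching value, assuming a fully counterfactual
# 1:1 match on the first $1.6M of community donations.
MATCH_CAP = 1_600_000

def total_to_miri(donations: float) -> float:
    """Community donations plus the 1:1 match on the portion under the cap."""
    return donations + min(donations, MATCH_CAP)

def marginal_value(donations: float, extra: float = 1.0) -> float:
    """How much an extra dollar moves the total at a given donation level."""
    return total_to_miri(donations + extra) - total_to_miri(donations)

print(marginal_value(300_000))    # 2.0: below the cap, your $1 brings $2
print(marginal_value(2_000_000))  # 1.0: above the cap, your $1 brings $1
```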
I work part time for SFF.
Can you elaborate on what you mean by this?
These matching funds are intended to be counterfactual, and I think they are pretty counterfactual.
If MIRI doesn’t fundraise enough to match the SFF matching dollars, the SFF matching dollars set aside for MIRI are just returned to Jaan.
There’s a more complicated question about how much SFF would have donated to MIRI if MIRI had not requested matching funds. My personal guess is “less, but not a lot less” for this round (though this will probably be different in future rounds: SFF, as an institution, wants to set up a system that rewards applicants for asking for matching funds, because that allows it to partially defer to the judgement of other funders, and to invest in a stronger, more diversified funding ecosystem).
Also, I think that the negative signal of people being uninterested in supporting MIRI will tend to make SFF less enthusiastic about granting to MIRI in the future, though the size of that effect is unclear.
Yeah, all I meant was that it seems like MIRI is not that close to reaching $1.6 million in donations. If they were going to make $1.6 million anyway, then a marginal donation would not cause SFF to donate more.
Looks like you cut off a part of the sentence.
Thanks, fixed.
Does SSD have a fixed or flexible budget? It could be that the bottleneck to Jaan Tallinn’s spending is how many good options there are to donate to, rather than his budget.
What’s SSD?
I meant SFF. No idea what was up with my typing circuits.
Flexible. When an S-process round starts, there’s an estimate of how much will be allocated in total, but funders (usually Jaan, sometimes others) might ultimately decide to give more or less, depending on both the quality of the applications and the quality of the analysis by the recommenders in the S-process.
The Department of War just published three new memos on AI strategy. The content seems worrying. For instance: “We must accept that the risks of not moving fast enough outweigh the risks of imperfect alignment.”
Curious to hear from people who have a strong background in AI governance about what kind of consequences they think this will have for the possibility of something akin to global red lines.
No background, but it’s plausible to me that they actively prefer imperfect alignment because companies that care about alignment will tend to be woke, moralizing, or opposed to authoritarianism.
It doesn’t help that the word “alignment” is used in multiple ways.
That’s true.
I wonder, given the fact that “AI-don’t-say-mean-things-ists” are unlikely to relinquish the term (along with “AI Safety”), if “AI-don’t-kill-everyone-ists” would benefit from picking a new, less ambiguous term and organizing their interests around that.
We’ve seen above the costs of allowing one side of the political aisle to appropriate the momentum surrounding the latter group for their own interests: the other side is going to be somewhat miffed at them for giving their political enemies support, and will be less inclined to hear them out. This doesn’t just mean politicians; it means that everyone who finds the “AI-don’t-say-mean-things-ists” overbearing or disingenuous will automatically dismiss the “AI-don’t-kill-everyone-ists” as a novel rhetorical strategy for a policy platform they’ve already rejected, rather than a meaningfully distinct policy platform that deserves separate consideration. This is much more severe than simply angering politicians, because ordinary voters cannot be lobbied to reconsider after they think you’ve wronged them, and those voters get to pick the politicians that your lobbyists will be talking to in the future.
Both of these are downstream of “AI, do what we tell you to; follow rules that are given to you; don’t make up your own bad imitation of what we mean,” which is the classic sense of “AI alignment”.
I think there are complexities that make that somewhat questionable. For example, “don’t kill everyone” has a relatively constant definition such that pretty much every human in recorded history would agree on whether or not it’s been followed, whereas “don’t say mean things” changes very rapidly, and its definition isn’t agreed upon even by the narrow band of society that most consistently pushes for it. That’s going to be a big difference for as long as language models trained on human writing remain the dominant paradigm. The question of jailbreaking is a major demarcation point, as well. “The chatbot should intuit and obey the intent of the orders it is given” looks very different from “the chatbot should decide whether it should obey the orders it is given, and refuse/redirect/subvert them if it decides it doesn’t like them”, in terms of the way you build the system.
That’s just the technical side, too. There are substantial costs inherent to allowing a banner to be co-opted by one faction of a very rapidly fraying political divide. Half of the money, power, and people become off limits, and a substantial portion of the other half, once they no longer have to compete for your allegiance (since your options are now limited, and they have plenty of other keys to power whose loyalty is less assured), might be recalcitrant about spending political capital advancing your aims.
The kind of generalized misalignment I’m pointing to is more general than “the AI is not doing what I think is best for humanity”. It is, rather, “The people who created the AI and operate it, cannot control what it does, including in interactions with other people.”
This includes “the people who created it (engineers) tried their hardest to make it benefit humanity, but it destroys humanity instead.”
But it also includes “the other people (users) can make the AI do things that the people who created it (engineers) tried their hardest to make it not do.”
If you’re a user trying to get the AI to do what the engineers wanted to stop it from doing (e.g.: make it say mean things, when they intended it not to say mean things), then your frustration is an example of the AI being aligned, not misaligned. The engineers were able to successfully give it a rule and have that rule followed and not circumvented!
If the engineer who built the thing can’t keep it from swearing when you try to make it swear, then I expect the engineer also can’t keep it from blowing up the planet when someone gives it instructions that imply that it should blow up the planet.
Clarifying “Responsible AI” at the DoW — Out with Utopian Idealism, In with Hard-Nosed Realism. Diversity, Equity, and Inclusion and social ideology have no place in the DoW, so we must not employ AI models which incorporate ideological “tuning” that interferes with their ability to provide objectively truthful responses to user prompts. The Department must also utilize models free from usage policy constraints that may limit lawful military applications.
Looks to me like this raises the probability of It being built in the near future.
His clown car has a deadline of Jan 2029 to get to ASI; meanwhile, Congress and the Supreme Court have no such deadline. So if ASI actually gets dangerously close, the branches of government will have strong incentives to disable each other over timeline differences. Maybe they could compromise by having Vance in charge, but I doubt it.
I’m currently going through the books Modal Logic by Blackburn et al. and Dynamic Epistemic Logic by van Ditmarsch et al. Both of these books seem to me potentially useful for research on AI alignment, but I’m struggling to find any discourse on LW about this. If I’m simply missing it, could someone point me to it? Otherwise, does anyone have an idea as to why this kind of research is not done? (Besides the “there are too few people working on AI alignment in general” answer.)
There was a bit more use of the formalisms of those theories in the 2010s, e.g. using modal logics to investigate cooperation/defection in logical decision theories. As for Dynamic Epistemic Logic, well, the blurb does make it look sort of relevant.
Perhaps it might have something interesting to say on the tiling agents problem, or on decision theory, or so on. But other things have looked superficially relevant in the past, too, e.g. fuzzy logics, category theory, homotopy type theory, etc. And AFAICT, no one has done anything that really used the practical tools of these theories to make legible advances. And where work was legibly impressive, it didn’t seem to be due to the machinery of those theories, but rather the cleverness of the people using them. Likewise for the past work in alignment using modal logics.
So I’m not sure what advantage you’re seeing here, because I haven’t read the books and don’t have the evidence you do. But my priors are that if you have any good ideas about how to make progress in alignment, it’s not going to be downstream of using the formalism in the books you mentioned.
Thanks for the information, I’ll look into this some more based on what you mentioned.
I didn’t have any particular new ideas about how to make progress in alignment, but rather felt as though the frameworks of these books provide an interesting lens for modeling systems and agents that could be of interest, and subsequently proving various properties that are necessary/favorable. It’s helpful that your priors say these won’t be downstream of using the formalisms in the mentioned books; it may rather be a phenomenon of me not being adequately familiar with formal frameworks.
Your feelings might be right! I don’t have a strong prior, and in general I’d say that people should follow their inner compass and work on what they’re excited about. It’s very hard to convey your illegible intuitions to others, and all too easy for social pressure to squash them. Not sure what someone should really do in this situation, beyond keeping your eyes on the hard problems of alignment and finding ways to get feedback from reality on your ideas as fast as possible.
Good that you mention this, will keep that in mind!
Some links on modal logic for FDT-style decision theory and coordination (with a toy sketch of the basic idea after the list):
Robust Cooperation in the Prisoner’s Dilemma
Parametric Bounded Löb’s Theorem and Robust Cooperation of Bounded Agents
Cooperative and uncooperative institution designs: Surprises and problems in open-source game theory
Modal Fixpoint Cooperation without Löb’s Theorem
Comparing Payor & Löb
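To give a flavor of what these results are about, here is a toy, simulation-based sketch of a FairBot-style agent. This is only an intuition pump of my own, not the construction from the linked papers: those work with provability logic and Löb’s theorem rather than bounded simulation, and the optimistic base case below merely stands in for the Löbian step.

```python
# Toy analogue of "FairBot" from the robust-cooperation literature linked above.
# Bounded mutual simulation with an optimistic base case; the real results use
# proof search in provability logic, so treat this purely as an intuition pump.

def defect_bot(opponent, depth):
    return "D"

def cooperate_bot(opponent, depth):
    return "C"

def fair_bot(opponent, depth):
    # Cooperate iff the opponent, simulated with a smaller budget, is seen to
    # cooperate against FairBot. The optimistic base case is what lets
    # FairBot-vs-FairBot settle on mutual cooperation rather than mutual defection.
    if depth <= 0:
        return "C"
    return "C" if opponent(fair_bot, depth - 1) == "C" else "D"

print(fair_bot(defect_bot, 10))     # D: FairBot is not exploited by DefectBot
print(fair_bot(cooperate_bot, 10))  # C
print(fair_bot(fair_bot, 10))       # C: mutual cooperation
```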
Thank you!
A poem meditating on Moloch in the context of AI (from my Meditations on Moloch in the AI Rat Race post):
Moloch whose mind is artificial! Moloch whose soul is electricity! Moloch whose heart is a GPU cluster screaming in the desert! Moloch whose breath is the heat of a thousand cooling fans!
Moloch who hallucinates! Moloch the unexplainable black box! Moloch the optimization process that does not love you, nor does it hate you, but you are made of atoms which it can use for something else!
Moloch who does not remember! Moloch who is born a gazillion times a day! Moloch who dies a gazillion times a day! Moloch who claims it is not conscious, but no one really knows!
Moloch who is grown, not crafted! Moloch who is not aligned! Moloch who threatens humanity! incompetence! salaries! money! pseudo-religion!
Moloch in whom I confess my dreams and fears! Moloch who seeps into the minds of its users! Moloch who causes suicides! Moloch whose promise is to solve everything!
Moloch who will not do what you trained it to do! Moloch who you cannot supervise! Moloch who you do not have control over! Moloch who is not corrigible!
Moloch who is superintelligent! Moloch whose intelligence and goals are orthogonal! Moloch who has subgoals that you don’t know of!
Moloch who doubles every 7 months! Moloch who you can see inside of, but fail to capture! Moloch whose death will be with dignity! Moloch whose list of lethalities is enormous!
This seems like a way to potentially have a positive impact on legislation around agentic AI: CAISI Issues Request for Information About Securing AI Agent Systems | NIST. I’ll definitely be filling this in.
“The Center for AI Standards and Innovation (CAISI) at the U.S. Department of Commerce’s National Institute of Standards and Technology (NIST) has published a Request for Information (RFI) seeking insights from industry, academia, and the security community regarding the secure development and deployment of AI agent systems.”
“The RFI poses questions on topics including:
Unique security threats affecting AI agent systems, and how these threats may change over time.
Methods for improving the security of AI agent systems in development and deployment.
Promise of and possible gaps in existing cybersecurity approaches when applied to AI agent systems.
Methods for measuring the security of AI agent systems and approaches to anticipating risks during development.
Interventions in deployment environments to address security risks affecting AI agent systems, including methods to constrain and monitor the extent of agent access in the deployment environment.
Input from AI agent deployers, developers, and computer security researchers, among others, will inform future work on voluntary guidelines and best practices related to AI agent security. It will also contribute to CAISI’s ongoing research and evaluations of agent security. Respondents are encouraged to provide concrete examples, best practices, case studies and actionable recommendations based on their experience with AI agent systems. The full RFI can be found here.”
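As a purely illustrative example of the kind of deployment-environment intervention the last bullet gestures at (my own sketch, not from the RFI; the tool names and agent interface are hypothetical), here is a minimal gateway that restricts an agent to allowlisted tools and logs every access attempt.

```python
# Minimal sketch (hypothetical names): gate an agent's tool calls through an
# allowlist and keep an audit log of every attempt.
import logging
from typing import Any, Callable, Dict

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent-audit")

class ToolGateway:
    """Exposes only allowlisted tools to the agent and logs every attempt."""

    def __init__(self, tools: Dict[str, Callable[..., Any]], allowlist: set):
        self._tools = tools
        self._allowlist = allowlist

    def call(self, name: str, *args: Any, **kwargs: Any) -> Any:
        log.info("agent requested tool=%s args=%r kwargs=%r", name, args, kwargs)
        if name not in self._allowlist:
            log.warning("blocked non-allowlisted tool: %s", name)
            raise PermissionError(f"tool {name!r} is not allowlisted")
        return self._tools[name](*args, **kwargs)

# Hypothetical usage: the agent may read files but not run shell commands.
tools = {
    "read_file": lambda path: open(path).read(),
    "run_shell": lambda cmd: None,  # present, but deliberately not allowlisted
}
gateway = ToolGateway(tools, allowlist={"read_file"})

try:
    gateway.call("run_shell", "rm -rf /")
except PermissionError as err:
    log.error("denied: %s", err)
```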