These investors were Dustin Moskovitz, Jaan Tallinn and Sam Bankman-Fried
nitpick: SBF/FTX did not participate in the initial round—they bought $500M worth of non-voting shares later, after the company was well on its way.
more importantly, i often get the criticism that “if you’re concerned with AI then why do you invest in it”. even though the critics usually (and incorrectly) imply that AI would not happen (at least not nearly as fast) if i did not invest, i acknowledge that this is a fair criticism from the FDT perspective (as witnessed by wei dai’s recent comment about how he declined the opportunity to invest in anthropic).
i’m open to improving my policy (which is—empirically—also correlated with the respective policies of dustin as well as FLI) of—roughly—“invest in AI and spend the proceeds on AI safety”—but the improvements need to take into account that a) prominent AI founders have no trouble raising funds (in most of the alternative worlds anthropic is VC funded from the start, like several other openAI offshoots), b) the volume of my philanthropy is correlated with my net worth, and c) my philanthropy is more needed in the worlds where AI progresses faster.

EDIT: i appreciate the post otherwise—upvoted!
i acknowledge that this is a fair criticism from the FDT perspective (as witnessed by wei dai’s recent comment how he declined the opportunity to invest in anthropic).
To clarify a possible confusion, I do not endorse using “FDT” (or UDT or LDT) here, because the state of decision theory research is such that I am very confused about how to apply these decision theories in practice, and personally mostly rely on a mix of other views about rationality and morality, including standard CDT-based game theory and common sense ethics.
(My current best guess is that there is minimal “logical correlation” between humans so LDT becomes CDT-like when applied to humans, and standard game theory seems to work well enough in practice or is the best tool that we currently have when it comes to multiplayer situations. Efforts to ground human moral/ethical intuitions on FDT-style reasoning do not seem very convincing to me so far, so I’m just going to stick with the intuitions themselves for now.)
In this particular case, I mainly wanted to avoid signaling approval of Anthropic’s plans and safety views or getting personally involved in activities that increase x-risk in my judgement. Avoiding conflicts of interest (becoming biased in favor of Anthropic in my thoughts and speech) was also an important consideration.
ah, sorry about mis-framing your comment! i tend to use the term “FDT” casually to refer to “instead of individual acts, try to think about policies and how they would apply to agents in my reference class(es)” (which i think does apply here, as i consider us sharing a plausible reference class).
My suspicion is that if we were to work out the math behind FDT (and it’s up in the air right now whether this is even possible) and apply it to humans, the appropriate reference class for a typical human decision would be tiny, basically just copies of oneself in other possible universes.
One reason for suspecting this is that humans aren’t running clean decision theories, but have all kinds of other considerations and influences impinging on their decisions. For example, psychological differences between us around risk tolerance and spending/donating money, different credences for various ethical ideas/constraints, different intuitions about AI safety and other people’s intentions, etc., probably make it wrong to think of us as belonging to the same reference class.

re first paragraph: that seems wrong; a continuous relaxation of FDT seems like it ought to do what people intuitively think FDT does
Does the appropriate [soft] reference class scale with intersimulizability of agents? i.e. more computationally powerful agents are generally better at simulating other agents, and this will generically push towards the regime where FDT gives a larger reference class.
The asymptote would be some sort of acausal society of multiverse higher-order cooperators.
Yes, I imagine that powerful agents could eventually adopt clean (easy to reason about) decision theories, simulate other agents until they also adopt clean decision theories, and then they can reason about things like, “If I decide to X, that logically implies these other agents making decisions Y and Z”.
(Except it can’t be this simple, because this runs into problems with commitment races, e.g., while I’m simulating another agent, they suspect this and as a result make a bunch of commitments that give themselves more bargaining power. But something like this, more sophisticated in some way, might turn out to work.)
I think FDT/UDT only allows you to influence the decisions of other people who also believe in FDT/UDT.[1]
No matter how strongly you cooperate, if the reason you decide to cooperate is FDT/UDT, then that means you still would have defected if you didn’t believe FDT/UDT, and therefore other people (whose decisions correlate with yours) will still defect just like before, regardless of how strongly FDT/UDT makes you cooperate.

[1] Assuming there are no complicated simulations or acausal trade commitments.
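This claim can be made concrete with a toy one-shot prisoner’s dilemma. Everything below (the agent labels, the payoffs, the rule that correlation holds only between same-theory agents) is an illustrative assumption for the sketch, not something anyone in this thread has formalised:

```python
# Toy one-shot prisoner's dilemma sketching the claim above: FDT-style
# agents treat a same-theory opponent as logically correlated (choosing
# to cooperate "implies" the correlated twin also cooperates), while
# agents running other theories are uncorrelated with them and defect.

PAYOFFS = {  # (my move, their move) -> my payoff
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def move(theory, opponent_theory):
    if theory == "FDT" and opponent_theory == "FDT":
        # Correlated choice: compare mutual cooperation vs mutual defection.
        return "C" if PAYOFFS[("C", "C")] > PAYOFFS[("D", "D")] else "D"
    return "D"  # no assumed correlation: defection dominates causally

def play(a, b):
    ma, mb = move(a, b), move(b, a)
    return PAYOFFS[(ma, mb)], PAYOFFS[(mb, ma)]

# FDT agents cooperate with each other...
assert play("FDT", "FDT") == (3, 3)
# ...but their adopting FDT changes nothing for non-FDT opponents:
assert play("FDT", "CDT") == (1, 1)
assert play("CDT", "CDT") == (1, 1)
```

The point of the sketch is the last two assertions: the FDT agent’s policy only moves outcomes in games against agents it is actually correlated with.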
Re “invest in AI and spend the proceeds on AI safety”—another consideration, besides the ethical (/FDT) concerns, is liquidity. Have you managed to pull out any profits from Anthropic yet? If not, how likely do you think it is that you will be able to[1] before the singularity/doom?

[1] Maybe this would require an IPO?
indeed, illiquidity is a big constraint to my philanthropy, so in very short timelines my “invest (in startups) and redistribute” policy does not work too well.
You’re right – I put Series A and Series B together here. I should make the distinction.
but the improvements need to take into account that a) prominent AI founders have no trouble raising funds (in most of the alternative worlds anthropic is VC funded from the start, like several other openAI offshoots)
There is a question about whether the safety efforts your money supported at or around the companies ended up compensating for the developments at / extra competition encouraged by the companies.
It seems that if Dustin and you had not funded Series A of Anthropic, they would have had a harder time starting up. Moreover, if the broader community had oriented much differently around whether to support Anthropic at the time – considering the risk of accelerating work at another company – Anthropic could have lost a large part of its support base. The flipside here is that the community would have actively alienated Anthropic’s founders and would have lost influence over work at Anthropic as well as any monetary gains. I think it would have been better to instead coordinate to not enable another AGI-development company to get off the ground, but I am curious for your and others’ thoughts.
b) the volume of my philanthropy is correllated with my net worth, and c) my philanthropy is more needed in the worlds where AI progresses faster.
This part makes sense to me.
I’ve been wondering about this. Just looking at the SFF grants (recognising that you might still make grants elsewhere), the amounts have definitely been going up (from $16 million in 2022 to $41 million in 2024). At the same time there have been rounds where SFF evaluators could not make grants that they wanted to. Does this have to do with liquidity issues or something else? It seems that your net worth can cover larger grants, but there must be a lot of considerations here.
There is a question about whether the safety efforts your money supported at or around the companies ended up compensating for the developments
yes. more generally, sign uncertainty sucks (and is a recurring discussion topic in SFF round debates).
It seems that if Dustin and you had not funded Series A of Anthropic, they would have had a harder time starting up.
they certainly would not have had a harder time setting up the company, nor getting the equivalent level of funding (perhaps even at a better valuation). it’s plausible that pointing to “aligned” investors helped with initial recruiting — but that’s unclear to me. my model of dario/the founders is that they just did not want the VC profit motive to play a big part in the initial strategy.
Does this have to do with liquidity issues or something else?
yup: liquidity (also see the comments below), crypto prices, and about half of my philanthropy not being listed on that page. also, the SFF s-process works with aggregated marginal value functions, so there is no hard cutoff (hence the “evaluators could not make grants that they wanted to” framing makes less sense here than in a traditional “chunky and discretionary” philanthropic context).
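To make the “no hard cutoff” point concrete, here is a rough sketch of allocating a budget by declining marginal value. This is a hypothetical toy model, not the actual SFF s-process; the curve shapes and numbers are made up for illustration:

```python
# Toy allocation by declining marginal value (NOT the real s-process):
# each marginal dollar goes to whichever org values it most right now,
# so funding tapers off smoothly instead of stopping at a hard
# accept/reject cutoff per grant.

def allocate(budget, marginal_value, orgs, step=1.0):
    """Greedy allocation: repeatedly give `step` dollars to the org with
    the highest marginal value for its next dollar."""
    alloc = {org: 0.0 for org in orgs}
    remaining = budget
    while remaining > 0:
        best = max(orgs, key=lambda o: marginal_value(o, alloc[o]))
        if marginal_value(best, alloc[best]) <= 0:
            break  # nobody values additional funding
        alloc[best] += step
        remaining -= step
    return alloc

# Hypothetical aggregated marginal value curves: org "a" is valued twice
# as highly as org "b" at every funding level; both decline with funding.
weights = {"a": 2.0, "b": 1.0}
mv = lambda org, funded: weights[org] / (1.0 + funded)

result = allocate(100, mv, ["a", "b"])
# Both orgs end up funded; the less-favoured one is scaled down, not cut off.
assert result["a"] > result["b"] > 0
```

Under this kind of mechanism, a grant an evaluator “wanted to make” is never flatly rejected; it just ends up smaller as its marginal value is outbid by other grants.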
This is clarifying. Appreciating your openness here.
I can see how Anthropic could have started out with you and Dustin as ‘aligned’ investors, but around that time (the year before ChatGPT) there was already enough VC interest that they could probably have raised a few hundred million anyway.
Thinking about your invitation here to explore ways to improve:
i’m open to improving my policy (which is—empirically—also correllated with the respective policies of dustin as well as FLI) of—roughly—“invest in AI and spend the proceeds on AI safety”
Two thoughts:
When you invest in an AI company, this could reasonably be taken as a sign that you are endorsing their existence. Doing so can also make it socially harder later to speak out (e.g. on Anthropic) in public.
Has it been common for you to have specific concerns that a start-up could or would likely do more harm than good – but you decide to invest because you expect VCs would cover the needed funds anyway (but not grant investment returns to ‘safety’ work, nor advise execs to act more prudently)?
In that case, could you put out those concerns in public before you make the investment? Having that open list seems helpful for stakeholders (e.g. talented engineers who consider applying) to make up their own mind and know what to watch out for. It might also help hold the execs accountable.
The grant priorities for restrictive efforts seem too soft.

Pursuing these priorities imposes little to no actual pressure on AI corporations to refrain from reckless model development and releases. The proposals are too complicated and prone to actors finding loopholes, and most of them lack broad-based legitimacy and established enforcement mechanisms.
Sharing my honest impressions here, but recognising that there is a lot of thought put behind these proposals and I may well be misinterpreting them (do correct me):
The liability laws proposal I liked at the time. Unfortunately, it’s become harder since then to get laws passed given successful lobbying of US and Californian lawmakers who are open to keeping AI deregulated. Though maybe there are other state assemblies that are less tied up by tech money and tougher on tech that harms consumers (New York?).
The labelling requirements seem like low-hanging fruit. They are useful for informing the public, but apply little pressure on AI corporations to not go further ‘off the rails’.
The veto committee proposal provides a false sense of security, with little teeth behind it. In practice, we’ve seen supposedly independent boards, trusts, committees and working groups repeatedly fail to carry out their mandates (at DM, OAI, Anthropic, the UK and US safety institutes, the EU AI office, etc) because nonaligned actors could influence them, restructure them, or simply ignore or overrule their decisions. The veto committee idea is unworkable, in my view, because we first need to deal with the lack of real accountability and the limited capacity of outside concerned coalitions to impose pressure on AI corporations.
Unless the committee format is meant as a basis for wider inquiry and stakeholder empowerment? A citizens’ assembly for carefully deliberating a crucial policy question (not just on e.g. upcoming training runs) would be useful because it encourages wider public discussion and builds legitimacy. If the citizens’ assembly’s mandate gets restricted into irrelevance or its decision gets ignored, a basis has still been laid for engaged stakeholders to coordinate around pushing that decision through.
The other proposals – data centre certification, speed limits, and particularly the global off-switch – appear to be circuitous, overly complicated and mostly unestablished attempts at monitoring and enforcement for mostly unknown future risks. They look technically neat, but create little ingress capacity for different opinionated stakeholders to coordinate around restricting unsafe AI development. I actually suspect that they’d be a hidden gift for AGI labs who can go along with the complicated proceedings and undermine them once no longer useful for corporate HQ’s strategy.
Direct and robust interventions could e.g. build off existing legal traditions and widely shared norms, and be supportive of concerned citizens and orgs that are already coalescing to govern clearly harmful AI development projects.
An example that comes to mind: You could fund coalition-building around blocking the local construction of and tax exemptions for hyperscale data centers by relatively reckless AI companies (e.g. Meta). Some seasoned organisers just started working there, and they are supported by local residents, environmentalist orgs, creative advocates, citizen education media, and the broader concerned public. See also Data Center Watch.
1. i agree. as wei explicitly mentions, signalling approval was a big reason why he did not invest, and it definitely gave me pause, too (i had a call with nate & eliezer on this topic around that time). still, if i try to imagine a world where i declined to invest, i don’t see it being obviously better (ofc it’s possible that the difference is still yet to reveal itself).
concerns about startups being net negative are extremely rare (outside of AI, i can’t remember any other case—though it’s possible that i’m forgetting some). i believe this is the main reason why VCs and SV technologists tend to be AI xrisk deniers (another being that it’s harder to fundraise as a VC/technologist if you have sign uncertainty)—their prior is too strong to consider AI an exception. a couple of years ago i was at an event in SF where top tech CEOs talked about wanting to create “lots of externalities”, implying that externalities can only be positive.
2. yeah, the priorities page is now more than a year old and in bad need of an update. thanks for the criticism—fwded to the people drafting the update.
I revised the footnote to make the Series A vs. Series B distinction. I also interpreted your comment to mean that the Series A investments were in voting shares, but do please correct me:
These investors were Dustin Moskovitz and Jaan Tallinn in Series A, and Sam Bankman-Fried about a year later in Series B.
Dustin was advised to invest by Holden Karnofsky. Sam invested $500 million through FTX, by far the largest investment, though it was in non-voting shares.