I indeed believe that regulation should focus on deployment rather than on training.
See also my post https://www.lesswrong.com/posts/gHB4fNsRY8kAMA9d7/reflections-on-making-the-atomic-bomb
The Manhattan Project was all about taking something that was known to work in theory and solving all the Z_n's.
There is a general phenomenon in tech, expressed many times, of people over-estimating the short-term consequences and under-estimating the longer-term ones (e.g., “Amara’s law”).
I think that often it is possible to see that current technology is on track to achieve X, where X is widely perceived as the main obstacle to the real-world application Y. But once you solve X, you discover a myriad of other “smaller” problems Z_1, Z_2, Z_3, … that you need to resolve before you can actually deploy it for Y.
And of course, there is always a huge gap between demonstrating you solved X on some clean academic benchmark and needing to do so “in the wild”. This is particularly an issue in self-driving, where errors can be literally deadly, but it arises in many other applications as well.
I do think that one lesson we can draw from self-driving is that there is a huge gap between full autonomy and “assistance” with human supervision. So, I would expect to see AI deployed as (increasingly sophisticated) “assistants” well before AI systems are actually able to function as “drop-in” replacements for current human jobs. This is part of the point I was making here.
Some things like that have already happened: bigger models are better at utilizing tools such as in-context learning and chain-of-thought reasoning. But again, whenever people plot any graph of such reasoning capabilities as a function of model compute or size (e.g., the BIG-Bench paper), the X axis is always logarithmic. For specific tasks, the dependence on log compute is often sigmoid-like (flat for a long time, then rising more sharply as a function of log compute), but as mentioned above, when you average over many tasks you get this type of linear dependence.
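As a minimal numerical sketch of that last point (my own toy model, not from any particular paper): if each task's success rate is a sigmoid in log compute with its own threshold, then averaging over many tasks whose thresholds are spread out gives a roughly linear trend in log compute.

```python
import numpy as np

# Hypothetical illustration: per-task performance is a sigmoid in log10(compute),
# with each task having its own "emergence" threshold and slope.
rng = np.random.default_rng(0)
log_compute = np.linspace(18, 26, 100)          # log10 FLOPs, e.g. 1e18 .. 1e26
thresholds = rng.uniform(19, 25, size=500)      # tasks "turn on" at different scales
slopes = rng.uniform(1.0, 3.0, size=500)

# success[task, compute] = sigmoid(slope * (log_compute - threshold))
success = 1 / (1 + np.exp(-slopes[:, None] * (log_compute[None, :] - thresholds[:, None])))

single_task = success[0]          # sharp, sigmoid-like in log compute
average = success.mean(axis=0)    # roughly linear in log compute over a wide range

# The single-task curve is flat, then jumps; the averaged curve climbs
# by a near-constant amount per decade of compute.
print("single task:", np.round(single_task[::12], 2))
print("average    :", np.round(average[::12], 2))
```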
Ok drew it on the back now :)
One can make all sorts of guesses, but based on the evidence so far, AIs have a different skill profile than humans. This means that if we think of any job that requires a large set of skills, then for a long period of time, even if AIs beat the human average in some of those skills, they will perform worse than humans in others.
I always thought the front was the other side, but looking at Google Images you are right… I don’t have time now to redraw this, but you’ll just have to take it on faith that I could have drawn it on the other side 😀
>On the other hand, if one starts creating LLM-based “artificial AI researchers”, one would probably create diverse teams of collaborating “artificial AI researchers” in the spirit of multi-agent LLM-based architectures... So, one would try to reproduce the whole teams of engineers and researchers, with diverse participants.
I think this can be an approach to create a diversity of styles, but not necessarily of capabilities. A bit of prompt engineering telling the model to pretend to be some expert X can help on some benchmarks, but the returns diminish very quickly. So you can have a model pretending to be this type of person or that, but they will still suck at Tic-Tac-Toe. (For example, GPT-4 doesn’t know how to recognize a winning move even when I tell it to play like Terence Tao.)
Regarding the existence of compact ML programs, I agree that it is not known. I would say, however, that the main benefit of architectures like transformers hasn’t been so much to save on the total number of FLOPs as to organize these FLOPs so they are best suited for modern GPUs, that is, to ensure that the majority of the FLOPs are spent multiplying dense matrices.
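To make the “organize the FLOPs” point concrete, here is a rough back-of-envelope count (my own illustration, with hypothetical GPT-3-scale dimensions) of where the FLOPs in a single transformer layer go; essentially all of them sit inside dense matrix multiplications, which is exactly what GPUs are built to do fast.

```python
# Back-of-envelope FLOP count for one transformer layer (hypothetical GPT-3-scale dims).
d_model, d_ff, seq_len = 12288, 4 * 12288, 2048

# Dense matmuls: Q/K/V + output projections, and the two MLP layers (2*m*n*k FLOPs each).
qkv_and_out = 4 * (2 * seq_len * d_model * d_model)
mlp = 2 * (2 * seq_len * d_model * d_ff)

# Attention itself: QK^T scores and attention-weighted values (also matmuls, smaller here).
attn = 2 * (2 * seq_len * seq_len * d_model)

# "Everything else": softmax, layer norms, residual adds -- a few FLOPs per element.
elementwise = 10 * seq_len * d_model

total = qkv_and_out + mlp + attn + elementwise
for name, f in [("QKV/out proj", qkv_and_out), ("MLP", mlp),
                ("attention", attn), ("elementwise", elementwise)]:
    print(f"{name:12s} {100 * f / total:5.1f}%")
```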
I agree that self-improvement is an assumption that probably deserves its own blog post. If you believe exponential self-improvement will kick in at some point, then you can consider this discussion as pertaining to the period before that happens.
My own sense is that:
While we might not be super close to them, there are probably fundamental limits to how much intelligence you can pack per FLOP. I don’t believe there is a small C program that is human-level intelligent. In fact, since AI and evolution seem to have arrived at roughly similar orders of magnitude, maybe we are not that far off from those limits? If there are such limits, then no matter how smart the “AI AI-researchers” are, they still won’t be able to get more intelligence per FLOP than these limits allow.
I do think that AI AI-researchers will be incomparable to human AI-researchers, in a similar manner to other professions. The simplistic view of AI research (or any form of research) as one-dimensional, where people can be sorted on an Elo-like scale, is dead wrong based on my 25 years of experience. Yes, some aspects of AI research might be easier to automate, and we will certainly use AI to automate them and make AI researchers more productive. But, as with the vast majority of human professions (with all due respect to elevator operators :) ), I don’t think human AI researchers will be obsolete any time soon.
p.s. I also noticed this “2 comments”—not sure what’s going on. Maybe my footnotes count as comments?
I agree that there is much to do to improve AI reliability, and there are a lot of good reasons (in particular, to make AI more useful to us) to do so. So I agree reliability will improve. In fact, I very much hope this happens! I believe faster progress on reliability would go a long way toward enabling positive applications of AI.
I also agree that a likely path to do so is by adjusting the effort based on estimates of reliability and the stakes involved. At the moment, systems such as ChatGPT spend the same computational effort whether someone asks them to tell a joke or asks them for medical advice. I suspect this will change, and variable inference-time computation will become more standard. (Things like “chain of thought” already spend more inference compute to get better performance, but they don’t really have a “knob” we can turn to control the computation/reliability tradeoff.)
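As a sketch of what such a knob could look like (purely my own illustration; `sample_answer` is a hypothetical stand-in for one chain-of-thought query to a model): allocate more samples, and hence more inference compute, when the stakes are higher, and aggregate by majority vote.

```python
from collections import Counter

def answer_with_budget(question, stakes, sample_answer, base_samples=1, max_samples=32):
    """Hypothetical compute/reliability knob: spend more inference-time compute
    (more chain-of-thought samples + majority vote) when the stakes are higher.

    `stakes` is in [0, 1]; `sample_answer(question)` draws one independent answer.
    """
    n = min(max_samples, max(base_samples, int(round(stakes * max_samples))))
    answers = [sample_answer(question) for _ in range(n)]
    best, count = Counter(answers).most_common(1)[0]
    confidence = count / n
    return best, confidence, n

# Usage sketch: a joke gets one sample, medical advice gets the full budget.
# best, conf, used = answer_with_budget("Is this mole concerning?", stakes=0.95,
#                                       sample_answer=my_model)   # my_model is hypothetical
```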
Regarding the deception issue, it might still be the case that such extra effort is observable, and it could also come at the cost of solving the original task. (So your performance per unit of compute is worse if you are not merely trying to solve the task but also to deceive in a highly reliable way.)
In particular, even if we apply inference-time compute, unfortunately I don’t think we know of a path to get a polylog(1/δ) overhead in inference time to achieve a failure probability of δ. It seems that we are still stuck in the poly(1/δ) regime. So if you wanted to get a 99.9% probability of not getting caught, then you would incur a very noticeable effort.
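To put rough numbers on that gap (my own arithmetic, under the scalings above): for a failure probability of δ = 10⁻³,

$$\frac{1}{\delta} = \frac{1}{10^{-3}} = 1000 \qquad \text{vs.} \qquad \ln\frac{1}{\delta} = \ln 1000 \approx 6.9,$$

so brute-force repetition to reach 99.9% reliability costs roughly three orders of magnitude more inference compute, rather than a small constant factor.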
Not all capabilities/tasks correspond to trying to maximize a subjective human response. If you are talking about finding software vulnerabilities or designing some system, there may well be objective measures of success. In such a case, you can fine-tune a system to maximize these measures and so extract capabilities without the issue of deception/manipulation.
Regarding “escapes”, the traditional fear was that because AI is essentially code, it can spread and escape more easily. But I think that in some sense modern AI has a physical footprint that is more significant than a human’s. Think of trying to get superhuman scientific capabilities by doing something like simulating a collection of 1,000 scientists using a 100T-or-so-parameter model. Even if you already have the pre-trained weights, just running the model requires highly non-trivial computing infrastructure. (Which may be possible to track and detect.) So, it might be easier for a human to escape a prison and live undetected than for a superhuman AI to “escape”.
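A back-of-envelope version of the footprint argument (my own rough numbers, for illustration only): merely holding the weights of a 100T-parameter model in accelerator memory already requires a small data center’s worth of hardware.

```python
# Rough footprint estimate for serving a hypothetical 100T-parameter model.
params = 100e12                 # 100 trillion parameters
bytes_per_param = 2             # fp16/bf16 weights, ignoring KV cache and activations
gpu_memory_bytes = 80e9         # one 80 GB accelerator

weight_bytes = params * bytes_per_param               # 200 TB of weights
gpus_just_for_weights = weight_bytes / gpu_memory_bytes

print(f"weights: {weight_bytes / 1e12:.0f} TB")
print(f"accelerators needed just to hold them: {gpus_just_for_weights:.0f}")
# ~2500 GPUs before you even run a forward pass -- hard to hide, easier to track.
```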
We can of course define “intelligence” in a way that presumes agency and coherence. But I don’t want to quibble about definitions.
Generally, when you have uncertainty, this corresponds to a potential “distribution shift” between your beliefs/knowledge and reality. When you have such a shift, you want to regularize, which means not optimizing to the maximum.
This is not about the definition of intelligence. It’s more about usefulness. Like a gun without a safety, an optimizer without constraints or regularization is not very useful.
Maybe it will be possible to build it, just as today it’s possible to hook up our nukes to an automatic launching device. But it’s not inevitable that people will do something so stupid.
The notion of a piece of code that maximizes a utility without any constraints doesn’t strike me as very “intelligent”.
If people really wanted to, they might be able to build such programs, but my guess is that they would not be very useful even before they become dangerous, as overfitting optimizers usually are.
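To illustrate the earlier “regularize under uncertainty” point with a toy example (entirely my own, using ordinary ridge regression rather than anything AI-specific): fit the same noisy data with and without regularization, then evaluate under a shifted distribution. The unregularized fit typically looks better on the training data and worse once the distribution moves.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 35, 30                                   # few samples, many features: lots of uncertainty
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + rng.normal(scale=2.0, size=n)

def fit(lam):
    # Ridge regression: lam = 0 is the "optimize to the maximum" solution.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Test distribution shifted away from the training one.
X_shift = rng.normal(loc=0.5, size=(2000, d))
y_shift = X_shift @ w_true + rng.normal(scale=2.0, size=2000)

for lam in [0.0, 10.0]:
    w = fit(lam)
    train_err = np.mean((X @ w - y) ** 2)
    shift_err = np.mean((X_shift @ w - y_shift) ** 2)
    print(f"lambda={lam:4.1f}  train MSE={train_err:7.2f}  shifted MSE={shift_err:7.2f}")
```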
>at least some humans (e.g., most transhumanists) are “fanatical maximizers”: we want to fill the lightcone with flourishing sentience, without wasting a single solar system to burn in waste.
I agree that humans have a variety of objectives, which I think is actually more evidence for the hot mess theory?
>the goals of an AI don’t have to be simple to not be best fulfilled by keeping humans around.
The point is not about having simple goals, but rather about optimizing goals to the extreme.
I think there is another point of disagreement. As I’ve written before, I believe the future is inherently chaotic. So even a super-intelligent entity would still be limited in predicting it. (Indeed, you seem to concede this, by acknowledging that even super-intelligent entities don’t have exponential time computation and hence need to use “sophisticated heuristics” to do tree search.)
What it means is that there is inherent uncertainty in the world, and whenever there is uncertainty, you want to “regularize” and not go all out in exhausting a resource that you might need later on.
Just to be clear, I think a “hot mess super-intelligent AI” could still pose an existential risk to humans. But that would probably be the case if humans were an actual threat to it, and there was more of a conflict. (E.g., I don’t see it as a good use of energy for us to hunt down every ant and kill it, even if they are nutritious.)
I was thinking of this as a histogram: the probability that the model solves the task at that level of quality.