Epistemic status: I don’t fully endorse all this, but I think it’s a pretty major mistake to not at least have a model like this sandboxed in one’s head and check it regularly.
Full-cynical model of the AI safety ecosystem right now:
There’s OpenAI, which is pretending that it’s going to have full AGI Any Day Now, and relies on that narrative to keep the investor cash flowing in while they burn billions every year, losing money on every customer and developing a product with no moat. They’re mostly a hype machine, gaming metrics and cherry-picking anything they can to pretend their products are getting better. The underlying reality is that their core products have mostly stagnated for over a year. In short: they’re faking being close to AGI.
Then there’s the AI regulation activists and lobbyists. They lobby and protest and stuff, pretending like they’re pushing for regulations on AI, but really they’re mostly networking and trying to improve their social status with DC People. Even if they do manage to pass any regulations on AI, those will also be mostly fake, because (a) these people are generally not getting deep into the bureaucracy which would actually implement any regulations, and (b) the regulatory targets themselves are aimed at things which seem easy to target (e.g. training FLOP limitations) rather than actually stopping advanced AI. The activists and lobbyists are nominally enemies of OpenAI, but in practice they all benefit from pushing the same narrative, and benefit from pretending that everyone involved isn’t faking everything all the time.
Then there’s a significant contingent of academics who pretend to produce technical research on AI safety, but in fact mostly view their job as producing technical propaganda for the regulation activists and lobbyists. (Central example: Dan Hendrycks, who is the one person I directly name mainly because I expect he thinks of himself as a propagandist and will not be particularly offended by that description.) They also push the narrative, and benefit from it. They’re all busy bullshitting research. Some of them are quite competent propagandists though.
There’s another significant contingent of researchers (some at the labs, some independent, some academic) who aren’t really propagandists, but mostly follow the twitter-memetic incentive gradient in choosing their research. This tends to generate papers whose titles sound dramatic, but which usually provide very little conclusive evidence of anything interesting upon reading the details, and very much feed the narrative. This is the main domain of Not Measuring What You Think You Are Measuring and Symbol/Referent Confusions.
Then of course there are the many theorists who like to build neat toy models which are completely toy and will predictably not generalize usefully to real-world AI applications. This is the main domain of Ad-Hoc Mathematical Definitions, the theorists’ analogue of Not Measuring What You Think You Are Measuring.
Benchmarks. When it sounds like a benchmark measures something reasonably challenging, it nearly-always turns out that it’s not really measuring the challenging thing, and the actual questions/tasks are much easier than the pitch would suggest. (Central examples: software eng, GPQA, frontier math.) Also it always turns out that the LLMs’ supposedly-impressive achievement relied much more on memorization of very similar content on the internet than the benchmark designers expected.
Then there’s a whole crowd of people who feel real scared about AI (whether for good reasons or because they bought the Narrative pushed by all the people above). They mostly want to feel seen and validated in their panic. They have discussions and meetups and stuff where they fake doing anything useful about the problem, while in fact they mostly just emotionally vibe with each other. This is a nontrivial chunk of LessWrong content, as e.g. Val correctly-but-antihelpfully pointed out. It’s also the primary motivation behind lots of “strategy” work, like e.g. surveying AI researchers about their doom probabilities, or doing timeline forecasts/models.
… and of course none of that means that LLMs won’t reach supercritical self-improvement, or that AI won’t kill us, or [...]. Indeed, absent the very real risk of extinction, I’d ignore all this fakery and go about my business elsewhere. I wouldn’t be happy about it, but it wouldn’t bother me any more than all the (many) other basically-fake fields out there.
Man, I really just wish everything wasn’t fake all the time.
What makes you confident that AI progress has stagnated at OpenAI? If you don’t have the time to explain why, I understand, but which metrics have stagnated over the past year?
Chris Olah and Dan Murfet in the at-least-partially empirical domain. Myself in the theory domain, though I expect most people (including theorists) would not know what to look for to distinguish fake from non-fake theory work. In the policy domain, I have heard that Microsoft’s lobbying team does quite non-fake work (though not necessarily in a good direction). In the capabilities domain, DeepMind’s projects on everything except LLMs (like e.g. protein folding, or that fast matrix multiplication paper) seem consistently non-fake, even if they’re less immediately valuable than they might seem at first glance. Also Conjecture seems unusually good at sticking to reality across multiple domains.
The entire field is based on fears that consequentialism provides an extremely powerful but difficult-to-align method of converting intelligence into agency. This is basically wrong. Yes, people attempt to justify it with coherence theorems, but obviously you can be approximately-coherent/approximately-consequentialist and yet still completely un-agentic, so this justification falls flat. Since the field is based on a wrong assumption with bogus justification, it’s all fake.
(IMO this is kinda unrelated to the OP, but I want to continue this thread.)
Have you elaborated on this anywhere?
Perhaps you missed it, but some guy in 2022 wrote this great post which claimed that “Consequentialism, broadly defined, is a general and useful way to develop capabilities.” ;-)
I’m actually just in the course of writing something about why “consequentialism provides an extremely powerful but difficult-to-align method of converting intelligence into agency” … maybe I can send you the draft for criticism when it’s ready?
(IMO this is kinda unrelated to the OP, but I want to continue this thread.)
I think it’s quite related to the OP. If a field is founded on a wrong assumption, then people only end up working in the field if they have some sort of blind spot, and that blind spot leads to their work being fake.
Have you elaborated on this anywhere?
Not hugely. One tricky bit is that it basically ends up boiling down to “the original arguments don’t hold up if you think about them”, but the exact way they don’t hold up depends on what the argument is, so it’s kind of hard to respond to in general.
Perhaps you missed it, but some guy in 2022 wrote this great post which claimed that “Consequentialism, broadly defined, is a general and useful way to develop capabilities.” ;-)
Haha! I think I mostly still stand by the post. In particular, “Consequentialism, broadly defined, is a general and useful way to develop capabilities.” remains true; it’s just that intelligence relies on patterns and thus works much better on common things (which must be small, because they are fragments of a finite world), than on rare things (which can be big, though they don’t have to be). This means that consequentialism isn’t very good at developing powerful capabilities unless it works in an environment that has already been heavily filtered to be highly homogenous, because an inhomogenous environment is going to BTFO the intelligence.
(I’m not sure I stand 101% by my post; there’s some funky business about how to count evolution that I still haven’t settled on yet. And I was too quick to go from “imitation learning isn’t going to lead to far-superhuman abilities” to “consequentialism is the road to far-superhuman abilities”. But yeah I’m actually surprised at how well I stand by my old view despite my massive recent updates.)
I’m actually just in the course of writing something about why “consequentialism provides an extremely powerful but difficult-to-align method of converting intelligence into agency” … maybe I can send you the draft for criticism when it’s ready?
I think you’re conflating consequentialism and understanding in a weird-to-me way. (Or maybe I’m misunderstanding.)
I think consequentialism is related to choosing one action versus another action. I think understanding (e.g. predicting the consequence of an action) is different, and that in practice understanding has to involve self-supervised learning.
(I think human brains have both [partly-] consequentialist decisions and self-supervised updating of the world-model.) (They’re not totally independent, but rather they interact via training data: e.g. [partly-] consequentialist decision-making determines how you move your eyes, and then whatever your eyes are pointing at, your model of the visual world will then update by self-supervised learning on that particular data. But still, these are two systems that interact, not the same thing.)
I think self-supervised learning is perfectly capable of discovering rare but important patterns. Just look at today’s foundation models, which seem pretty great at that.
I think self-supervised learning is perfectly capable of discovering rare but important patterns. Just look at today’s foundation models, which seem pretty great at that.
This I’d dispute. If your model is underparameterized (which I think is true for the typical model?), then it can’t learn any pattern that only occurs once in the data. And even if the model is overparameterized, it still can’t learn any pattern that never occurs in the data.
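Here’s a minimal sketch of the capacity point (numpy, made-up data, purely illustrative): a single rare event contributes almost nothing to the least-squares loss of an underparameterized model, so it gets averaged away, whereas a model with enough capacity to memorize the training set can at least reproduce the rare event it has seen (though it still can’t predict one it never saw).

```python
import numpy as np

rng = np.random.default_rng(0)

# Mostly-linear data with one rare but important event: a single spike at index 100.
x = np.linspace(0, 10, 200)
y = 2 * x + rng.normal(0, 0.1, size=x.shape)
y[100] += 50.0  # the pattern that occurs exactly once

# Underparameterized model: 2 parameters for 200 points. The single rare point
# barely moves the least-squares solution, so the spike is effectively averaged away.
slope, intercept = np.polyfit(x, y, deg=1)
lin_pred = slope * x + intercept
print("spike residual, underparameterized fit:", round(abs(y[100] - lin_pred[100]), 2))

# Extreme overparameterization (here just a lookup table over the training points)
# can memorize the spike it has seen -- but it has no basis for predicting a
# spike that never occurred in the data at all.
lookup_pred = y.copy()
print("spike residual, memorizing lookup:     ", round(abs(y[100] - lookup_pred[100]), 2))
```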
I think you’re conflating consequentialism and understanding in a weird-to-me way. (Or maybe I’m misunderstanding.)
I think consequentialism is related to choosing one action versus another action. I think understanding (e.g. predicting the consequence of an action) is different, and that in practice understanding has to involve self-supervised learning.
I’m saying that intelligence is the thing that allows you to handle patterns. So if you’ve got a dataset, intelligence allows you to build a model that makes predictions for other data based on the patterns it can find in said dataset. And if you have a function, intelligence allows you to find optima for said function based on the patterns it can find in said function.
Consequentialism is a way to set up intelligence to be agent-ish. This often involves setting up something that’s meant to build an understanding of actions based on data or experience.
One could in principle cut my definition of consequentialism up into self-supervised learning and true consequentialism (this seems like what you are doing..?). One disadvantage with that is that consequentialist online learning is going to have a very big effect on the dataset one ends up training the understanding on, so they’re not really independent of each other. Either way that just seems like a small labelling thing to me.
If your model is underparameterized (which I think is true for the typical model?), then it can’t learn any pattern that only occurs once in the data. And even if the model is overparameterized, it still can’t learn any pattern that never occurs in the data.
Dunno if anything’s changed since 2023, but this says LLMs learn things they’ve seen exactly once in the data.
I can vouch that you can ask LLMs about things that are extraordinarily rare in the training data—I’d assume well under once per billion tokens—and they do pretty well. E.g. they know lots of random street names.
Humans successfully went to the moon, despite it being a quite different environment that they had never been in before. And they didn’t do that with “durability, strength, healing, intuition, tradition”, but rather with intelligence.
Speaking of which, one can apply intelligence towards the problem of being resilient to unknown unknowns, and one would come up with ideas like durability, healing, learning from strategies that have stood the test of time (when available), margins of error, backup systems, etc.
Speaking of which, one can apply intelligence towards the problem of being resilient to unknown unknowns,
I guess to add, I’m not talking about unknown unknowns. Often the rare important things are very well known (after all, they are important, so people put a lot of effort into knowing them), they just can’t efficiently be derived from empirical data (except essentially by copying someone else’s conclusion blindly, and that leaves you vulnerable to deception).
Dunno if anything’s changed since 2023, but this says LLMs learn things they’ve seen exactly once in the data.
I don’t have time to read this study in detail until later today, but if I’m understanding it correctly, the study isn’t claiming that neural networks will learn rare important patterns in the data, but rather that they will learn rare patterns that they were recently trained on. So if you continually train on data, you will see a gradual shift towards new patterns and forgetting old ones.
I can vouch that you can ask LLMs about things that are extraordinarily rare in the training data—I’d assume well under once per billion tokens—and they do pretty well. E.g. they know lots of random street names.
Random street names aren’t necessarily important though? Like what would you do with them?
Humans successfully went to the moon, despite it being a quite different environment that they had never been in before. And they didn’t do that with “durability, strength, healing, intuition, tradition”, but rather with intelligence.
I didn’t say that intelligence can’t handle different environments, I said it can’t handle heterogenous environments. The moon is nearly a sterile sphere in a vacuum; this is very homogenous, to the point where pretty much all of the relevant patterns can be found or created on Earth. It would have been more impressive if e.g. the USA could’ve landed a rocket with a team of Americans in Moscow than on the moon.
Also people did use durability, strength, healing, intuition and tradition to go to the moon. Like with strength, someone had to build the rockets (or build the machines which built the rockets). And without durability and healing, they would have been damaged too much in the process of doing that. Intuition and tradition are harder to clearly attribute, but they’re part of it too.
Speaking of which, one can apply intelligence towards the problem of being resilient to unknown unknowns, and one would come up with ideas like durability, healing, learning from strategies that have stood the test of time (when available), margins of error, backup systems, etc.
Learning from strategies that stood the test of time would be tradition moreso than intelligence. I think tradition requires intelligence, but it also requires something else that’s less clear (and possibly not simple enough to be assembled manually, idk).
Margins of error and backup systems would be, idk, caution? Which, yes, definitely benefit from intelligence and consequentialism. Like I’m not saying intelligence and consequentialism are useless, in fact I agree that they are some of the most commonly useful things due to the frequent need to bypass common obstacles.
Learning from strategies that stood the test of time would be tradition moreso than intelligence. I think tradition requires intelligence, but it also requires something else that’s less clear (and possibly not simple enough to be assembled manually, idk).
Right, that’s what I was gonna say. You need intelligence to sort out which traditions should be copied and which ones shouldn’t. There was a 13-billion-year “tradition” of not building e-commerce megastores, but Jeff Bezos ignored that “tradition”, and it worked out very well for him (and I’m happy about it too). Likewise, the Wright Brothers explicitly followed the “tradition” of how birds soar, but not the “tradition” of how birds flap their wings.
I do think there’s a “something else” (most [but not all] humans have an innate drive to follow and enforce social norms, more or less), but I don’t think it’s necessary. The Wright Brothers didn’t have any innate drive to copy anything about bird soaring tradition, but they did it anyway purely by intelligence.
Random street names aren’t necessarily important though?
I feel like I’ve lost the plot here. If you think there are things that are very important, but rare in the training data, and that LLMs consequently fail to learn, can you give an example?
Often the rare important things are very well known (after all, they are important, so people put a lot of effort into knowing them), they just can’t efficiently be derived from empirical data (except essentially by copying someone else’s conclusion blindly, and that leaves you vulnerable to deception).
I guess you’re using “empirical data” in a narrow sense. If Joe tells me X, I have gained “empirical data” that Joe told me X. And then I can apply my intelligence to interpret that “data”. For example, I can consider a number of hypotheses: the hypothesis that Joe is correct and honest, that Joe is mistaken but honest, that Joe is trying to deceive me, that Joe said Y but I misheard him, etc. And then I can gather or recall additional evidence that favors one of those hypotheses over another. I could ask Joe to repeat himself, to address the “I misheard him” hypothesis. I could consider how often I have found Joe to be mistaken about similar things in the past. I could ask myself whether Joe would benefit from deceiving me. Etc.
This is all the same process that I might apply to other kinds of “empirical data” like if my car was making a funny sound. I.e., consider possible generative hypotheses that would match the data, then try to narrow down via additional observations, and/or remain uncertain and prepare for multiple possibilities when I can’t figure it out. This is a middle road between “trusting people blindly” versus “ignoring everything that anyone tells you”, and it’s what reasonable people actually do. Doing that is just intelligence, not any particular innate human tendency—smart autistic people and smart allistic people and smart callous sociopaths etc. are all equally capable of traveling this middle road, i.e. applying intelligence towards the problem of learning things from what other people say.
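(If one wanted to spell out that middle road as a toy calculation—all numbers made up purely for illustration—it’s just ordinary hypothesis-weighing:)

```python
# Toy illustration of weighing hypotheses about Joe's claim X against evidence,
# rather than trusting blindly or ignoring him entirely. Numbers are made up.
priors = {
    "Joe correct and honest":  0.60,
    "Joe mistaken but honest": 0.25,
    "Joe deceiving me":        0.10,
    "I misheard him":          0.05,
}

# How likely the new evidence is under each hypothesis (e.g. Joe repeats the
# same claim when asked, and has no obvious benefit from deceiving me).
likelihoods = {
    "Joe correct and honest":  0.90,
    "Joe mistaken but honest": 0.85,
    "Joe deceiving me":        0.30,
    "I misheard him":          0.05,
}

unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
total = sum(unnormalized.values())
posterior = {h: v / total for h, v in unnormalized.items()}

for h, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{h:>24}: {p:.2f}")
```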
(For example, if I was having this conversation with almost anyone else, I would have quit, or not participated in the first place. But I happen to have prior knowledge that you-in-particular have unusual and well-thought-through ideas, and that even if they’re wrong, they’re often wrong in very unusual and interesting ways, and that you don’t tend to troll, etc.)
I feel like I’m misunderstanding you somehow. You keep saying things that (to me) seem like you could equally well argue that humans cannot possibly survive in the modern world, but here we are. Do you have some positive theory of how humans survive and thrive in (and indeed create) historically-unprecedented heterogeneous environments?
Right, that’s what I was gonna say. You need intelligence to sort out which traditions should be copied and which ones shouldn’t.
I think the necessity of intelligence for tradition exists on a much more fundamental level than that. Intelligence allows people to form an extremely rich model of the world with tons of different concepts. If one had no intelligence at all, one wouldn’t even be able to copy the traditions. Like consider a collection of rocks or a forest; it can’t pass any tradition onto itself.
But conversely, just as intelligence cannot be converted into powerful agency, I don’t think it can be used to determine which traditions should be copied and which ones shouldn’t.
There was a 13-billion-year “tradition” of not building e-commerce megastores, but Jeff Bezos ignored that “tradition”, and it worked out very well for him (and I’m happy about it too). Likewise, the Wright Brothers explicitly followed the “tradition” of how birds soar, but not the “tradition” of how birds flap their wings.
It seems to me that you are treating any variable attribute that’s highly correlated across generations as a “tradition”, to the point where not doing something is considered on the same ontological level as doing something. That is the sort of ontology that my LDSL series is opposed to.
I’m probably not the best person to make the case for tradition as (despite my critique of intelligence) I’m still a relatively strong believer in equilibration and reinvention.
I feel like I’ve lost the plot here. If you think there are things that are very important, but rare in the training data, and that LLMs consequently fail to learn, can you give an example?
Whenever there’s any example of this that’s too embarrassing or too big of an obstacle for applying them in a wide range of practical applications, a bunch of people point it out, and they come up with a fix that allows the LLMs to learn it.
The biggest class of relevant examples would all be things that never occur in the training data—e.g. things from my job, innovations like how to build a good fusion reactor, social relationships between the world’s elites, etc. Though I expect you feel like these would be “cheating”, because it doesn’t have a chance to learn them?
The things in question often aren’t things that most humans have a chance to learn, or even would benefit from learning. Often it’s enough if just 1 person realizes and handles them, and alternately often if nobody handles them then you just lose whatever was dependent on them. Intelligence is a universal way to catch on to common patterns; other things than common patterns matter too, but there’s no corresponding universal solution.
I guess you’re using “empirical data” in a narrow sense. If Joe tells me X, I have gained “empirical data” that Joe told me X. And then I can apply my intelligence to interpret that “data”. For example, I can consider a number of hypotheses: the hypothesis that Joe is correct and honest, that Joe is mistaken but honest, that Joe is trying to deceive me, that Joe said Y but I misheard him, etc. And then I can gather or recall additional evidence that favors one of those hypotheses over another. I could ask Joe to repeat himself, to address the “I misheard him” hypothesis. I could consider how often I have found Joe to be mistaken about similar things in the past. I could ask myself whether Joe would benefit from deceiving me. Etc.
This is all the same process that I might apply to other kinds of “empirical data” like if my car was making a funny sound. I.e., consider possible generative hypotheses that would match the data, then try to narrow down via additional observations, and/or remain uncertain and prepare for multiple possibilities when I can’t figure it out. This is a middle road between “trusting people blindly” versus “ignoring everything that anyone tells you”, and it’s what reasonable people actually do. Doing that is just intelligence, not any particular innate human tendency—smart autistic people and smart allistic people and smart callous sociopaths etc. are all equally capable of traveling this middle road, i.e. applying intelligence towards the problem of learning things from what other people say.
(For example, if I was having this conversation with almost anyone else, I would have quit, or not participated in the first place. But I happen to have prior knowledge that you-in-particular have unusual and well-thought-through ideas, and that even if they’re wrong, they’re often wrong in very unusual and interesting ways, and that you don’t tend to troll, etc.)
You ran way deeper into the “except essentially by copying someone else’s conclusion blindly, and that leaves you vulnerable to deception” point than I meant you to. My main point is that humans have grounding on important factors that we’ve acquired through non-intelligence-based means. I bring up the possibility of copying other’s conclusions because for many of those factors, LLMs still have access to this via copying them.
It might be helpful to imagine what it would look like if LLMs couldn’t copy human insights. For instance, imagine if there was a planet with life much like Earth’s, but with no species that were capable of language. We could imagine setting up a bunch of cameras or other sensors on the planet and training a self-supervised learning algorithm on them. They could surely learn a lot about the world that way—but it also seems like they would struggle with a lot of things. The exact things they would struggle with might depend a lot on how much prior knowledge you build into the algorithm, and how dynamic the sensors are, and whether there are also ways for it to perform interventions upon the planet. But for instance even recognizing the continuity of animal lives as they wander off the screen would either require a lot of prior knowledge built into the algorithm, or a very powerful learning algorithm (e.g. Solomonoff induction can use a simplicity prior to infer that there must be an entire planet full of animals off-screen, but that’s computationally intractable).
(Also, again you still need to distinguish between “Is intelligence a useful tool for bridging lots of common gaps that other methods cannot handle?” vs “Is intelligence sufficient on its own to detect deception?”. My claim is that the answer to the former is yes and to the latter is no. To detect deception, you don’t just use intelligence but also other facets of human agency.)
I feel like I’m misunderstanding you somehow. You keep saying things that (to me) seem like you could equally well argue that humans cannot possibly survive in the modern world, but here we are. Do you have some positive theory of how humans survive and thrive in (and indeed create) historically-unprecedented heterogeneous environments?
First, some things that might seem like nitpicks but are moderately important to my position:
In many ways, our modern world is much less heterogeneous than the past. For instance, thanks to improved hygiene, we are exposed to far fewer diseases, and thanks to improved policing/forensics, we are exposed to much less violent crime. International trade allows us to average away troubles with crop failures. While distribution shifts generically should make it harder for humans to survive, they can (especially if made by humans) make it easier to survive.
Humans do not in fact survive; our average lifespan is less than 100 years. Humanity as a species survives by birthing, nurturing, and teaching children, and by collaborating with each other. My guess would be that aging is driven to a substantial extent by heterogeneity (albeit perhaps endogenous heterogeneity?) that hasn’t been protected against. (I’m aware of John Wentworth’s ‘gears of aging’ series arguing that aging has a common cause, but I’ve come to think that his arguments don’t sufficiently distinguish between ‘is eventually mediated by a common cause’ vs ‘is ultimately caused by a common cause’. By analogy, computer slowdowns may be said to be attributable to a small number of causes like CPU exhaustion, RAM exhaustion, network bandwidth exhaustion, etc., but these are mediators and the root causes will typically be some particular program that is using up those resources, and there’s a huge number of programs in the world which could be to blame depending on the case.)
We actually sort of are in a precarious situation? The world wars were unprecedentedly bloody. They basically ended because of the invention of nukes, which are so destructive that we avoid using them in war. But I don’t think we actually have a robust way to avoid that?
But more fundamentally, my objection to this question is that I doubt the meaningfulness of a positive theory of how humans survive and thrive. “Intelligence” and “consequentialism” are fruitful explanations of certain things because they can be fairly-straightforwardly constructed, have fairly well-characterizable properties, and even can be fairly well-localized anatomically in humans (e.g. parts of the brain).
Like one can quibble with the details of what counts as intelligence vs understanding vs consequentialism, but under the model where intelligence is about the ability to make use of patterns, you can hand a bunch of data to computer scientists and tell them to get to work untangling the patterns, and then it turns out there are some fairly general algorithms that can work on all sorts of datasets and patterns. (I find it quite plausible that we’ve already “achieved superhuman intelligence” in the sense that if you give both me and a transformer a big dataset that neither of us are pre-familiar with to study through, then (at least for sufficiently much data) eventually the transformer will clearly outperform me at predicting the next token.) And probably these fairly general algorithms are probably more-or-less the same sort of thing that much of the human brain is doing.
Thus “intelligence” factors out relatively nicely as a concept that can be identified as a major contributor to human success (I think intelligence is the main reason humans outperformed other primates). But this does not mean that the rest of human success can equally well be factored out into a small number of nicely attributable and implementable concepts. (Like, some of it probably can, but there’s not as much reason to presume that all of it can. “Durability” and “strength” are examples of things that fairly well can, and indeed we have definitely achieved far-superhuman strength. These are purely physical though, whereas a lot of the important stuff has a strong cognitive element to it—though I suspect it’s not purely cognitive...)
OK, here’s my argument that, if you take {intelligence, understanding, consequentialism} as a unit, it’s sufficient for everything:
If durability and strength are helpful, then {intelligence, understanding, consequentialism} can discover that durability and strength are helpful, and then build durability and strength.
Even if “the exact ways in which durability and strength will be helpful” does not constitute a learnable pattern, “durability and strength will be helpful” is nevertheless a (higher-level) learnable pattern.
If some other evolved aspects of the brain and body are helpful, then {intelligence, understanding, consequentialism} can likewise discover that they are helpful, and build them.
After all, if ‘those things are helpful’ wasn’t a learnable pattern, then evolution would not have discovered and exploited that pattern!
If the number of such aspects is dozens or hundreds or thousands, then whatever, {intelligence, understanding, consequentialism} can still get to work systematically discovering them all. The recipe for a human is not infinitely complex.
If reducing heterogeneity is helpful, then {intelligence, understanding, consequentialism} can discover that fact, and figure out how to reduce heterogeneity.
Writing the part that I didn’t get around to yesterday:
You could theoretically imagine e.g. scanning all the atoms of a human body and then using this scan to assemble a new human body in their image. It’d be a massive technical challenge of course, because atoms don’t really sit still and let you look and position them. But with sufficient work, it seems like someone could figure it out.
This doesn’t really give you artificial general agency of the sort that standard Yudkowsky-style AI worries are about, because you can’t assign them a goal. You might get an Age of Em-adjacent situation from it, though even not quite that.
To reverse-engineer people in order to make AI, you’d instead want to identify separate faculties with interpretable effects and reconfigurable interfaces. This can be done for some of the human faculties because they are frequently applied to their full extent and because they are scaled up so much that the body had to anatomically separate them from everything else.
However, there’s just no reason to suppose that it should apply to all the important human faculties, and if one considers all the random extreme events one ends up having to deal with when performing tasks in an unhomogenized part of the world, there’s lots of reason to think humans are primarily adapted to those.
One way to think about the practical impact of AI is that it cannot really expand on its own, but that people will try to find or create sufficiently-homogenous places where AI can operate. The practical consequence of this is that there will be a direct correspondence between each part of the human work to prepare the AI to each part of the activities the AI is engaging in, which will (with caveats) eliminate alignment problems because the AI only does the sorts of things you explicitly make it able to do.
The above is similar to how we don’t worry so much about ‘website misalignment’ because generally there’s a direct correspondence between the behavior of the website and the underlying code, templates and database tables. This didn’t have to be true, in the sense that there are many short programs with behavior that’s not straightforwardly attributable to their source code and yet still in principle could be very influential, but we don’t know how to select good versions of such programs, so instead we go for the ones with a more direct correspondence, even though they are larger and possibly less useful. Similarly with AI, since consequentialism is so limited, people will manually build out some apps where AI can earn them a profit operating on homogenized stuff, and because this building-out directly corresponds to the effect of the apps, they will be alignable but not very independently agentic.
(The major caveat is people may use AI as a sort of weapon against others, and this might force others to use AI to defend themselves. This won’t lead to the traditional doom scenarios because they are too dependent on overestimating the power of consequentialism, but it may lead to other doom scenarios.)
After all, if ‘those things are helpful’ wasn’t a learnable pattern, then evolution would not have discovered and exploited that pattern!
I’ve grown undecided about whether to consider evolution a form of intelligence-powered consequentialism because in certain ways it’s much more powerful than individual intelligence (whether natural or artificial).
Individual intelligence mostly focuses on information that can be made use of over a very short time/space-scale. For instance an autoregressive model relates the immediate future to the immediate past. Meanwhile, evolution doesn’t meaningfully register anything shorter than the reproductive cycle, and is clearly capable of registering things across the entire lifespan and arguably longer than that (like, if you set your children up in an advantageous situation, then that continues paying fitness dividends even after you die).
Of course this is somewhat counterbalanced by the fact that evolution has much lower information bandwidth. Though from what I understand, people also massively underestimate evolution’s information bandwidth due to using an easy approximation (independent Bernoulli genotypes, linear short-tailed genotype-to-phenotype relationships and thus Gaussian phenotypes, quadratic fitness with independence between organisms). Whereas if you have a large number of different niches, then within each niche you can have the ordinary speed of evolution, and if you then have some sort of mixture niche, that niche can draw in organisms from each of the other niches and thus massively increase its genetic variance, and then since the speed of evolution is proportional to genetic variance, that makes this shared niche evolve way faster than normally. And if organisms then pass from the mixture niche out into the specialized niches, they can benefit from the fast evolution too.
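(For reference, the standard quantitative-genetics formalization of “the speed of evolution is proportional to genetic variance”—which I take to be the relationship meant here—is the breeder’s equation, and, in terms of fitness itself, Fisher’s fundamental theorem:)

```latex
% Breeder's equation: per-generation response to selection
% R = response, S = selection differential, h^2 = V_A / V_P (narrow-sense heritability)
R = h^2 S = \frac{V_A}{V_P}\, S

% Fisher's fundamental theorem: the per-generation change in mean fitness \bar{w}
% is proportional to the additive genetic variance in fitness
\Delta \bar{w} = \frac{V_A(w)}{\bar{w}}
```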
(Mental picture to have in mind: we might distinguish niches like hunter, fisher, forager, farmer, herbalist, spinner, potter, bard, bandit, carpenter, trader, king, warlord (distinct from king in that kings gain power through expanding their family while warlords gain power by sniping the king off a kingdom), concubine, bureaucrat, … . Each of them used to be evolving individually, but also genes flowed between them in various ways. Though I suspect this is undercounting the number of niches because there’s also often subniches.)
And then obviously beyond these points, individual intelligence and evolution focus on different things—what’s happening recently vs what’s happened deep in the past. Neither are perfect; society has changed a lot, which renders what’s happened deep in the past less relevant than it could have been, but at the same time what’s happening recently (I argue) intrinsically struggles with rare, powerful factors.
If some other evolved aspects of the brain and body are helpful, then {intelligence, understanding, consequentialism} can likewise discover that they are helpful, and build them.
If the number of such aspects is dozens or hundreds or thousands, then whatever, {intelligence, understanding, consequentialism} can still get to work systematically discovering them all. The recipe for a human is not infinitely complex.
Part of the trouble is, if you just study the organism in isolation, you just get some genetic or phenotypic properties. You don’t have any good way of knowing which of these are the important ones or not.
You can try developing a model of all the different relevant exogenous factors. But as I insist, a lot of them will be too rare to be practical to memorize. (Consider all the crazy things you hear people who make self-driving cars need to do to handle the long tail, and then consider that self-driving cars are much easier than many other tasks, with the main difficult part being the high energies involved in driving cars near people.)
The main theoretical hope is that one could use some clever algorithm to automatically sort of aggregate “small-scale” understanding (like an autoregressive convolutional model to predict next time given previous time) into “large-scale” understanding (being able to understand how a system could act extreme, by learning how it acts normally). But I’ve studied a bunch of different approaches for that, and ultimately it doesn’t really seem feasible. (Typically the small-scale understanding learned is only going to be valid near the regime that it was originally observed within, and also the methods to aggregate small-scale behavior into large-scale behavior either rely on excessively nice properties or basically require you to already know what the extreme behaviors would be.)
If durability and strength are helpful, then {intelligence, understanding, consequentialism} can discover that durability and strength are helpful, and then build durability and strength.
Even if “the exact ways in which durability and strength will be helpful” does not constitute a learnable pattern, “durability and strength will be helpful” is nevertheless a (higher-level) learnable pattern.
First, I want to emphasize that durability and strength are near the furthest towards the easy side because e.g. durability is a common property seen in a lot of objects, and the benefits of durability can be seen relatively immediately and reasoned about locally. I brought them up to dispute the notion that we are guaranteed a sufficiently homogenous environment because otherwise intelligence couldn’t develop.
Another complication is, you gotta consider that e.g. being cheap is also frequently useful, especially in the sort of helpful/assistant-based role that current AIs typically take. This trades off against agency because profit-maximizing companies don’t want money tied up in durability or strength that you’re not typically using. (People, meanwhile, might want durability or strength because they find it cool, sexy or excellent—and as a consequence, those people would then gain more agency.)
Also, I do get the impression you are overestimating the feasibility of “‘durability and strength will be helpful’ is nevertheless a (higher-level) learnable pattern”. I can see some methods where maybe this would be robustly learnable, and I can see some regimes where even current methods would learn it, but considering its simplicity, it’s relatively far from falling naturally out of the methods.
One complication here is, currently AI is ~never designing mechanical things, which makes it somewhat harder to talk about.
(I should maybe write more but it’s past midnight and also I guess I wonder how you’d respond to this.)
The filter for homogeneity of the environment is anthropic selection—if the environment is sufficiently heterogeneous, it kills everyone who tries to reach out of their ecological niche, general intelligence doesn’t develop, and we are not here to have this conversation.
Nah, there are other methods than intelligence for survival and success. E.g. durability, strength, healing, intuition, tradition, … . Most of these developed before intelligence did.
I mean, we exist and we are at least somewhat intelligent, which implies a strong upper bound on the heterogeneity of the environment.
On the other hand, words like “durability” imply the possibility of categorization, which itself implies intelligence. If the environment is sufficiently heterogeneous, you are durable at one second and evaporate at another.
I mean, we exist and we are at least somewhat intelligent, which implies a strong upper bound on the heterogeneity of the environment.
We don’t just use intelligence.
On the other hand, words like “durability” imply the possibility of categorization, which itself implies intelligence. If the environment is sufficiently heterogeneous, you are durable at one second and evaporate at another.
???
Vaporization is prevented by outer space which drains away energy.
Not clear why you say durability implies intelligence, surely trees are durable without intelligence.
I feel like I’m failing to convey the level of abstraction I intend to.
I’m not saying that the durability of an object implies the intelligence of that object. I’m saying that if the world is ordered in a way that allows the existence of distinct durable and non-durable objects, that implies the possibility of an intelligence which can notice that some objects are durable and some are not, and exploit this fact.
If the environment is not ordered enough to contain intelligent beings, it’s probably not ordered enough to contain distinct durable objects either.
To be clear, by “environment” I mean “the entire physics”. When I say “environment not ordered enough” I mean “environment with physical laws chaotic enough to not contain ordered patterns”.
It seems like you are trying to convince me that intelligence exists, which is obviously true and many of my comments rely on it. My position is simply that consequentialism cannot convert intelligence into powerful agency, it can only use intelligence to bypass common obstacles.
If there’s some big object, then it’s quite possible for it to diminish into a large number of similar obstacles, and I’d agree this is where most obstacles come from, to the point where it seems reasonable to say that intelligence can handle almost all obstacles.
However, my assertion wasn’t that intelligence cannot handle almost all obstacles, it was that consequentialism can’t convert intelligence into powerful agency. It’s enough for there to be rare powerful obstacles in order for this to fail.
I don’t think this is the claim that the post is making, but it still makes sense to me. The post is saying something closer to the opposite: that the people working in the field are not prioritizing right (and so on), or not thinking clearly about things, while the risk is real.
I do not necessarily disagree with this, coming from a legal / compliance background. If you see any of my profiles, I constantly complain about “performative compliance” and “compliance theatre”. Painfully present across the legal and governance sectors.
That said: can you provide examples of activism or regulatory efforts that you do agree with? What does a “non-fake” regulatory effort look like?
I don’t think it would be okay to dismiss your take entirely, but it would be great to see what solutions you’d propose too. This is why I disagree in principle, because there are no specific points to contribute to.
In Europe, paradoxically, some of the people “close enough to the bureaucracy” that pushed for the AI Act to include GenAI providers, were OpenAI-adjacent.
But I will rescue this:
“(b) the regulatory targets themselves are aimed at things which seem easy to target (e.g. training FLOP limitations) rather than actually stopping advanced AI”
BigTech is too powerful to lobby against. “Stopping advanced AI” per se would contravene many market regulations (unless we define exactly what you mean by advanced AI and the undeniable dangers to people’s lives). Regulators can only prohibit development of products up to a certain point. They cannot just decide to “stop” development of technologies arbitrarily. But the AI Act does prohibit many types of AI systems already: Article 5: Prohibited AI Practices | EU Artificial Intelligence Act.
Those are considered to create unacceptable risks to people’s lives and human rights.
Then there’s the AI regulation activists and lobbyists. They lobby and protest and stuff, pretending like they’re pushing for regulations on AI, but really they’re mostly networking and trying to improve their social status with DC People. Even if they do manage to pass any regulations on AI, those will also be mostly fake, because (a) these people are generally not getting deep into the bureaucracy which would actually implement any regulations, and (b) the regulatory targets themselves are aimed at things which seem easy to target (e.g. training FLOP limitations) rather than actually stopping advanced AI. The activists and lobbyists are nominally enemies of OpenAI, but in practice they all benefit from pushing the same narrative, and benefit from pretending that everyone involved isn’t faking everything all the time.
“The underlying reality is that their core products have mostly stagnated for over a year. In short: they’re faking being close to AGI.”
This seems like the most load-bearing belief in the full-cynical model; most of your other examples of fakeness rely on it in one way or another:
If the core products aren’t really improving, the progress measured on benchmarks is fake. But if they are, the benchmarks are an (imperfect but still real) attempt to quantify that real improvement.
If LLMs are stagnating, all the people generating dramatic-sounding papers for each new SOTA are just maintaining a holding pattern. But if they’re changing, then just studying/keeping up with the general properties of that progress is real. Same goes for people building and regularly updating their toy models of the thing.
Similarly, if the progress is fake, the propaganda signal-boosting that progress is also fake. If it isn’t, it isn’t. (At least directionally; a lot of that propaganda is still probably exaggerated.)
If the above three are all fake, all the people who feel real scared and want to be validated are stuck in a toxic emotional dead-end where they constantly freak out over fake things to no end. But if they’re responding to legitimate, persistent worldview updates, having a space to vibe them out with like-minded others seems important.
So, in deciding whether or not to endorse this narrative, we’d like to know whether or not the models really ARE stagnating. What makes you think the appearance of progress here is illusory?
This seems like the most load-bearing belief in the full-cynical model; most of your other examples of fakeness rely on it in one way or another [...]
Nope!
Even if the base models are improving, it can still be true that most of the progress measured on the benchmarks is fake, and has basically-nothing to do with the real improvements.
Even if the base models are improving, it can still be true that the dramatic sounding papers and toy models are fake, and have basically-nothing to do with the real improvements.
Even if the base models are improving, the propaganda about it can still be overblown and mostly fake, and have basically-nothing to do with the real improvements.
Even if the base models are improving, the people who feel real scared and just want to be validated can still be doing fake work and in fact be mostly useless, and their dynamic can still have basically-nothing to do with the real improvements.
Just because the base models are in fact improving does not mean that all this other stuff is actually coupled to the real improvement.
Sounds like you’re suggesting that real progress could be orthogonal to human-observed progress. I don’t see how this is possible. Human-observed progress is too broad.
The collective of benchmarks, dramatic papers and toy models, propaganda, and doomsayers are suggesting the models are simultaneously improving at: writing code, researching data online, generating coherent stories, persuading people of things, acting autonomously without human intervention, playing Pokemon, playing Minecraft, playing chess, aligning to human values, pretending to align to human values, providing detailed amphetamine recipes, refusing to provide said recipes, passing the Turing test, writing legal documents, offering medical advice, knowing what they don’t know, being emotionally compelling companions, correctly guessing the true authors of anonymous text, writing papers, remembering things, etc, etc.
They think all these improvements are happening at the same time in vastly different domains because they’re all downstream of the same task, which is text prediction. So, they’re lumped together in the general domain of ‘capabilities’, and call a model which can do all of them well a ‘general intelligence’. If the products are stagnating, sure, all those perceived improvements could be bullshit. (Big ‘if’!) But how could the models be ‘improving’ without improving at any of these things? What domains of ‘real improvement’ exist that are uncoupled to human perceptions of improvement, but still downstream of text prediction?
What domains of ‘real improvement’ exist that are uncoupled to human perceptions of improvement, but still downstream of text prediction?
As defined, this is a little paradoxical: how could I convince a human like you to perceive domains of real improvement which humans do not perceive...?
correctly guessing the true authors of anonymous text
See, this is exactly the example I would have given: truesight is an obvious example of a domain of real improvement which appears on no benchmarks I am aware of, but which appears to correlate strongly with the pretraining loss, is not applied anywhere (I hope), is unobvious that LLMs might do it and the capability does not naturally reveal itself in any standard use-cases (which is why people are shocked when it surfaces), and it would have been easy for no one to have observed it up until now or dismissed it, and even now after a lot of publicizing (including by yours truly), only a few weirdos know much about it.
Why can’t there be plenty of other things like inner-monologue or truesight? (“Wait, you could do X? Why didn’t you tell us?” “You never asked.”)
What domains of ‘real improvement’ exist that are uncoupled to human perceptions of improvement, but still downstream of text prediction?
Maybe a better example would be to point out that ‘emergent’ tasks in general, particularly multi-step tasks, can have observed success rates of precisely 0 in feasible finite samples, but extreme brute-force sampling reveals hidden scaling. Humans would perceive zero improvement as the models scaled (0/100 = 0%, 0/100 = 0%, 0/100 = 0%...), even though they might be rapidly improving from 1/100,000 to 1/10,000 to 1/1,000 to… etc. “Sampling can show the presence of knowledge but not the absence.”
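To make the arithmetic concrete, here is a quick sketch (illustrative rates only) of how a per-attempt success rate can climb by orders of magnitude while the number of successes anyone sees in a feasible sample stays pinned at zero:

```python
# Illustrative per-attempt success rates for successive model scales (made up).
rates = [1e-5, 1e-4, 1e-3, 1e-2]
n_feasible = 100        # samples an evaluator might realistically run
n_bruteforce = 100_000  # samples an extreme brute-force sweep might run

for p in rates:
    # Probability of observing at least one success in n independent attempts.
    p_feasible = 1 - (1 - p) ** n_feasible
    p_brute = 1 - (1 - p) ** n_bruteforce
    print(f"rate {p:.0e}: P(>=1 hit in {n_feasible}) = {p_feasible:.3f}, "
          f"P(>=1 hit in {n_bruteforce}) = {p_brute:.3f}")
```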
As defined, this is a little paradoxical: how could I convince a human like you to perceive domains of real improvement which humans do not perceive...?
Oops, yes. I was thinking “domains of real improvement which humans are currently perceiving in LLMs”, not “domains of real improvement which humans are capable of perceiving in general”. So a capability like inner-monologue or truesight, which nobody currently knows about, but is improving anyway, would certainly qualify. And the discovery of such a capability could be ‘real’ even if other discoveries are ‘fake’.
That said, neither truesight nor inner-monologue seem uncoupled to the more common domains of improvement, as measured in benchmarks and toy models and people-being-scared. The latter, especially, I thought was popularized because it was so surprisingly good at improving benchmark performance. Truesight is narrower, but at the very least we’d expect it to correlate with skill in the common “write [x] in the style of [y]” prompt, right? Surely the same network of associations which lets it accurately generate “Eliezer Yudkowsky wrote this” after a given set of tokens, would also be useful for accurately finishing a sentence starting with “Eliezer Yudkowsky says...”.
So I still wouldn’t consider these things to have basically-nothing to do with commonly perceived domains of improvement.
The latter, especially, I thought was popularized because it was so surprisingly good at improving benchmark performance.
Inner-monologue is an example because as far as we know, it should have existed in pre-GPT-3 models and been constantly improving, but we wouldn’t have noticed because no one would have been prompting for it and if they had, they probably wouldn’t have noticed it. (The paper I linked might have demonstrated that by finding nontrivial performance in smaller models.) Only once it became fairly reliable in GPT-3 could hobbyists on 4chan stumble across it and be struck by the fact that, contrary to what all the experts said, GPT-3 could solve harder arithmetic or reasoning problems if you very carefully set it up just right as an elaborate multi-step process instead of what everyone did, which was just prompt it for the answer right away.
Saying it doesn’t count because once it was discovered it was such a large real improvement, is circular and defines away any example. (Did it not improve benchmarks once discovered? Then who cares about such an ‘uncoupled’ capability; it’s not a real improvement. Did it subsequently improve benchmarks once discovered? Then it’s not really an example because it’s ‘coupled’...) Surely the most interesting examples are ones which do exactly that!
And of course, now there is so much discussion, and so many examples, and it is in such widespread use, and has contaminated all LLMs being trained since, that they start to do it by default given the slightest pretext. The popularization eliminated the hiddenness. And here we are with ‘reasoning models’ which have blown through quite a few older forecasts and moved timelines earlier by years, to the extent that people are severely disappointed when a model like GPT-4.5 ‘only’ does as well as the scaling laws predicted and they start predicting the AI bubble is about to pop and scaling has been refuted.
would also be useful for accurately finishing a sentence starting with “Eliezer Yudkowsky says...”.
But that would be indistinguishable from many other sources of improvement. For starters, by giving a name, you are only testing one direction: ‘name → output’; truesight is about ‘name ← output’. The ‘reversal curse’ is an example of how such inference arrows are not necessarily bidirectional and do not necessarily scale much. (But if you didn’t know that, you would surely conclude the opposite.) There are many ways to improve performance of predicting output: better world-knowledge, abstract reasoning, use of context, access to tools or grounding like web search… No benchmark really distinguishes between these such that you could point to a single specific number and say, “that’s the truesight metric, and you can see it gets better with scale”.
Then there’s the AI regulation activists and lobbyists. [...] Even if they do manage to pass any regulations on AI, those will also be mostly fake
SB1047 was a pretty close shot to something really helpful. The AI Act and its code of practice might be insufficient, but there are good elements in it that, if applied, would reduce the risks. The problem is that it won’t be applied because of internal deployment.
But I sympathise somewhat with stuff like this:
They lobby and protest and stuff, pretending like they’re pushing for regulations on AI, but really they’re mostly networking and trying to improve their social status with DC People.
SB1047 was a pretty close shot to something really helpful.
No, it wasn’t. It was a pretty close shot to something which would have gotten a step closer to another thing, which itself would have gotten us a step closer to another thing, which might have been moderately helpful at best.
Sure, they are more-than-zero helpful. Heck, in a relative sense, they’d be one of the biggest wins in AI safety to date. But alas, reality does not grade on a curve.
One has to bear in mind that the words on that snapshot do not all accurately describe reality in the world where SB1047 passes. “Implement shutdown ability” would not in fact be operationalized in a way which would ensure the ability to shutdown an actually-dangerous AI, because nobody knows how to do that. “Implement reasonable safeguards to prevent societal-scale catastrophes” would in fact be operationalized as checking a few boxes on a form and maybe writing some docs, without changing deployment practices at all, because the rules for the board responsible for overseeing these things made that board pretty easy for the labs to capture.
When I discussed the bill with some others at the time, the main takeaway was that the actually-substantive part was just putting any bureaucracy in place at all to track which entities are training models over 10^26 FLOP/$100M. The bill seemed unlikely to do much of anything beyond that.
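For a sense of scale, here is a back-of-envelope sketch using the standard C ≈ 6ND approximation for dense-transformer training compute; the parameter and token counts below are purely illustrative assumptions, not figures for any actual model:

```python
# Rough training-compute estimate: ~6 FLOP per parameter per training token.
def training_flop(params: float, tokens: float) -> float:
    return 6.0 * params * tokens

THRESHOLD_FLOP = 1e26  # the SB1047-style reporting threshold discussed above

# Illustrative (made-up) model sizes and token counts:
for params, tokens in [(7e10, 2e12), (4e11, 8e12), (1.8e12, 1.5e13)]:
    c = training_flop(params, tokens)
    side = "over" if c > THRESHOLD_FLOP else "under"
    print(f"{params:.1e} params x {tokens:.1e} tokens ~ {c:.1e} FLOP ({side} 1e26)")
```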
Even if the bill had been much more substantive, it would still run into the standard problems of AI regulation: we simply do not have a way to reliably tell which models are and are not dangerous, so the choice is to either ban a very large class of models altogether, or allow models which will predictably be dangerous sooner or later. The most commonly proposed substantive proxy is to ban models over a certain size, which would likely slow down timelines by a factor of 2-3 at most, but definitely not slow down timelines by a factor of 10 or more.
The most commonly proposed substantive proxy is to ban models over a certain size, which would likely slow down timelines by a factor of 2-3 at most
… or, if we do live in a world in which LLMs are not AGI-complete, it might accelerate timelines. After all, this would force the capabilities people to turn their brains on again instead of mindlessly scaling, and that might lead to them stumbling on something which is AGI-complete. And whatever they found would, due to that very design constraint, need much less compute for committing omnicide.
How likely would that be? Companies/people able to pivot like this would need to be live players, capable of even conceiving of new ideas that aren’t “scale LLMs”. Naturally, that means 90% of the current AI industry would be out of the game. But then, 90% of the current AI industry aren’t really pushing the frontier today either; that wouldn’t be much of a loss.
To what extent are the three AGI labs alive vs. dead players, then?
OpenAI was certainly alive back in 2022. Maybe the coup and the exoduses killed it, and it’s now a corpse whose apparent movement is just inertia (the reasoning models were invented prior to the coup, if the Q* rumors are to be trusted, so they’re little evidence that OpenAI was still alive in 2024). But maybe not.
Anthropic houses a bunch of the best OpenAI researchers now, and it’s apparently capable of inventing some novel tricks (whatever’s the mystery behind Sonnet 3.5 and 3.6).
DeepMind is even now consistently outputting some interesting non-LLM research.
I think there’s a decent chance that they’re alive enough. Currently, they’re busy eating the best AI researchers and turning them into LLM researchers. If they stop focusing people’s attention on the potentially-doomed paradigm, if they’re forced to correct the mistake (on this model) that they’re making...
This has always been my worry about all the proposals to upper-bound FLOPs, complicated by my uncertainty regarding whether LLMs are or are not AGI-complete after all.
One major positive effect this might have is memetic. It might create the impression of an (artificially created) AI Winter, causing people to reflexively give up. In addition, not having an (apparent) in-paradigm roadmap to AGI would likely dissolve the race dynamics, both between AGI companies and between geopolitical entities. If you can’t produce straight-line graphs suggesting godhood by 2027, and are reduced to “well we probably need a transformer-sized insight here...”, it becomes much harder to generate hype and alarm that would be legible to investors and politicians.
But then, in worlds in which LLMs are not AGI-complete, how much actual progress to AGI is happening due to the race dynamic? Is it more or less progress than would be produced by a much-downsized field in the counterfactual in which LLM research is banned? How much downsizing would it actually cause, now that the ideas of AGI and the Singularity have gone mainstream-ish? Comparatively, how much downsizing would be caused by the chilling effect if the presumably doomed LLM paradigm is left to run its course of disappointing everyone by 2030 (when the AGI labs can scale no longer)?
On balance, upper-bounding FLOPs is probably still a positive thing to do. But I’m not really sure.
I disagree that the default would’ve been that the board would’ve been “easy for the labs to capture” (indeed, among the most prominent and plausible criticisms of its structure was that it would overregulate in response to political pressure), and thus that it wouldn’t have changed deployment practices. I think the frontier companies were in a good position to evaluate this, and they decided to oppose the bill (and/or support it conditional on sweeping changes, including the removal of the Frontier Model Division).
Also, I’m confused when policy skeptics say things like “sure, it might slow down timelines by a factor of 2-3, big deal.” Having 2-3x as much time is indeed a big deal!
I’m glad we agree “they’d be one of the biggest wins in AI safety to date.”
“Implement shutdown ability” would not in fact be operationalized in a way which would ensure the ability to shutdown an actually-dangerous AI, because nobody knows how to do that
How so? It’s pretty straightforward if the model is still contained in the lab.
“Implement reasonable safeguards to prevent societal-scale catastrophes” would in fact be operationalized as checking a few boxes on a form and maybe writing some docs, without changing deployment practices at all
I think ticking boxes is good. This is how we went to the Moon, and it’s much better to do this than to not do it. It’s not trivial to tick all the boxes. Look at the number of boxes you need to tick if you want to follow the Code of Practice of the AI Act or this paper from DeepMind.
we simply do not have a way to reliably tell which models are and are not dangerous
How so? I think capability evaluations are much simpler than alignment evals, and at the very least we can run those. You might say: “A model might sandbag.” Sure, but you can fine-tune it and see if the capabilities are recovered. If the model can’t do the tasks at all even with some fine-tuning, then (modulo the problem of gradient hacking, which I think is very unlikely) we can be pretty sure the model isn’t capable of such a feat. I think at the very least, following the same methodology as the one followed by Anthropic in their latest system cards is pretty good and would be very helpful.
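For concreteness, a minimal sketch of the elicitation logic just described (all functions are stand-ins, not a real eval harness): evaluate the model, fine-tune on task demonstrations to counter possible sandbagging, re-evaluate, and treat the best score as the capability estimate.

```python
from typing import Callable, List

def elicited_capability(
    model: object,
    tasks: List[str],
    evaluate: Callable[[object, List[str]], float],
    finetune: Callable[[object, List[str]], object],
) -> float:
    """Stand-in for a dangerous-capability eval with fine-tuning elicitation."""
    baseline = evaluate(model, tasks)
    tuned = finetune(model, tasks)       # small supervised fine-tune on task demos
    elicited = evaluate(tuned, tasks)
    # If even the fine-tuned model scores ~0, the capability is very likely
    # absent (modulo gradient hacking); otherwise report the elicited score.
    return max(baseline, elicited)
```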
The EU AI Act even mentions “alignment with human intent” explicitly, as a key concern for systemic risks. This is in Recital 110 (which defines what are systemic risks and how they may affect society).
I do not think any law has mentioned alignment like this before, so it’s massive already.
Will a lot of the implementation efforts feel “fake”? Oh, 100%. But I’d say that this is why we (this community) should not disengage from it...
I also get that the regulatory landscape in the US is another world entirely (which is what the OP is bringing up).
Your very first point is, to be a little uncharitable, ‘maybe OpenAI’s whole product org is fake.’ I know you have a disclaimer here, but you’re talking about a product category that didn’t exist 30 months ago, whose flagship website is now reportedly used by 10% of people in the entire world, and which the internet says expects ~12B revenue this year.
If your vibes are towards investing in that class of thing being fake or ‘mostly a hype machine’ then your vibes are simply not calibrated well in this domain.
No, the model here is entirely consistent with OpenAI putting out some actual cool products. Those products (under the model) just aren’t on a path to AGI, and OpenAI’s valuation is very much reliant on being on a path to AGI in the not-too-distant future. It’s the narrative about building AGI which is fake.
OpenAI’s valuation is very much reliant on being on a path to AGI in the not-too-distant future.
Really? I’m mostly ignorant on such matters, but I’d thought that their valuation seemed comically low compared to what I’d expect if their investors thought that OpenAI was likely to create anything close to a general superhuman AI system in the near future.[1] I considered this evidence that they think all the AGI/ASI talk is just marketing.
Well ok, if they actually thought OpenAI would create superintelligence as I think of it, their valuation would plummet because giving people money to kill you with is dumb. But there’s this space in between total obliviousness and alarm, occupied by a few actually earnest AI optimists. And, it seems to me, not occupied by the big OpenAI investors.
Consider, in support: Netflix has a $418B market cap. Given that, it is inconsistent to think that a $300B valuation for OpenAI (or whatever figure is in the news) requires investors to expect it to replace tens of trillions of dollars of capital before the end of the decade.
Similarly, for people wanting to argue from the other direction, who might think a low current valuation is case-closed evidence against their success chances, consider that just a year ago the same argument would have discredited how they are valued today, and a year before that would have discredited where they were a year ago, and so forth. This holds similarly for historic busts in other companies. Investor sentiment is informational but clearly isn’t definitive, else stocks would never change rapidly.
Similarly, for people wanting to argue from the other direction, who might think a low current valuation is case-closed evidence against their success chances
To be clear: I think the investors would be wrong to think that AGI/ASI soon-ish isn’t pretty likely.
But most of your criticisms in the point you gave have ~no bearing on that? If you want to make a point about how effectively OpenAI’s research moves towards AGI you should be saying things relevant to that, not giving general malaise about their business model.
Or, I might understand ‘their business model is fake which implies a lack of competence about them broadly,’ but then I go back to the whole ‘10% of people in the entire world’ and ‘expects 12B revenue’ thing.
The point of listing the problems with their business model is that they need the AGI narrative in order to fuel the investor cash, without which they will go broke at current spend rates. They have cool products, they could probably make a profit if they switched to optimizing for that (which would mean more expensive products and probably a lot of cuts), but not anywhere near the level of profits they’d need to justify the valuation.
That’s how I interpreted it originally; you were arguing their product org vibed fake, I was arguing your vibes were miscalibrated. I’m not sure what to say to this that I didn’t say originally.
Then there’s the AI regulation activists and lobbyists. They lobby and protest and stuff, pretending like they’re pushing for regulations on AI, but really they’re mostly networking and trying to improve their social status with DC People.
The activists and the lobbyists are two very different groups. The activists are not trying to network with the DC people (yet). Unless you mean Encode, who I would call lobbyists, not activists.
Good point, I should have made those two separate bullet points:
Then there’s the AI regulation lobbyists. They lobby and stuff, pretending like they’re pushing for regulations on AI, but really they’re mostly networking and trying to improve their social status with DC People. Even if they do manage to pass any regulations on AI, those will also be mostly fake, because (a) these people are generally not getting deep into the bureaucracy which would actually implement any regulations, and (b) the regulatory targets themselves are aimed at things which seem easy to target (e.g. training FLOP limitations) rather than actually stopping advanced AI. The activists and lobbyists are nominally enemies of OpenAI, but in practice they all benefit from pushing the same narrative, and benefit from pretending that everyone involved isn’t faking everything all the time.
Also, there’s the AI regulation activists, who e.g. organize protests. Like ~98% of protests in general, such activity is mostly performative and not the sort of thing anyone would end up doing if they were seriously reasoning through how best to spend their time in order to achieve policy goals. Calling it “fake” feels almost redundant. Insofar as these protests have any impact, it’s via creating an excuse for friendly journalists to write stories about the dangers of AI (itself an activity which mostly feeds the narrative, and has dubious real impact).
(As with the top level, epistemic status: I don’t fully endorse all this, but I think it’s a pretty major mistake to not at least have a model like this sandboxed in one’s head and check it regularly.)
Oh, if you’re in the business of compiling a comprehensive taxonomy of ways the current AI thing may be fake, you should also add:
Vibe coders and “10x’d engineers”, who (on this model) would be falling into one of the failure modes outlined here: producing applications/features that didn’t need to exist, creating pointless code bloat (which helpfully shows up in productivity metrics like “volume of code produced” or “number of commits”), or “automatically generating” entire codebases in a way that feels magical, then spending so much time bugfixing them that it eats up ~all perceived productivity gains.
e/acc and other Twitter AI fans, who act like they’re bleeding-edge transhumanist visionaries/analysts/business gurus/startup founders, but who are just shitposters/attention-seekers who will wander off and never look back the moment the hype dies down.
I share some similar frustrations, and unfortunately these are also prevalent in other parts of human society. The common thread in most of this fakeness seems to be impure intentions: motivations other than producing the best science or making true progress. Some of those motivations unfortunately come down to survival or monetary pressure, and relieving that pressure so people can do true research seems critical. We need to encourage a culture of pure motivations, and also equip ourselves with more ability/tools to distinguish extrinsic motivations.
… But It’s Fake Tho
… and of course none of that means that LLMs won’t reach supercritical self-improvement, or that AI won’t kill us, or [...]. Indeed, absent the very real risk of extinction, I’d ignore all this fakery and go about my business elsewhere. I wouldn’t be happy about it, but it wouldn’t bother me any more than all the (many) other basically-fake fields out there.
Man, I really just wish everything wasn’t fake all the time.
What makes you confident that AI progress has stagnated at OpenAI? If you don’t have the time to explain, I understand, but what metrics over the past year have stagnated?
Could you name three examples of people doing non-fake work? Since towardsness to non-fake work is easier to use for aiming than awayness from fake work.
Chris Olah and Dan Murfet in the at-least-partially empirical domain. Myself in the theory domain, though I expect most people (including theorists) would not know what to look for to distinguish fake from non-fake theory work. In the policy domain, I have heard that Microsoft’s lobbying team does quite non-fake work (though not necessarily in a good direction). In the capabilities domain, DeepMind’s projects on everything except LLMs (like e.g. protein folding, or that fast matrix multiplication paper) seem consistently non-fake, even if they’re less immediately valuable than they might seem at first glance. Also Conjecture seems unusually good at sticking to reality across multiple domains.
I do not get this impression, why do you say this?
The entire field is based on fears that consequentialism provides an extremely powerful but difficult-to-align method of converting intelligence into agency. This is basically wrong. Yes, people attempt to justify it with coherence theorems, but obviously you can be approximately-coherent/approximately-consequentialist and yet still completely un-agentic, so this justification falls flat. Since the field is based on a wrong assumption with bogus justification, it’s all fake.
(IMO this is kinda unrelated to the OP, but I want to continue this thread.)
Have you elaborated on this anywhere?
Perhaps you missed it, but some guy in 2022 wrote this great post which claimed that “Consequentialism, broadly defined, is a general and useful way to develop capabilities.” ;-)
I’m actually just in the course of writing something about why “consequentialism provides an extremely powerful but difficult-to-align method of converting intelligence into agency” … maybe I can send you the draft for criticism when it’s ready?
I think it’s quite related to the OP. If a field is founded on a wrong assumption, then people only end up working in the field if they have some sort of blind spot, and that blind spot leads to their work being fake.
Not hugely. One tricky bit is that it basically ends up boiling down to “the original arguments don’t hold up if you think about them”, but the exact way they don’t hold up depends on what the argument is, so it’s kind of hard to respond to in general.
Haha! I think I mostly still stand by the post. In particular, “Consequentialism, broadly defined, is a general and useful way to develop capabilities.” remains true; it’s just that intelligence relies on patterns and thus works much better on common things (which must be small, because they are fragments of a finite world) than on rare things (which can be big, though they don’t have to be). This means that consequentialism isn’t very good at developing powerful capabilities unless it works in an environment that has already been highly filtered to be highly homogenous, because an inhomogenous environment is going to BTFO the intelligence.
(I’m not sure I stand 101% by my post; there’s some funky business about how to count evolution that I still haven’t settled on yet. And I was too quick to go from “imitation learning isn’t going to lead to far-superhuman abilities” to “consequentialism is the road to far-superhuman abilities”. But yeah I’m actually surprised at how well I stand by my old view despite my massive recent updates.)
Sounds good!
I think you’re conflating consequentialism and understanding in a weird-to-me way. (Or maybe I’m misunderstanding.)
I think consequentialism is related to choosing one action versus another action. I think understanding (e.g. predicting the consequence of an action) is different, and that in practice understanding has to involve self-supervised learning.
(I think human brains have both [partly-] consequentialist decisions and self-supervised updating of the world-model.) (They’re not totally independent, but rather they interact via training data: e.g. [partly-] consequentialist decision-making determines how you move your eyes, and then whatever your eyes are pointing at, your model of the visual world will then update by self-supervised learning on that particular data. But still, these are two systems that interact, not the same thing.)
I think self-supervised learning is perfectly capable of discovering rare but important patterns. Just look at today’s foundation models, which seem pretty great at that.
This I’d dispute. If your model is underparameterized (which I think is true for the typical model?), then it can’t learn any patterns that occur only once in the data. And even if the model is overparameterized, it still can’t learn any pattern that never occurs in the data.
I’m saying that intelligence is the thing that allows you to handle patterns. So if you’ve got a dataset, intelligence allows you to build a model that makes predictions for other data based on the patterns it can find in said dataset. And if you have a function, intelligence allows you to find optima for said function based on the patterns it can find in said function.
Consequentialism is a way to set up intelligence to be agent-ish. This often involves setting up something that’s meant to build an understanding of actions based on data or experience.
One could in principle cut my definition of consequentialism up into self-supervised learning and true consequentialism (this seems like what you are doing..?). One disadvantage with that is that consequentialist online learning is going to have a very big effect on the dataset one ends up training the understanding on, so they’re not really independent of each other. Either way that just seems like a small labelling thing to me.
Dunno if anything’s changed since 2023, but this says LLMs learn things they’ve seen exactly once in the data.
I can vouch that you can ask LLMs about things that are extraordinarily rare in the training data—I’d assume well under once per billion tokens—and they do pretty well. E.g. they know lots of random street names.
Humans successfully went to the moon, despite it being a quite different environment that they had never been in before. And they didn’t do that with “durability, strength, healing, intuition, tradition”, but rather with intelligence.
Speaking of which, one can apply intelligence towards the problem of being resilient to unknown unknowns, and one would come up with ideas like durability, healing, learning from strategies that have stood the test of time (when available), margins of error, backup systems, etc.
I guess to add, I’m not talking about unknown unknowns. Often the rare important things are very well known (after all, they are important, so people put a lot of effort into knowing them), they just can’t efficiently be derived from empirical data (except essentially by copying someone else’s conclusion blindly, and that leaves you vulnerable to deception).
I don’t have time to read this study in detail until later today, but if I’m understanding it correctly, the study isn’t claiming that neural networks will learn rare important patterns in the data, but rather that they will learn rare patterns that they were recently trained on. So if you continually train on data, you will see a gradual shift towards new patterns and forgetting old ones.
Random street names aren’t necessarily important though? Like what would you do with them?
I didn’t say that intelligence can’t handle different environments, I said it can’t handle heterogenous environments. The moon is nearly a sterile sphere in a vacuum; this is very homogenous, to the point where pretty much all of the relevant patterns can be found or created on Earth. It would have been more impressive if e.g. the USA could’ve landed a rocket with a team of Americans in Moscow than on the moon.
Also, people did use durability, strength, healing, intuition and tradition to go to the moon. Like with strength, someone had to build the rockets (or build the machines which built the rockets). And without durability and healing, they would have been damaged too much in the process of doing that. Intuition and tradition are harder to clearly attribute, but they’re part of it too.
Learning from strategies that stood the test of time would be tradition moreso than intelligence. I think tradition requires intelligence, but it also requires something else that’s less clear (and possibly not simple enough to be assembled manually, idk).
Margins of error and backup systems would be, idk, caution? Which, yes, definitely benefit from intelligence and consequentialism. Like I’m not saying intelligence and consequentialism are useless, in fact I agree that they are some of the most commonly useful things due to the frequent need to bypass common obstacles.
Right, that’s what I was gonna say. You need intelligence to sort out which traditions should be copied and which ones shouldn’t. There was a 13-billion-year “tradition” of not building e-commerce megastores, but Jeff Bezos ignored that “tradition”, and it worked out very well for him (and I’m happy about it too). Likewise, the Wright Brothers explicitly followed the “tradition” of how birds soar, but not the “tradition” of how birds flap their wings.
I do think there’s a “something else” (most [but not all] humans have an innate drive to follow and enforce social norms, more or less), but I don’t think it’s necessary. The Wright Brothers didn’t have any innate drive to copy anything about bird soaring tradition, but they did it anyway purely by intelligence.
I feel like I’ve lost the plot here. If you think there are things that are very important, but rare in the training data, and that LLMs consequently fail to learn, can you give an example?
I guess you’re using “empirical data” in a narrow sense. If Joe tells me X, I have gained “empirical data” that Joe told me X. And then I can apply my intelligence to interpret that “data”. For example, I can consider a number of hypotheses: the hypothesis that Joe is correct and honest, that Joe is mistaken but honest, that Joe is trying to deceive me, that Joe said Y but I misheard him, etc. And then I can gather or recall additional evidence that favors one of those hypotheses over another. I could ask Joe to repeat himself, to address the “I misheard him” hypothesis. I could consider how often I have found Joe to be mistaken about similar things in the past. I could ask myself whether Joe would benefit from deceiving me. Etc.
This is all the same process that I might apply to other kinds of “empirical data” like if my car was making a funny sound. I.e., consider possible generative hypotheses that would match the data, then try to narrow down via additional observations, and/or remain uncertain and prepare for multiple possibilities when I can’t figure it out. This is a middle road between “trusting people blindly” versus “ignoring everything that anyone tells you”, and it’s what reasonable people actually do. Doing that is just intelligence, not any particular innate human tendency—smart autistic people and smart allistic people and smart callous sociopaths etc. are all equally capable of traveling this middle road, i.e. applying intelligence towards the problem of learning things from what other people say.
(For example, if I was having this conversation with almost anyone else, I would have quit, or not participated in the first place. But I happen to have prior knowledge that you-in-particular have unusual and well-thought-through ideas, and even if they’re wrong, they’re often wrong in very unusual and interesting ways, and that you don’t tend to troll, etc.)
I feel like I’m misunderstanding you somehow. You keep saying things that (to me) seem like you could equally well argue that humans cannot possibly survive in the modern world, but here we are. Do you have some positive theory of how humans survive and thrive in (and indeed create) historically-unprecedented heterogeneous environments?
I think the necessity of intelligence for tradition exists on a much more fundamental level than that. Intelligence allows people to form an extremely rich model of the world with tons of different concepts. If one had no intelligence at all, one wouldn’t even be able to copy the traditions. Like consider a collection of rocks or a forest; it can’t pass any tradition onto itself.
But conversely, just as intelligence cannot be converted into powerful agency, I don’t think it can be used to determine which traditions should be copied and which ones shouldn’t.
It seems to me that you are treating any variable attribute that’s highly correlated across generations as a “tradition”, to the point where not doing something is considered on the same ontological level as doing something. That is the sort of ontology that my LDSL series is opposed to.
I’m probably not the best person to make the case for tradition as (despite my critique of intelligence) I’m still a relatively strong believer in equilibration and reinvention.
Whenever there’s any example of this that’s too embarrassing or too big of an obstacle for applying them in a wide range of practical applications, a bunch of people point it out, and they come up with a fix that allows the LLMs to learn it.
The biggest class of relevant examples would all be things that never occur in the training data—e.g. things from my job, innovations like how to build a good fusion reactor, social relationships between the world’s elites, etc.. Though I expect you feel like these would be “cheating”, because it doesn’t have a chance to learn them?
The things in question often aren’t things that most humans have a chance to learn, or even would benefit from learning. Often it’s enough if just one person realizes and handles them; alternatively, if nobody handles them, then you just lose whatever was dependent on them. Intelligence is a universal way to catch on to common patterns; other things than common patterns matter too, but there’s no corresponding universal solution.
You ran way deeper into the “except essentially by copying someone else’s conclusion blindly, and that leaves you vulnerable to deception” point than I meant you to. My main point is that humans have grounding on important factors that we’ve acquired through non-intelligence-based means. I bring up the possibility of copying other’s conclusions because for many of those factors, LLMs still have access to this via copying them.
It might be helpful to imagine what it would look like if LLMs couldn’t copy human insights. For instance, imagine if there was a planet with life much like Earth’s, but with no species that were capable of language. We could imagine setting up a bunch of cameras or other sensors on the planet and training a self-supervised learning algorithm on them. They could surely learn a lot about the world that way—but it also seems like they would struggle with a lot of things. The exact things they would struggle with might depend a lot on how much prior you build into the algorithm, and how dynamic the sensors are, and whether there’s also ways for it to perform interventions upon the planet. But for instance even recognizing the continuity of animal lives as they wander off the screen would either require a lot of prior knowledge built in to the algorithm, or a very powerful learning algorithm (e.g. Solomonoff induction can use a simplicity prior to infer that there must be an entire planet full of animals off-screen, but that’s computationally intractable).
(Also, again you still need to distinguish between “Is intelligence a useful tool for bridging lots of common gaps that other methods cannot handle?” vs “Is intelligence sufficient on its own to detect deception?”. My claim is that the answer to the former is yes and the latter is no. To detect deception, you don’t just use intelligence but also other facets of human agency.)
First, some things that might seem like nitpicks but are moderately important to my position:
In many ways, our modern world is much less heterogeneous than the past. For instance thanks to improved hygiene, we are exposed to far fewer diseases, and thanks to improved policing/forensics, we are exposed to much less violent crime. International trade allows us to average away troubles with crop failures. While distribution shifts generically should make it harder for humans to survive, they can (especially if made by humans) make it easier to survive.
Humans do not in fact survive; our average lifespan is less than 100 years. Humanity as a species survives by birthing, nurturing, and teaching children, and by collaborating with each other. My guess would be that aging is driven to a substantial extent by heterogeneity (albeit perhaps endogenous heterogeneity?) that hasn’t been protected against. (I’m aware of John Wentworth’s ‘gears of aging’ series arguing that aging has a common cause, but I’ve come to think that his arguments don’t sufficiently much distinguish between ‘is eventually mediated by a common cause’ vs ‘is ultimately caused by a common cause’. By analogy, computer slowdowns may be said to be attributable to a small number of causes like CPU exhaustion, RAM exhaustion, network bandwidth exhaustion, etc., but these are mediators and the root causes will typically be some particular program that is using up those resources, and there’s a huge number of programs in the world which could be to blame depending on the case.)
We actually sort of are in a precarious situation? The world wars were unprecedentedly bloody. They basically ended because of the invention of nukes, which are so destructive that we avoid using them in war. But I don’t think we actually have a robust way to avoid that?
But more fundamentally, my objection to this question is that I doubt the meaningfulness of a positive theory of how humans survive and thrive. “Intelligence” and “consequentialism” are fruitful explanations of certain things because they can be fairly-straightforwardly constructed, have fairly well-characterizable properties, and even can be fairly well-localized anatomically in humans (e.g. parts of the brain).
Like one can quibble with the details of what counts as intelligence vs understanding vs consequentialism, but under the model where intelligence is about the ability to make use of patterns, you can hand a bunch of data to computer scientists and tell them to get to work untangling the patterns, and then it turns out there are some fairly general algorithms that can work on all sorts of datasets and patterns. (I find it quite plausible that we’ve already “achieved superhuman intelligence” in the sense that if you give both me and a transformer a big dataset that neither of us is pre-familiar with to study through, then (at least for sufficiently much data) eventually the transformer will clearly outperform me at predicting the next token.) And these fairly general algorithms are probably more-or-less the same sort of thing that much of the human brain is doing.
Thus “intelligence” factors out relatively nicely as a concept that can be identified as a major contributor to human success (I think intelligence is the main reason humans outperformed other primates). But this does not mean that the rest of human success can equally well be factored out into a small number of nicely attributable and implementable concepts. (Like, some of it probably can, but there’s not as much reason to presume that all of it can. “Durability” and “strength” are examples of things that fairly well can, and indeed we have definitely achieved far-superhuman strength. These are purely physical though, whereas a lot of the important stuff has a strong cognitive element to it—though I suspect it’s not purely cognitive...)
OK, here’s my argument that, if you take {intelligence, understanding, consequentialism} as a unit, it’s sufficient for everything:
If durability and strength are helpful, then {intelligence, understanding, consequentialism} can discover that durability and strength are helpful, and then build durability and strength.
Even if “the exact ways in which durability and strength will be helpful” does not constitute a learnable pattern, “durability and strength will be helpful” is nevertheless a (higher-level) learnable pattern.
If some other evolved aspects of the brain and body are helpful, then {intelligence, understanding, consequentialism} can likewise discover that they are helpful, and build them.
After all, if ‘those things are helpful’ wasn’t a learnable pattern, then evolution would not have discovered and exploited that pattern!
If the number of such aspects is dozens or hundreds or thousands, then whatever, {intelligence, understanding, consequentialism} can still get to work systematically discovering them all. The recipe for a human is not infinitely complex.
If reducing heterogeneity is helpful, then {intelligence, understanding, consequentialism} can discover that fact, and figure out how to reduce heterogeneity.
Etc.
Writing the part that I didn’t get around to yesterday:
You could theoretically imagine e.g. scanning all the atoms of a human body and then using this scan to assemble a new human body in their image. It’d be a massive technical challenge of course, because atoms don’t really sit still and let you look and position them. But with sufficient work, it seems like someone could figure it out.
This doesn’t really give you artificial general agency of the sort that standard Yudkowsky-style AI worries are about, because you can’t assign them a goal. You might get an Age of Em-adjacent situation from it, though even not quite that.
To reverse-engineer people in order to make AI, you’d instead want to identify separate faculties with interpretable effects and reconfigurable interfaces. This can be done for some of the human faculties because they are frequently applied to their full extent and because they are scaled up so much that the body had to anatomically separate them from everything else.
However, there’s just no reason to suppose that it should apply to all the important human faculties, and if one considers all the random extreme events one ends up having to deal with when performing tasks in an unhomogenized part of the world, there’s lots of reason to think humans are primarily adapted to those.
One way to think about the practical impact of AI is that it cannot really expand on its own, but that people will try to find or create sufficiently-homogenous places where AI can operate. The practical consequence of this is that there will be a direct correspondence between each part of the human work to prepare the AI to each part of the activities the AI is engaging in, which will (with caveats) eliminate alignment problems because the AI only does the sorts of things you explicitly make it able to do.
The above is similar to how we don’t worry so much about ‘website misalignment’ because generally there’s a direct correspondence between the behavior of the website and the underlying code, templates and database tables. This didn’t have to be true, in the sense that there are many short programs with behavior that’s not straightforwardly attributable to their source code and yet still in principle could be very influential, but we don’t know how to select good versions of such programs, so instead we go for the ones with a more direct correspondence, even though they are larger and possibly less useful. Similarly with AI, since consequentialism is so limited, people will manually build out some apps where AI can earn them a profit operating on homogenized stuff, and because this building-out directly corresponds to the effect of the apps, they will be alignable but not very independently agentic.
(The major caveat is people may use AI as a sort of weapon against others, and this might force others to use AI to defend themselves. This won’t lead to the traditional doom scenarios because they are too dependent on overestimating the power of consequentialism, but it may lead to other doom scenarios.)
I’ve grown undecided about whether to consider evolution a form of intelligence-powered consequentialism because in certain ways it’s much more powerful than individual intelligence (whether natural or artificial).
Individual intelligence mostly focuses on information that can be made use of over a very short time/space-scale. For instance an autoregressive model relates the immediate future to the immediate past. Meanwhile, evolution doesn’t meaningfully register anything shorter than the reproductive cycle, and is clearly capable of registering things across the entire lifespan and arguably longer than that (like, if you set your children up in an advantageous situation, then that continues paying fitness dividends even after you die).
Of course this is somewhat counterbalanced by the fact that evolution has much lower information bandwidth. Though from what I understand, people also massively underestimate evolution’s information bandwidth due to using an easy approximation (independent Bernoulli genotypes, linear short-tailed genotype-to-phenotype relationships and thus Gaussian phenotypes, quadratic fitness with independence between organisms). Whereas if you have a large number of different niches, then within each niche you can have the ordinary speed of evolution, and if you then have some sort of mixture niche, that niche can draw in organisms from each of the other niches and thus massively increase its genetic variance, and then since the speed of evolution is proportional to genetic variance, that makes this shared niche evolve way faster than normally. And if organisms then pass from the mixture niche out into the specialized niches, they can benefit from the fast evolution too.
(Mental picture to have in mind: we might distinguish niches like hunter, fisher, forager, farmer, herbalist, spinner, potter, bard, bandit, carpenter, trader, king, warlord (distinct from king in that kings gain power through expanding their family while warlords gain power by sniping the king off a kingdom), concubine, bureaucrat, … . Each of them used to be evolving individually, but also genes flowed between them in various ways. Though I suspect this is undercounting the number of niches because there’s also often subniches.)
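For reference, the “speed of evolution is proportional to genetic variance” step is the standard quantitative-genetics result; a minimal statement of it, under the usual univariate assumptions:

```latex
% Breeder's equation / univariate Lande equation: the per-generation response to
% selection scales with the additive genetic variance.
\[
  R \;=\; h^2 S \;=\; \frac{\sigma_A^2}{\sigma_P^2}\, S,
  \qquad
  \Delta \bar{z} \;=\; \sigma_A^2\, \beta ,
\]
% where S is the selection differential, \beta the selection gradient,
% \sigma_A^2 the additive genetic variance, and \sigma_P^2 the phenotypic variance.
```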
And then obviously beyond these points, individual intelligence and evolution focus on different things—what’s happening recently vs what’s happened deep in the past. Neither are perfect; society has changed a lot, which renders what’s happened deep in the past less relevant than it could have been, but at the same time what’s happening recently (I argue) intrinsically struggles with rare, powerful factors.
Part of the trouble is, if you just study the organism in isolation, you just get some genetic or phenotypic properties. You don’t have any good way of knowing which of these are the important ones or not.
You can try developing a model of all the different relevant exogenous factors. But as I insist, a lot of them will be too rare to be practical to memorize. (Consider all the crazy things you hear people who make self-driving cars need to do to handle the long tail, and then consider that self-driving cars are much easier than many other tasks, with the main difficult part being the high energies involved in driving cars near people.)
The main theoretical hope is that one could use some clever algorithm to automatically sort of aggregate “small-scale” understanding (like an autoregressive convolutional model to predict next time given previous time) into “large-scale” understanding (being able to understand how a system could act extreme, by learning how it acts normally). But I’ve studied a bunch of different approaches for that, and ultimately it doesn’t really seem feasible. (Typically the small-scale understanding learned is only going to be valid near the regime that it was originally observed within, and also the methods to aggregate small-scale behavior into large-scale behavior either rely on excessively nice properties or basically require you to already know what the extreme behaviors would be.)
First, I want to emphasize that durability and strength are near the furthest towards the easy side because e.g. durability is a common property seen in a lot of objects, and the benefits of durability can be seen relatively immediately and reasoned about locally. I brought them up to dispute the notion that we are guaranteed a sufficiently homogenous environment because otherwise intelligence couldn’t develop.
Another complication is, you gotta consider that e.g. being cheap is also frequently useful, especially in the sort of helpful/assistant-based role that current AIs typically take. This trades off against agency because profit-maximizing companies don’t want money tied up into durability or strength that you’re not typically using. (People, meanwhile, might want durability or strength because they find it cool, sexy or excellent—and as a consequence, those people would then gain more agency.)
Also, I do get the impression you are overestimating the feasibility of ““durability and strength will be helpful” is nevertheless a (higher-level) learnable pattern”. I can see some methods where maybe this would be robustly learnable, and I can see some regimes where even current methods would learn it, but considering its simplicity, it’s relatively far from falling naturally out of the methods.
One complication here is, currently AI is ~never designing mechanical things, which makes it somewhat harder to talk about.
(I should maybe write more but it’s past midnight and also I guess I wonder how you’d respond to this.)
The filter for homogeneity of the environment is anthropic selection: if the environment is sufficiently heterogeneous, it kills everyone who tries to reach out of their ecological niche, general intelligence doesn’t develop, and we are not here to have this conversation.
Nah, there are other methods than intelligence for survival and success. E.g. durability, strength, healing, intuition, tradition, … . Most of these developed before intelligence did.
I mean, we exist and we are at least somewhat intelligent, which implies a strong upper bound on the heterogeneity of the environment.
On the other hand, words like “durability” imply the possibility of categorization, which itself implies intelligence. If the environment is sufficiently heterogeneous, you are durable one second and evaporate the next.
We don’t just use intelligence.
???
Vaporization is prevented by outer space which drains away energy.
It’s not clear why you say durability implies intelligence; surely trees are durable without intelligence.
I feel like I’m failing to convey the level of abstraction I intend to.
I’m not saying that durability of object implies intelligence of object. I’m saying that if the world is ordered in a way that allows existence of distinct durable and non-durable objects, that means the possibility of intelligence which can notice that some objects are durable and some are not and exploit this fact.
If the environment is not ordered enough to contain intelligent beings, it’s probably not ordered enough to contain distinct durable objects either.
To be clear, by “environment” I mean “the entire physics”. When I say “environment not ordered enough” I mean “environment with physical laws chaotic enough to not contain ordered patterns”.
It seems like you are trying to convince me that intelligence exists, which is obviously true and many of my comments rely on it. My position is simply that consequentialism cannot convert intelligence into powerful agency, it can only use intelligence to bypass common obstacles.
No, my point is that in worlds where intelligence is possible, almost all obstacles are common.
If there’s some big object, then it’s quite possible for it to diminish into a large number of similar obstacles, and I’d agree this is where most obstacles come from, to the point where it seems reasonable to say that intelligence can handle almost all obstacles.
However, my assertion wasn’t that intelligence cannot handle almost all obstacles, it was that consequentialism can’t convert intelligence into powerful agency. It’s enough for there to be rare powerful obstacles in order for this to fail.
I don’t think this is the claim the post is making, but it still makes sense to me. The post is saying something closer to the opposite: that the people working in the field are not doing prioritization right and so on, or not thinking clearly about things, while the risk is real.
I’m not trying to present johnswentworth’s position, I’m trying to present my position.
I do not necessarily disagree with this, coming from a legal / compliance background. If you see any of my profiles, I constantly complain about “performative compliance” and “compliance theatre”. Painfully present across the legal and governance sectors.
That said: can you provide examples of activism or regulatory efforts that you do agree with? What does a “non-fake” regulatory effort look like?
I don’t think it would be okay to dismiss your take entirely, but it would be great to see what solutions you’d propose too. This is why I disagree in principle, because there are no specific points to contribute to.
In Europe, paradoxically, some of the people “close enough to the bureaucracy” that pushed for the AI Act to include GenAI providers, were OpenAI-adjacent.
But I will rescue this:
“(b) the regulatory targets themselves are aimed at things which seem easy to target (e.g. training FLOP limitations) rather than actually stopping advanced AI”
BigTech is too powerful to lobby against. “Stopping advanced AI” per se would contravene many market regulations (unless we define exactly what you mean by advanced AI and the undeniable dangers to people’s lives). Regulators can only prohibit development of products up to a certain point. They cannot just decide to “stop” development of technologies arbitrarily. But the AI Act does prohibit many types of AI systems already: Article 5: Prohibited AI Practices | EU Artificial Intelligence Act.
Those are considered to create unacceptable risks to people’s lives and human rights.
“The underlying reality is that their core products have mostly stagnated for over a year. In short: they’re faking being close to AGI.”
This seems like the most load-bearing belief in the full-cynical model; most of your other examples of fakeness rely on it in one way or another:
If the core products aren’t really improving, the progress measured on benchmarks is fake. But if they are, the benchmarks are an (imperfect but still real) attempt to quantify that real improvement.
If LLMs are stagnating, all the people generating dramatic-sounding papers for each new SOTA are just maintaining a holding pattern. But if they’re changing, then just studying/keeping up with the general properties of that progress is real. Same goes for people building and regularly updating their toy models of the thing.
Similarly, if the progress is fake, the propaganda signal-boosting that progress is also fake. If it isn’t, it isn’t. (At least directionally; a lot of that propaganda is still probably exaggerated.)
If the above three are all fake, all the people who feel real scared and want to be validated are stuck in a toxic emotional dead-end where they constantly freak out over fake things to no end. But if they’re responding to legitimate, persistent worldview updates, having a space to vibe them out with like-minded others seems important.
So, in deciding whether or not to endorse this narrative, we’d like to know whether or not the models really ARE stagnating. What makes you think the appearance of progress here is illusory?
Nope!
Even if the base models are improving, it can still be true that most of the progress measured on the benchmarks is fake, and has basically-nothing to do with the real improvements.
Even if the base models are improving, it can still be true that the dramatic sounding papers and toy models are fake, and have basically-nothing to do with the real improvements.
Even if the base models are improving, the propaganda about it can still be overblown and mostly fake, and have basically-nothing to do with the real improvements.
Even if the base models are improving, the people who feel real scared and just want to be validated can still be doing fake work and in fact be mostly useless, and their dynamic can still have basically-nothing to do with the real improvements.
Just because the base models are in fact improving does not mean that all this other stuff is actually coupled to the real improvement.
Sounds like you’re suggesting that real progress could be orthogonal to human-observed progress. I don’t see how this is possible. Human-observed progress is too broad.
The collective of benchmarks, dramatic papers and toy models, propaganda, and doomsayers are suggesting the models are simultaneously improving at: writing code, researching data online, generating coherent stories, persuading people of things, acting autonomously without human intervention, playing Pokemon, playing Minecraft, playing chess, aligning to human values, pretending to align to human values, providing detailed amphetamine recipes, refusing to provide said recipes, passing the Turing test, writing legal documents, offering medical advice, knowing what they don’t know, being emotionally compelling companions, correctly guessing the true authors of anonymous text, writing papers, remembering things, etc, etc.
They think all these improvements are happening at the same time in vastly different domains because they’re all downstream of the same task, which is text prediction. So, they’re lumped together in the general domain of ‘capabilities’, and call a model which can do all of them well a ‘general intelligence’. If the products are stagnating, sure, all those perceived improvements could be bullshit. (Big ‘if’!) But how could the models be ‘improving’ without improving at any of these things? What domains of ‘real improvement’ exist that are uncoupled to human perceptions of improvement, but still downstream of text prediction?
As defined, this is a little paradoxical: how could I convince a human like you to perceive domains of real improvement which humans do not perceive...?
See, this is exactly the example I would have given: truesight is an obvious example of a domain of real improvement which appears on no benchmark I am aware of, but which appears to correlate strongly with the pretraining loss. It is not applied anywhere (I hope), it is unobvious that LLMs might do it, and the capability does not naturally reveal itself in any standard use-case (which is why people are shocked when it surfaces); it would have been easy for no one to have observed it, or for everyone to have dismissed it, up until now. Even now, after a lot of publicizing (including by yours truly), only a few weirdos know much about it.
Why can’t there be plenty of other things like inner-monologue or truesight? (“Wait, you could do X? Why didn’t you tell us?” “You never asked.”)
Maybe a better example would be to point out that ‘emergent’ tasks in general, particularly multi-step tasks, can have observed success rates of precisely 0 in feasible finite samples, but extreme brute-force sampling reveals hidden scaling. Humans would perceive zero improvement as the models scaled (0/100 = 0%, 0⁄100 = 0%, 0⁄100 = 0%...), even though they might be rapidly improving from 1⁄100,000 to 1⁄10,000 to 1⁄1,000 to… etc. “Sampling can show the presence of knowledge but not the absence.”
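A toy simulation of that sampling point, using assumed per-attempt success rates (the 1/100,000 → 1/10,000 → 1/1,000 progression), shows how a feasible eval reads as flat zero while brute-force sampling reveals the scaling:

```python
import random

# Toy illustration (assumed numbers): per-attempt success rates that improve 10x
# with each scale-up, but which a feasible 100-sample eval almost always reports
# as exactly 0%.
true_success_rates = [1e-5, 1e-4, 1e-3]

def observed_rate(p, n_samples, rng):
    """Fraction of successes seen in n_samples independent attempts."""
    return sum(rng.random() < p for _ in range(n_samples)) / n_samples

rng = random.Random(0)
for p in true_success_rates:
    feasible = observed_rate(p, 100, rng)      # what a normal benchmark run sees
    brute = observed_rate(p, 1_000_000, rng)   # what brute-force sampling sees
    p_any = 1 - (1 - p) ** 100                 # P(>=1 success in 100 attempts)
    print(f"true p={p:.0e}  100-sample estimate={feasible:.1%}  "
          f"1M-sample estimate={brute:.4%}  P(any success in 100)={p_any:.1%}")
```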
Oops, yes. I was thinking “domains of real improvement which humans are currently perceiving in LLMs”, not “domains of real improvement which humans are capable of perceiving in general”. So a capability like inner-monologue or truesight, which nobody currently knows about, but is improving anyway, would certainly qualify. And the discovery of such a capability could be ‘real’ even if other discoveries are ‘fake’.
That said, neither truesight nor inner-monologue seems uncoupled from the more common domains of improvement, as measured in benchmarks and toy models and people-being-scared. The latter, especially, I thought was popularized because it was so surprisingly good at improving benchmark performance. Truesight is narrower, but at the very least we’d expect it to correlate with skill at the common “write [x] in the style of [y]” prompt, right? Surely the same network of associations which lets it accurately generate “Eliezer Yudkowsky wrote this” after a given set of tokens would also be useful for accurately finishing a sentence starting with “Eliezer Yudkowsky says...”.
So I still wouldn’t consider these things to have basically-nothing to do with commonly perceived domains of improvement.
Inner-monologue is an example because as far as we know, it should have existed in pre-GPT-3 models and been constantly improving, but we wouldn’t have noticed because no one would have been prompting for it and if they had, they probably wouldn’t have noticed it. (The paper I linked might have demonstrated that by finding nontrivial performance in smaller models.) Only once it became fairly reliable in GPT-3 could hobbyists on 4chan stumble across it and be struck by the fact that, contrary to what all the experts said, GPT-3 could solve harder arithmetic or reasoning problems if you very carefully set it up just right as an elaborate multi-step process instead of what everyone did, which was just prompt it for the answer right away.
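To make the contrast concrete, here is a minimal sketch (my own illustration; `complete` is a stand-in for whichever text-completion endpoint you’d use, not a specific API):

```python
# The historical claim: early models reliably failed the first prompt style while
# already being capable under the second, so the capability went unnoticed until
# someone prompted for it. `complete` is a placeholder, stubbed so this runs.

def complete(prompt: str) -> str:
    """Stand-in for a base-model text-completion call; wire up your own endpoint."""
    return ""  # stub

question = (
    "A juggler has 16 balls. Half of the balls are golf balls, "
    "and half of the golf balls are blue. How many blue golf balls are there?"
)

# What everyone did: prompt for the answer right away.
direct = complete(f"Q: {question}\nA: The answer is")

# The "elaborate multi-step process": elicit intermediate reasoning before the answer.
scratchpad = complete(
    f"Q: {question}\n"
    "A: Let's think about this step by step.\n"
    "First,"
)
```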
Saying it doesn’t count because, once it was discovered, it was such a large real improvement is circular and defines away any example. (Did it not improve benchmarks once discovered? Then who cares about such an ‘uncoupled’ capability; it’s not a real improvement. Did it subsequently improve benchmarks once discovered? Then it’s not really an example because it’s ‘coupled’...) Surely the most interesting examples are ones which do exactly that!
And of course, now there is so much discussion, and so many examples, and it is in such widespread use, and has contaminated all LLMs being trained since, that they start to do it by default given the slightest pretext. The popularization eliminated the hiddenness. And here we are with ‘reasoning models’ which have blown through quite a few older forecasts and moved timelines earlier by years, to the extent that people are severely disappointed when a model like GPT-4.5 ‘only’ does as well as the scaling laws predicted and they start predicting the AI bubble is about to pop and scaling has been refuted.
But that would be indistinguishable from many other sources of improvement. For starters, by giving a name, you are only testing one direction: ‘name → output’; truesight is about ‘name ← output’. The ‘reversal curse’ is an example of how such inference arrows are not necessarily bidirectional and do not necessarily scale much. (But if you didn’t know that, you would surely conclude the opposite.) There are many ways to improve performance of predicting output: better world-knowledge, abstract reasoning, use of context, access to tools or grounding like web search… No benchmark really distinguishes between these such that you could point to a single specific number and say, “that’s the truesight metric, and you can see it gets better with scale”.
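A minimal sketch of what probing the two directions separately would look like (my own illustration; `score_continuation` is a hypothetical log-probability scorer, not any particular library’s API):

```python
def score_continuation(prompt: str, continuation: str) -> float:
    """Placeholder: the model's log-probability of `continuation` given `prompt`."""
    return 0.0  # stub so the sketch runs

passage = "<an anonymously-written paragraph>"
author = "Eliezer Yudkowsky"

# Forward direction (name -> output): what style-imitation prompts exercise.
forward = score_continuation(f"{author} says: ", passage)

# Reverse direction (output -> name): what truesight requires.
reverse = score_continuation(
    f"{passage}\n\nThe author of the passage above is ", author
)

# The reversal-curse point: the forward score improving with scale tells you little
# about the reverse one, and a benchmark mixing both (plus world knowledge, context
# use, tool access, ...) yields no single "truesight metric" to track.
```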
Gotta love how much of a perfect Scissor statement this is. (Same as my “o3 is not that impressive”.)
SB1047 was a pretty close shot to something really helpful. The AI Act and its code of practice might be insufficient, but there are good elements in it that, if applied, would reduce the risks. The problem is that it won’t be applied because of internal deployment.
But I sympathise somewhat with stuff like this:
No, it wasn’t. It was a pretty close shot to something which would have gotten a step closer to another thing, which itself would have gotten us a step closer to another thing, which might have been moderately helpful at best.
You really think those elements are not helpful? I’m really curious.
Sure, they are more-than-zero helpful. Heck, in a relative sense, they’d be one of the biggest wins in AI safety to date. But alas, reality does not grade on a curve.
One has to bear in mind that the words on that snapshot do not all accurately describe reality in the world where SB1047 passes. “Implement shutdown ability” would not in fact be operationalized in a way which would ensure the ability to shutdown an actually-dangerous AI, because nobody knows how to do that. “Implement reasonable safeguards to prevent societal-scale catastrophes” would in fact be operationalized as checking a few boxes on a form and maybe writing some docs, without changing deployment practices at all, because the rules for the board responsible for overseeing these things made it pretty easy for the labs to capture.
When I discussed the bill with some others at the time, the main takeaway was that the actually-substantive part was just putting any bureaucracy in place at all to track which entities are training models over 10^26 FLOP/$100M. The bill seemed unlikely to do much of anything beyond that.
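For scale, a rough back-of-envelope (my own numbers, assuming ~25,000 H100-class GPUs at an effective ~4×10^14 FLOP/s each and ~$2.50 per GPU-hour) shows why those two thresholds are roughly the same thing:

$$\frac{10^{26}\ \text{FLOP}}{\,2.5\times10^{4}\ \text{GPUs}\times 4\times10^{14}\ \text{FLOP/s}\,}\approx 10^{7}\ \text{s}\approx 2.8\times10^{3}\ \text{h (about 116 days)},$$

$$2.5\times10^{4}\ \text{GPUs}\times 2.8\times10^{3}\ \text{h}\times \$2.50/\text{GPU-h}\approx \$1.8\times10^{8},$$

i.e. the same order of magnitude as the $100M figure.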
Even if the bill had been much more substantive, it would still run into the standard problems of AI regulation: we simply do not have a way to reliably tell which models are and are not dangerous, so the choice is to either ban a very large class of models altogether, or allow models which will predictably be dangerous sooner or later. The most commonly proposed substantive proxy is to ban models over a certain size, which would likely slow down timelines by a factor of 2-3 at most, but definitely not slow down timelines by a factor of 10 or more.
… or, if we do live in a world in which LLMs are not AGI-complete, it might accelerate timelines. After all, this would force the capabilities people to turn their brains on again instead of mindlessly scaling, and that might lead them to stumble on something which is AGI-complete. And, having been developed under that compute constraint, whatever they find would need much less compute for committing omnicide.
How likely would that be? Companies/people able to pivot like this would need to be live players, capable of even conceiving of new ideas that aren’t “scale LLMs”. Naturally, that means 90% of the current AI industry would be out of the game. But then, 90% of the current AI industry aren’t really pushing the frontier today either; that wouldn’t be much of a loss.
To what extent are the three AGI labs alive vs. dead players, then?
OpenAI was certainly alive back in 2022. Maybe the coup and the exoduses killed it, and it’s now a corpse whose apparent movement is just inertia (the reasoning models were invented prior to the coup, if the Q* rumors are to be trusted, so they’re little evidence that OpenAI was still alive in 2024). But maybe not.
Anthropic houses a bunch of the best OpenAI researchers now, and it’s apparently capable of inventing some novel tricks (whatever’s the mystery behind Sonnet 3.5 and 3.6).
DeepMind is even now consistently outputting some interesting non-LLM research.
I think there’s a decent chance that they’re alive enough. Currently, they’re busy eating the best AI researchers and turning them into LLM researchers. If they stop focusing people’s attention on the potentially-doomed paradigm, if they’re forced to correct the mistake (on this model) that they’re making...
This has always been my worry about all the proposals to upper-bound FLOPs, complicated by my uncertainty regarding whether LLMs are or are not AGI-complete after all.
One major positive effect this might have is memetic. It might create the impression of an (artificially created) AI Winter, causing people to reflexively give up. In addition, not having an (apparent) in-paradigm roadmap to AGI would likely dissolve the race dynamics, both between AGI companies and between geopolitical entities. If you can’t produce straight-line graphs suggesting godhood by 2027, and are reduced to “well we probably need a transformer-sized insight here...”, it becomes much harder to generate hype and alarm that would be legible to investors and politicians.
But then, in worlds in which LLMs are not AGI-complete, how much actual progress to AGI is happening due to the race dynamic? Is it more or less progress than would be produced by a much-downsized field in the counterfactual in which LLM research is banned? How much downsizing would it actually cause, now that the ideas of AGI and the Singularity have gone mainstream-ish? Comparatively, how much downsizing would be caused by the chilling effect if the presumably doomed LLM paradigm is let to run its course of disappointing everyone by 2030 (when the AGI labs can scale no longer)?
On balance, upper-bounding FLOPs is probably still a positive thing to do. But I’m not really sure.
I disagree that the default would’ve been that the board would’ve been “easy for the labs to capture” (indeed, among the most prominent and plausible criticisms of its structure was that it would overregulate in response to political pressure), and thus that it wouldn’t have changed deployment practices. I think the frontier companies were in a good position to evaluate this, and they decided to oppose the bill (and/or support it conditional on sweeping changes, including the removal of the Frontier Model Division).
Also, I’m confused when policy skeptics say things like “sure, it might slow down timelines by a factor of 2-3, big deal.” Having 2-3x as much time is indeed a big deal!
Probably not going to have a discussion on the topic right now, but out of honest curiosity: did you read the bill?
I’m glad we agree “they’d be one of the biggest wins in AI safety to date.”
How so? It’s pretty straightforward if the model is still contained in the lab.
I think ticking boxes is good. This is how we went to the Moon, and it’s much better to do this than to not do it. It’s not trivial to tick all the boxes. Look at the number of boxes you need to tick if you want to follow the Code of Practice of the AI Act or this paper from DeepMind.
How so? I think capabilities evaluations are much simpler than alignment evals, and at the very least we can run those. You might say: “A model might sandbag.” Sure, but you can fine-tune it and see if the capabilities are recovered. If, even with some fine-tuning, the model is not able to do the tasks at all (modulo the problem of gradient hacking, which I think is very unlikely), we can be pretty sure that the model isn’t capable of such a feat. I think that, at the very least, following the same methodology as the one Anthropic followed in their most recent system cards is pretty good and would be very helpful.
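A minimal sketch of that fine-tuning elicitation check (my own illustration; the helper names are hypothetical placeholders, not any lab’s actual harness):

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    task: str
    success_rate: float

def run_capability_eval(model, tasks) -> list[EvalResult]:
    """Placeholder: run the dangerous-capability tasks and score each one."""
    raise NotImplementedError

def finetune_on_demos(model, demos):
    """Placeholder: supervised fine-tuning on expert demonstrations of the tasks."""
    raise NotImplementedError

def sandbagging_check(model, tasks, demos, gap_threshold: float = 0.1):
    """Flag tasks where fine-tuning recovers capability the baseline eval missed."""
    baseline = run_capability_eval(model, tasks)
    elicited = run_capability_eval(finetune_on_demos(model, demos), tasks)
    flagged = [
        (b.task, b.success_rate, e.success_rate)
        for b, e in zip(baseline, elicited)
        if e.success_rate - b.success_rate > gap_threshold
    ]
    # Empty list: stronger evidence the capability is genuinely absent
    # (modulo gradient hacking); non-empty: the low baseline reflected
    # sandbagging or poor elicitation, not absence.
    return flagged
```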
100% agreed @Charbel-Raphaël.
The EU AI Act even mentions “alignment with human intent” explicitly, as a key concern for systemic risks. This is in Recital 110 (which defines what are systemic risks and how they may affect society).
I do not think any law has mentioned alignment like this before, so it’s massive already.
Will a lot of the implementation efforts feel “fake”? Oh, 100%. But I’d say that this is why we (this community) should not disengage from it...
I also get that the regulatory landscape in the US is another world entirely (which is what the OP is bringing up).
Your very first point is, to be a little uncharitable, ‘maybe OpenAI’s whole product org is fake.’ I know you have a disclaimer here, but you’re talking about a product category that didn’t exist 30 months ago, whose flagship website is now reportedly used by 10% of people in the entire world, and which is reported to expect ~$12B in revenue this year.
If your vibes are towards investing in that class of thing being fake or ‘mostly a hype machine’ then your vibes are simply not calibrated well in this domain.
No, the model here is entirely consistent with OpenAI putting out some actual cool products. Those products (under the model) just aren’t on a path to AGI, and OpenAI’s valuation is very much reliant on being on a path to AGI in the not-too-distant future. It’s the narrative about building AGI which is fake.
Really? I’m mostly ignorant on such matters, but I’d thought that their valuation seemed comically low compared to what I’d expect if their investors thought that OpenAI was likely to create anything close to a general superhuman AI system in the near future.[1] I considered this evidence that they think all the AGI/ASI talk is just marketing.
Well ok, if they actually thought OpenAI would create superintelligence as I think of it, their valuation would plummet because giving people money to kill you with is dumb. But there’s this space in between total obliviousness and alarm, occupied by a few actually earnest AI optimists. And, it seems to me, not occupied by the big OpenAI investors.
Consider, in support: Netflix has a $418B market cap. It is inconsistent to think that a $300B valuation for OpenAI (or whatever the current figure in the news is) can only be justified by replacing tens of trillions of dollars of capital before the end of the decade.
Similarly, for people wanting to argue from the other direction, who might think a low current valuation is case-closed evidence against their success chances, consider that just a year ago the same argument would have discredited how they are valued today, and a year before that would have discredited where they were a year ago, and so forth. This holds similarly for historic busts in other companies. Investor sentiment is informational but clearly isn’t definitive, else stocks would never change rapidly.
To be clear: I think the investors would be wrong to think that AGI/ASI soon-ish isn’t pretty likely.
But most of your criticisms in the point you gave have ~no bearing on that? If you want to make a point about how effectively OpenAI’s research moves towards AGI you should be saying things relevant to that, not giving general malaise about their business model.
Or, I might understand ‘their business model is fake which implies a lack of competence about them broadly,’ but then I go back to the whole ‘10% of people in the entire world’ and ‘expects 12B revenue’ thing.
The point of listing the problems with their business model is that they need the AGI narrative in order to fuel the investor cash, without which they will go broke at current spend rates. They have cool products, they could probably make a profit if they switched to optimizing for that (which would mean more expensive products and probably a lot of cuts), but not anywhere near the level of profits they’d need to justify the valuation.
That’s how I interpreted it originally; you were arguing their product org vibed fake, I was arguing your vibes were miscalibrated. I’m not sure what to say to this that I didn’t say originally.
The activists and the lobbyists are two very different groups. The activists are not trying to network with the DC people (yet). Unless you mean Encode, who I would call lobbyists, not activists.
Good point, I should have made those two separate bullet points:
Then there’s the AI regulation lobbyists. They lobby and stuff, pretending like they’re pushing for regulations on AI, but really they’re mostly networking and trying to improve their social status with DC People. Even if they do manage to pass any regulations on AI, those will also be mostly fake, because (a) these people are generally not getting deep into the bureaucracy which would actually implement any regulations, and (b) the regulatory targets themselves are aimed at things which seem easy to target (e.g. training FLOP limitations) rather than actually stopping advanced AI. The activists and lobbyists are nominally enemies of OpenAI, but in practice they all benefit from pushing the same narrative, and benefit from pretending that everyone involved isn’t faking everything all the time.
Also, there’s the AI regulation activists, who e.g. organize protests. Like ~98% of protests in general, such activity is mostly performative and not the sort of thing anyone would end up doing if they were seriously reasoning through how best to spend their time in order to achieve policy goals. Calling it “fake” feels almost redundant. Insofar as these protests have any impact, it’s via creating an excuse for friendly journalists to write stories about the dangers of AI (itself an activity which mostly feeds the narrative, and has dubious real impact).
(As with the top level, epistemic status: I don’t fully endorse all this, but I think it’s a pretty major mistake to not at least have a model like this sandboxed in one’s head and check it regularly.)
Oh, if you’re in the business of compiling a comprehensive taxonomy of ways the current AI thing may be fake, you should also add:
Vibe coders and “10x’d engineers”, who (on this model) would be falling into one of the failure modes outlined here: producing applications/features that didn’t need to exist, creating pointless code bloat (which helpfully shows up in productivity metrics like “volume of code produced” or “number of commits”), or “automatically generating” entire codebases in a way that feels magical, then spending so much time bugfixing them that it eats up ~all the perceived productivity gains.
e/acc and other Twitter AI fans, who act like they’re bleeding-edge transhumanist visionaries/analysts/business gurus/startup founders, but who are just shitposters/attention-seekers who will wander off and never look back the moment the hype dies down.
True, but I feel a bit bad about punching that far down.
What are the other basically-fake fields out there?
quantum computing, nuclear fusion
I share some similar frustrations, and unfortunately these patterns are also prevalent in other parts of society. The common thread in most of this fakeness seems to be impure intentions: motivations other than producing the best science or making true progress. Some of those motivations unfortunately stem from survival or monetary pressure, and resolving that for the sake of true research and progress seems critical. We need to encourage a culture of pure motivations, and also equip ourselves with better tools for distinguishing extrinsic motivations.