I’d like to be convinced that I’m wrong, but I just watched a Kling AI video of Justin Timberlake drinking soda, and it was pretty real looking. This, plus the Voice delay from OpenAI, plus Yi-Large landing in the top 10 on the LMSYS leaderboard after the company has existed for only a year, plus just the general vibe, has me really convinced that:
There is no moat. Chinese labs are now at peer level with Western AI labs. Sure, maybe they don’t have big context lengths yet, and maybe they have fewer GPUs, but Zvi’s, gwern’s, and others’ insistence that we needn’t worry—that they don’t have the secret sauce yet—is, to put it politely, absolute nonsense. All the secrets have already leaked out. Only a month ago I was told that Sora-like video was out of reach. Now we see that anyone can do it. Everyone and their mother is popping out video generation tools. The food-eating videos from Kling should give everyone pause.
Predictions:
(Item removed. I realized that the paper I was referring to would affect inference-time compute, not training compute.)
By year’s end, some Chinese-made LLM will be atop the LMSYS leaderboard. (60%)
Beyond-Sora text-to-video and image-to-video generation widely released to the general Chinese public by end of year (80%). Capable of generating multiple minutes of video (70%, given the first statement). Generation times less than half those of Sora (80%). Compute less than half that of Sora (90%).
Chips of similar quality to those produced by TSMC or Samsung will be produced by a Chinese firm within 2 years (50%). This will be accomplished either by using a new lithographic process that sidesteps the need for embargoed advanced lithography machines, or by reverse engineering one of the latest machines (smuggled from Korea or Japan) (80%, given the first statement is true).
Advanced, inexpensive Chinese personal robots will overwhelm Western markets, destroying the current Western robotics industry in the same way that the West’s small kitchen appliance industry was utterly crushed (70%). Data from these robots will make its way to the CCP (90%, given the first statement is true).
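For what it’s worth, the chained estimates above multiply out as follows (a quick arithmetic sketch; the variable names are mine):

```python
# How the conditional predictions above combine: P(A and B) = P(B | A) * P(A).
p_wide_release = 0.80  # beyond-Sora video widely released in China by year's end
p_multi_minute = 0.70  # multi-minute generation, given wide release
p_chip_parity  = 0.50  # TSMC/Samsung-quality chips from a Chinese firm in 2 years
p_litho_route  = 0.80  # via new process or reverse engineering, given parity

print(p_wide_release * p_multi_minute)  # 0.56 unconditional: multi-minute video
print(p_chip_parity * p_litho_route)    # 0.40 unconditional: that chip pathway
```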
What does this mean: the West is caught on the back foot again. Though the West created the technology, China, by sheer size and directed investment, is poised to crush the West in AI. We saw this same story with electric cars, solar panels, and robotics. Fast-copying (or stealing), then quickly iterating and scaling, is extremely effective, and there is no easy way to combat it. Market asymmetries mean that Chinese firms always have a large market without competitors, while Western markets are bombarded with cheap alternatives to domestic brands.
If these were Japanese firms in the 1980s or Korean firms in the 2000s, we could sit back and relax. Sure, they may be ahead, but they are friendly, so we can reap the benefits. That is not the case here, especially with the possibility of AGI. Chinese firms in the 2020s are funded and controlled by the CCP and subject to civil-military fusion laws; the tech is likely already being deployed in weapons systems, propaganda tools, etc. If LLMs scale to AGI and the Chinese get it first, the West is cooked in a scary, existential way, over and above the general danger of AGI.
Why? Observe the flood of fentanyl precursors streaming from Chinese ports to Mexico. This could be stopped, but it is permitted because it serves the CCP’s ends. Observe the Chinese chips making their way into Russian weapons systems. This could be stopped, but it serves the CCP’s ends that its vassal Russia crush Western advancement. Now imagine the same entity had AGI. This is not to say that the West has a good track record—Iran-Contra, Iraq, Afghanistan, arms to rogue regimes, propping up South American despots, turning a blind eye to South African apartheid for decades, etc. But the various checks and balances in the West often mean that there is a meaningful way to change such policies, especially ones that look calculated to disempower and subordinate. An AGI-wielding China is scary as fuck: unchecked power. The CCP already has millions of people in work camps and promotes re-education (ethnic cleansing) in “wayward” provinces. Extrapolate a little.
Again, I am eager to be convinced I am wrong. I hate to beat this same drum over and over.
Papers like the one involving elimination of matrix multiplication suggest that there is no need for warehouses full of GPUs to train advanced AI systems. Sudden collapse of Nvidia. (60%)
The paper is about getting rid of multiplication in inference, not in training (specifically, it focuses on attention rather than the MLP). Quantization-aware training creates models with extreme levels of quantization that are not much worse than full-precision models (this currently cannot be done post-training if training itself wasn’t built around targeting this outcome). The important recent result is ternary quantization, where the weights in the MLP become {-1, 0, 1}, so multiplication by such a matrix no longer requires any actual multiplications. So this is relevant for making inference cheaper or for running models locally.
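To make that concrete, here is a minimal NumPy sketch (my own illustration, not the paper’s kernel) of why a {-1, 0, 1} weight matrix needs no multiplications:

```python
import numpy as np

def ternary_matvec(W, x):
    """Apply a {-1, 0, +1} weight matrix using only additions and
    subtractions: sum the inputs where W is +1, subtract where W is -1."""
    out = np.zeros(W.shape[0], dtype=x.dtype)
    for i in range(W.shape[0]):
        out[i] = x[W[i] == 1].sum() - x[W[i] == -1].sum()
    return out

rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8))             # ternary weights in {-1, 0, 1}
x = rng.standard_normal(8).astype(np.float32)
assert np.allclose(ternary_matvec(W, x), W @ x)  # same answer, zero multiplies
```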
Good point.
There seems to be a huge jump from “there’s no moat around generative AI” (which makes sense, as how to make one is publicly known, and the secret sauce is just about improving performance) to… all the other stuff, which seems completely unrelated?
I acknowledge this. My thinking is a bit scattered, and my posts are often just an attempt to publicly articulate intuitions that I have no other outlet to discuss and refine.
I’m saying, first off, there is no moat. Yet I observe people on this and similar forums with the usual refrain: but look, the West is so far ahead in doing X in AI, so we shouldn’t use China as a bogeyman when discussing AI policy. I claim this is bogus. The West isn’t far ahead in X, because everything can be fast-copied, stolen, or brute-forced, and limits on hardware, etc., appear ineffective. Lots of the arguments for disregarding China when setting AI safety policy assume it will remain perpetually a few steps behind. But if they are getting similar performance, then they aren’t behind.
So if there is no moat, and we can expect peer performance soon, then we should be worried, because we have reason to believe that if scaling plus tweaks can reach AGI, then China might conceivably get AGI first, which would be very bad. I have seen replies to this point of the form: well, how do you know it would be that much worse? Surely Xi wants human flourishing as well. And my response is: governments do terrible things. At least in the West, the public can see these terrible things and sometimes say: hey, I object, this is bad. The PRC has no such mechanism. So AGI would be dangerous in their hands in a way that it might not be...at least initially...in the West, and the PRC is starting from a not-so-pro-flourishing position (Uyghur slavery and genocide, pro-Putinism, invade-Taiwan fever, debt-trap diplomacy, secret police abroad, etc.).
If you think AGI kills everyone anyway, then this doesn’t matter. If you think AGI just makes the group possessing it really powerful and able to disempower or destroy competitors, then this REALLY matters, and policies designed to hinder Western AI development could mean Western disempowerment, subjugation, etc.
I make no guarantees about the coherence of this argument and welcome critiques. Personally, I hope to be wrong.
Are you willing to bet on any of these predictions?
I find counterarguments more convincing than challenges to bet.
I would be willing to bet maybe $100 on the video prediction. Kling is already in beta; as soon as it is released to the general public, that prediction is satisfied. The only uncertainty is whether Chinese authorities crack down on such services for insufficient censorship of requests.
Advanced, inexpensive Chinese personal robots will overwhelm Western markets, destroying the current Western robotics industry in the same way that the West’s small kitchen appliance industry was utterly crushed (70%).
By what time period are you imagining this happening?
This article on work culture in China might be relevant: https://www.businessinsider.com/china-work-culture-differences-west-2024-6
If there’s a similar work culture in AI innovation, that doesn’t sound optimal for developing something faster than the U.S. when “outside the LLM” thinking might ultimately be needed to develop AGI.
Also, Xi has recently called for more innovation in AI and other tech sectors:
https://www.msn.com/en-ie/money/other/xi-jinping-admits-china-is-relatively-weak-on-innovation-and-needs-more-talent-to-dominate-the-tech-battlefield/ar-BB1oUuk1
destroying the current Western robotics industry in the same way that the West’s small kitchen appliance industry was utterly crushed (70%)
I’ve heard that the US is already ahead on advanced (i.e., fully automated) manufacturing. China’s manufacturing economy depends on cheap human labor, which is an advantage they seem to be losing for some reason. I don’t see much reason to think there will be continuity between Chinese dominance in the pre-robotics manufacturing era and the next manufacturing era.
Sudden collapse of Nvidia. (60%)
I assume you’re shorting Nvidia then, right?
What does “atop” mean here? Ranked in top 3 or top 20 or what?
So the usual refrain from Zvi and others is that the specter of China beating us to the punch with AGI is not real because of limits on compute, etc. I think Zvi has tempered his position on this in light of Meta’s promise to release the weights of its 400B+ model. Now there is word that SenseTime just released a model that beats GPT-4 Turbo on various metrics. Of course, maybe Meta chooses not to release its big model, and maybe SenseTime is bluffing—I would point out, though, that Alibaba’s Qwen model seems to do pretty okay in the arena...anyway, my point is that I don’t think the “what if China” argument can be dismissed as quickly as some people on here seem ready to do.
Are you saying that China will use Llama 3 400B weights as a basis for improving their research on LLMs? Or to make more tools from? Or to reach real AGI? Or what?
Yes, yes. Probably not. And they already have a Sora clone called Vidu, for heaven’s sake.
We spend all this time debating: should greedy companies be in control, should government intervene, will intervention slow progress to the good stuff (cancer cures, longevity, etc.). All of these arguments assume that WE (which I read as a gloss for the West) will have some say in the use of AGI. If the PRC gets it, and it is as powerful as predicted, these arguments become academic. And this is not because the Chinese are malevolent. It’s because AGI would fall into the hands of the CCP via civil-military fusion. This is a far more calculating group than those in Western governments. Here, officials have to worry about getting through the next election. There, they can more comfortably wield AGI for their ends while worrying less about the palatability of the means: observe how the population quietly endured a draconian lockdown and only meekly revolted when conditions began to deteriorate and containment looked futile.
I am not an accelerationist. But I am a get-it-before-them-ist. Whether the West (which I count as including Korea, Japan, and Taiwan) can maintain its edge is an open question. A country that churns out PhDs and loves AI will not be easily thwarted.
No, they don’t. They have a video generation model, which is one of a great many published over the past few years as image generation increasingly became solved, such as Imagen Video or Phenaki from Google years ago, and the Vidu samples are clearly inferior to Sora (despite heavy emphasis on the ‘pan over static scene’ easy niche): https://www.youtube.com/watch?v=u1R-jxDPC70
Here we are in 2024, and we’re still being told how Real Soon Now Chinese DL will crush Westerners. I’ve been hearing this for almost a decade now, and I’ve stopped being impressed by the likes of Hsu talking about how “China graduates a million engineers a year!” or whatever. Somehow, the Next Big Thing never comes out of Chinese DL, no matter how many papers or citations or patents they have each year. Something to think about.
(I also have an ongoing Twitter series where every half year or so, I tweet a few of the frontier-pushing Western DL achievements, and I ask for merely 3 Chinese things as good—not better, just plausibly as good, including in retrospect from previous years. You know how many actual legitimate answers I’ve gotten? Like 1. Somehow, all the e/accs and China hawks like Alexandr Wang can’t seem to think of even a single one which was at or past the frontier, as opposed to the latest shiny ‘catches up to GPT-4!* * [on narrow benchmarks, YMMV]’ clone model.)
The standard way of dealing with this:
Quantify how much worse the PRC getting AGI would be than OpenAI getting it, or the US government; quantify how much existential risk there is from pausing or not pausing, and from the PRC, OpenAI, or the US government building AGI first; and then calculate whether pausing to do {alignment research, diplomacy, sabotage, espionage} has higher expected value than moving ahead.
(Is China getting AGI first half the value of the US getting it first, or 10%, or 90%?)
The discussion over pause or competition around AGI has been lacking this so far. Maybe I should write such an analysis.
Gentlemen, calculemus!
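As a toy illustration of that calculus (every number below is a placeholder assumption, not an estimate anyone here has endorsed):

```python
# Toy pause-vs-race expected value, per the framing above.
V_US_FIRST, V_PRC_FIRST, V_DOOM = 1.0, 0.5, 0.0  # normalized outcome values

def expected_value(p_doom, p_prc_first):
    p_us_first = 1.0 - p_doom - p_prc_first
    return p_doom * V_DOOM + p_prc_first * V_PRC_FIRST + p_us_first * V_US_FIRST

ev_race = expected_value(p_doom=0.20, p_prc_first=0.10)   # riskier, US likelier first
ev_pause = expected_value(p_doom=0.05, p_prc_first=0.40)  # safer, PRC likelier first
print(f"race: {ev_race:.2f}, pause: {ev_pause:.2f}")      # 0.75 vs 0.75: a wash here,
# which is the point: the answer hinges entirely on these contested numbers.
```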
Isn’t the main argument that Zvi makes that China is willing to do AI regulation, and thus we can also do AI regulation?
In that frame, the fact that Meta releases its weights is just a regulatory failure on our part.
Nvidia just low-key released its own 340B-parameter model. For those of you worried about releasing model weights becoming the norm, this will probably aggravate your fears.
Here is the link: https://research.nvidia.com/publication/2024-06_nemotron-4-340b
Oh, and they also released their synthetic data generation pipeline:
https://blogs.nvidia.com/blog/nemotron-4-synthetic-data-generation-llm-training/
For those curious about the performance: eyeballing the technical report, it performs roughly at the level of Llama-3 70B. It seems to have an inferior parameters-to-performance ratio because it was only trained on 9 trillion tokens, while the Llama-3 models were trained on 15 trillion. It’s also trained with a 4k context length, as opposed to Llama-3’s 8k. Its primary purpose seems to be the synthetic data pipeline.
Now, apparently, someone has used Kling to make a commercial for a Mad Max-themed beer. Zvi would call this mundane utility.
What it demonstrates is that the Chinese can fast-copy anything we do, improve around the edges, and release a product. Frontier model...boom, fast-copied. The amount of compute required for some of these tasks makes me suspect big leaks from frontier labs. Also, because the big labs here are reluctant to release any new models ahead of this year’s elections, their Chinese counterparts get a head start on copying and product diffusion. We could see a situation like the one with TikTok: a Chinese firm creates an intel-slurping app that it releases to the West (but doesn’t allow internally), and then the West cannot get rid of it because the Chinese proceed to abuse Western legal processes. A video-generation application is the poster child for an app that can be tweaked to cause destabilization while also hiding behind free-speech protections.
While I’m not sure about doom from AGI, my p(doom) for the West rises every time another one of these fast copies happens. The asymmetry in Sino-Western relations—Chinese firms can enter Western markets but not the reverse—ensures this dynamic will continue until Western firms and labs lose predominance in everything.
When I was in middle school, our instructor was trying to teach us about the Bill of Rights. She handed out a paper copy and I immediately identified that Article the first (sic) and Article the second (sic) were not among the first ten amendments and that the numbers for the others were wrong. I boldly asserted that this wasn’t the Bill of Rights and the teacher apologized and cursed the unreliable Internet. But I was wrong. This WAS the Bill of Rights, but the BILL rather than the ten ratified amendments. Everyone came away wrongly informed from that exchange.
Edit: I wrote earlier that I had identified that they were not in the Constitution; but Article the Second is in it, as the 27th Amendment, and I knew that. It just wasn’t among the first ten.
Anyone paying attention to the mystery of the GPT-2 chatbot that has appeared on lmsys? People are saying it operates at levels comparable to or exceeding GPT-4. I’m writing because I think the appearance of mysterious unannounced chatbots for public use without provenance makes me update my p(doom) upward.
Possibilities:
1. This is an OpenAI chatbot based on GPT-4, just like it says it is. It has undergone some more tuning and maybe has boosted reasoning because of methods described in one of the more recently published papers.
2. This is another big American AI company masquerading as OpenAI.
3. This is a big Chinese AI company masquerading as OpenAI.
4. This is an anonymous person or group who is using the GPT-4 fine-tuning API to improve performance.
Possibility 1 seems most likely. If that is the case, I guess it is alright, assuming it is purely based on GPT-4 and isn’t a new model. I suppose if they wanted to test on lmsys to gauge performance anonymously, they couldn’t slap 4.5 on it, but they also couldn’t ethically give it the name of another company’s model. Giving it an entirely new name would invite heavy suspicion. So calling it the name of an old model and monitoring how it does in battle seems like the most ethical compromise. Still, even labeling a model with a different name feels deceptive.
Possibility 2 would be extremely unethical and I don’t think it is the case. Also, the behavior of the model looks more like GPT-4 than another model. I expect lawsuits if this is the case.
Possibility 3 would be extremely unethical, but is possible. Maybe they trained a model on many GPT-4 responses and then did some other stuff. Stealing a model in this way would probably accelerate KYC legislation and yield outright bans on Chinese rental of compute. If this is the case, then there is no moat because we let our moat get stolen.
Possibility 4 is something someone mentioned on Twitter. I don’t know whether it is viable.
In any case, releasing models in disguise onto the Internet lowers my expectations for companies to behave responsibly and transparently. It feels a bit like Amazon and its scheme to collect logistics data from competitors by operating under a different name. In that case, as in this one, the facade was paper thin...the headquarters of the fake company was right next to Amazon’s, but it worked for a long while. Since I think 1 is the most likely, I believe OpenAI wants to make sure it soundly beats everyone else in the rankings before releasing an update with improvements. But didn’t they just release an update a few weeks ago? Hmm.
I’m not entirely sure if it’s the same gpt2 model I’ve been experimenting with over the past year. If I get my hands on it, I will surely try to stretch its context window—and see if it exceeds 1024 tokens to test whether it’s really gpt2.
It definitely exceeds 1024 BPEs context (we wouldn’t be discussing it if it didn’t, I don’t think people even know how to write prompts that, combined with the system prompt etc, even fit in 1024 BPEs anymore), and it is almost certainly not GPT-2, come on.
Copying and pasting an entire paper/blog and asking the model to summarize it? This isn’t hard to do, and it’s very easy to know if there are enough tokens: just run the text through any BPE tokenizer available online.
Sure, the poem prompt I mentioned using is like 3500 characters all on its own, and it had no issues repeatedly revising and printing out 4 new iterations of the poem without apparently forgetting when I used up my quota yesterday, so that convo must’ve been several thousand BPEs.
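(For anyone who wants to run the token-count check suggested above, here is a minimal sketch using the open-source tiktoken library; the "gpt2" encoding name is tiktoken’s, and the prompt is a stand-in.)

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("gpt2")           # the original GPT-2 BPE vocabulary
prompt = "Revise the following poem. " * 200  # stand-in for a long prompt
n = len(enc.encode(prompt))
print(n, "BPEs;", "exceeds" if n > 1024 else "fits in", "GPT-2's 1024-token window")
```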
Yeah, I saw your other replies in another thread, and I was able to test it myself later today, and yup, it’s most likely OpenAI’s new LLM. I’m just still confused about why they’d call it gpt2.
Altman made a Twitter-edit joke about ‘gpt-2 i mean gpt2’, so at this point, I think it’s just a funny troll-name related to the ‘v2 personality’ which makes it a successor to the ChatGPT ‘v1’, presumably, ‘personality’. See, it’s gptv2 geddit not gpt-2? very funny, everyone lol at troll
A Chinese company released a new Sora competitor—Kling—and it is arguably superior to the publicly available Sora. Could be exfiltration, or could be genuinely homegrown. In any case, the moat is all gone.
Link: https://kling.kuaishou.com/
So the US has already slipped behind despite the chip limits. I also saw that Llama 3 was already bested by Qwen 2. We are about a week away from some Chinese model surpassing GPT-4o on LMSYS. I want to hear the China-is-no-big-deal folks explain this.
Wait till you find out that qwen 2 is probably just llama 3 with a few changes and some training on benchmarks to inflate performance a bit
Possible. Possible. But I don’t see how that is more likely than that Alibaba just made something better. Or that they made something with lots of contamination. I think this should make us update toward not underestimating them. The Kling thing is a whole other issue. If it is confirmed text-to-video and not something else, then we are in big trouble, because the chip limits have failed.
For what it’s worth, Yann LeCun argues that video diffusion models like Sora, or any models which predict pixels, are useless for creating an AGI world model, so this might be a dead end. The reason, according to LeCun, is that pixel data is very high-dimensional and redundant compared to text (LLM vocabularies only contain something like 65,000 tokens), which makes exact prediction less useful. In his 2022 outline of his proposed AGI framework, JEPA, he instead proposes an architecture which predicts embeddings rather than exact pixels.
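To make the distinction concrete, here is a minimal sketch of the “predict embeddings, not pixels” idea in PyTorch. The shapes and module choices are mine for illustration; the real JEPA uses an EMA target encoder and other machinery to prevent representational collapse.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim_pix, dim_emb = 3 * 64 * 64, 256  # toy flattened-frame and embedding sizes

# Encoder maps frames to embeddings; predictor works purely in embedding space.
encoder = nn.Sequential(nn.Linear(dim_pix, dim_emb), nn.ReLU(), nn.Linear(dim_emb, dim_emb))
predictor = nn.Sequential(nn.Linear(dim_emb, dim_emb), nn.ReLU(), nn.Linear(dim_emb, dim_emb))

frame_t = torch.randn(8, dim_pix)     # batch of current frames (random stand-ins)
frame_next = torch.randn(8, dim_pix)  # batch of next frames

z_t = encoder(frame_t)
with torch.no_grad():                 # stop-gradient on the target embedding
    z_target = encoder(frame_next)

# The loss lives in embedding space: predict the next frame's embedding,
# never its pixels, so redundant high-dimensional detail is ignored.
loss = F.mse_loss(predictor(z_t), z_target)
loss.backward()
```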
01 AI dropped a model on LMSYS that is doing fairly well, briefly overtaking Claude Opus before slipping a bit. Just another reminder that, as we wring our hands about dodgy behavior by OpenAI, these Chinese firms are apparently getting compute (despite our efforts to restrict it) and releasing powerful, competitive models.
I think I’ve switched positions on open-source models. Before, I felt that we must not release them, because they can easily be fine-tuned to remove safety measures and represent a tech donation to adversaries. But now I feel the harm posed by these open-source models is pretty small, and because Alibaba is releasing them at an exceptionally rapid pace, Western forbearance will not affect their proliferation.
The stakes with open weights for current models are much lower than for hypothetical long-horizon capable models, where removal of safety tuning becomes a stronger argument. The major effects with current models are wide availability for post-training and interpretability research, and feeding the norm of publishing weights that might persist with future dangerous models.