I don’t think Alex is saying deep learning is valueless; he’s saying the new value generated doesn’t seem commensurate with the scale of the research achievements. Everyone is using algorithmic recommendations, but they don’t feel better than Netflix or Amazon could do 10 years ago. Speech to text is better than it was, but not groundbreakingly so. Predictive text may add value to my life one day, but currently it’s an annoyance.
Maybe the more hidden applications have undergone bigger shifts. I’d love to hear more about deep learning for chip or data center design. But right now the consumer uses feel like modest improvements compounding over time, and I’m constantly frustrated by how unconfigurable tools are becoming.
I don’t know what you’re talking about. Speech to text actually works now! It was completely unusable just 12 years ago.
Agreed. I distinctly remember it becoming worth using in 2015, and was using that as my reference point. It has probably improved since then, but gradually enough that I haven’t noticed it happening. Everything Alex cites came after 2015, so I wasn’t counting that as “had major discontinuities in line with the research discontinuities”.
However, I think foreign language translation has experienced such a discontinuity, and it’s of comparable magnitude to the wishlist.
Was circa 2015 speech-to-text using deep learning? If not, how did it work?
Prior to DL, speech-to-text used hidden Markov models. Those were replaced with LSTMs relatively early in the DL revolution (random 2014 paper). In 2015 there were likely still many HMM-based models around, but apparently at least Google already used DL-based speech-to-text.
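To make that architecture shift concrete, here is a minimal sketch (PyTorch, purely illustrative; not any particular production system, and every layer size and symbol count here is an assumption) of the kind of recurrent model that displaced HMM pipelines for speech-to-text around 2014-2015: an LSTM reads acoustic feature frames and is trained with CTC loss, which learns the frame-to-transcript alignment that the HMM previously modeled with explicit states.

```python
# Illustrative sketch only: a small bidirectional-LSTM acoustic model trained with CTC,
# the rough shape of the models that displaced HMM-based speech-to-text pipelines.
# Layer sizes, feature dimensions, and the character set are arbitrary choices.
import torch
import torch.nn as nn

class LSTMAcousticModel(nn.Module):
    def __init__(self, n_mels: int = 80, hidden: int = 256, n_symbols: int = 29):
        # n_symbols = 26 letters + space + apostrophe + 1 CTC blank (index 0)
        super().__init__()
        self.lstm = nn.LSTM(n_mels, hidden, num_layers=3,
                            bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden, n_symbols)

    def forward(self, mel_frames: torch.Tensor) -> torch.Tensor:
        # mel_frames: (batch, time, n_mels) log-mel filterbank features
        out, _ = self.lstm(mel_frames)
        return self.proj(out).log_softmax(dim=-1)  # per-frame symbol log-probabilities

model = LSTMAcousticModel()
ctc_loss = nn.CTCLoss(blank=0)  # CTC learns the alignment the HMM used to model explicitly

# Toy forward/backward pass with random data, just to show the shape of training.
x = torch.randn(4, 200, 80)                        # 4 utterances, 200 frames each
logp = model(x).transpose(0, 1)                    # CTCLoss expects (time, batch, symbols)
targets = torch.randint(1, 29, (4, 30))            # fake character-index transcripts
loss = ctc_loss(logp, targets,
                input_lengths=torch.full((4,), 200, dtype=torch.long),
                target_lengths=torch.full((4,), 30, dtype=torch.long))
loss.backward()
```

The point of the sketch is just that the separate alignment and acoustic-modeling machinery of the HMM pipeline collapses into a single network trained end-to-end (decoding against a language model is still typically a separate step in practice).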
he’s saying the new value generated doesn’t seem commensurate with the scale of the research achievements.
I would point out that the tech sector is the single most lucrative sector to have invested in in the past decade, despite endless predictions that the tech bubble is finally going to pop, and this techlash or that recession will definitely do it real soon now.
What would the world look like if there were extensive quality improvements in integrated bundles of services behind APIs and SaaS and smartphones driven by, among other things, DL? I submit that it would look like ours looks.
Consumer-obvious stuff is just a small chunk of the economy.
Everyone is using algorithmic recommendations, but they don’t feel better than Netflix or Amazon could do 10 years ago.
How would you know that? You aren’t Amazon. And when corporations do report lift, the implied revenue gains are pretty big. Even back in 2014 or so, Google could make a business case for dropping $130m on an order of Nvidia GPUs (ch8, Genius Makers), much more for DeepMind, and that was back when DL was mostly ‘just’ image stuff & NMT looking inevitable, well before it began eating the rest of the stack and modalities.
On tech sector out-performance, I think the more appropriate lookback period started around 2016 when AlphaGo became famous.
On predictions, there were also countless claims that tech would take over the world. Abundance of predictions for boom or bust is a constant feature of capital markets, and should be given no weight.
On causal attribution, note that there have been many other advances in the tech sector, such as cloud computing, mobile computing, industry digitization, Moore’s law, etc. It’s unclear how much of the value added is driven by DL.
On tech sector out-performance, I think the more appropriate lookback period started around 2016 when AlphaGo became famous.
I disagree. Major investments in DL by big tech like FB, Baidu, and Google started well before 2016. I cited that purchase by Google partially to ward off exactly this sort of goalpost moving. And stock markets are forward-looking, so I see no reason to restrict it to AlphaGo (?) actually winning.
On predictions, there were also countless claims that tech would take over the world.
Who cares about predictions? Talk is cheap. I’m talking about returns. Stock markets are forward-looking, so if that were really the consensus, they wouldn’t’ve outperformed.
It’s unclear how much of the value added is driven by DL.
And yet, in worlds where DL delivers huge economic value in consumer-opaque ways all throughout the stack, they look like our world looks.
Consumer-obvious stuff (“final goods and services”) is what is measured by GDP, which I think is the obvious go-to metric when considering “economic value.” The items on Alex’s list strike me as final goods, while the applications of DL you’ve mentioned are mostly intermediate goods. Alex wasn’t super clear on this, but he seems to be gesturing at the question of why we haven’t seen more new types of final goods and services, or “paradigmatically better” ones.
So while I think you are correct that DL is finding applications in making small improvements to consumer goods and research, design, and manufacturing processes, I think Alex is correct in pointing out that this has yet to introduce a whole new aisle of product types at Target.
I didn’t say ‘final goods or services’. Obviously yes, in the end, everything in the economy exists for the sake of human consumers, there being no one else who it could be for yet (as we don’t care about animals or whatever). I said ‘consumer-obvious’ to refer to what is obvious to consumers, like OP’s complaint.
This is not quite as simple as ‘final’ vs ‘intermediate’ goods. Many of the examples I gave often are final goods, like machine translation. (You, the consumer, punch in a foreign text, get its translation, and go on your merry way.) It’s just that they are upgrades to final goods, which the consumer doesn’t see. If you were paying attention, the rollout of Google Translate from n-gram statistical models to neural machine translation was such a quality jump that people noticed it had happened before Google officially announced it. But if you weren’t paying attention at that particular time in November 2016 or whenever it was, well, Google Translate doesn’t, like, show you little animations of brains chugging away inside TPUs; so you, consumer, stand around like OP going “but why DL???” even as you use Google Translate on a regular basis.
Consumers either never realize these quality improvements happen (perhaps you started using GT after 2015), or they just forget about the pain points they used to endure (cf. my Ordinary Life Improvements essay which is all about that), or they take for granted that ‘line on graph go up’ where everything gets 2% better per year and they never think about the stacked sigmoids and radical underlying changes it must take to keep that steady improvement going.
but he seems to be gesturing at the question of why we haven’t seen more new types of final goods and services, or “paradigmatically better” ones.
Yes, I can agree with this. OP is wrong about DL not translating into huge amounts of economic value in excess of the amount invested & yielding profits, because it does, all through the stack, and part of his mistake is in not knowing how many existing things now rely on or plug in DL in some way; but the other part of the mistake is the valid question of “why don’t I see completely brand-new, highly economically valuable things which are blatantly DL, which would satisfy me at a gut level about DL being a revolution?”
So, why don’t we? I don’t think it’s necessarily any one thing, but a mix of factors that mean it would always be slow to produce these sorts of brand new categories, and others which delay by relatively small time periods and mean that the cool applications we should’ve seen this year got delayed to 2025, say. I would appeal to a mix of:
the future is already here, just unevenly distributed: unfamiliarity with all the things that already do exist (does OP know about DALL-E 2 or 15.ai? OK, fine, does he know about Purplesmart.ai where you could chat with Twilight Sparkle, using face, voice, & text synthesis? Where did you do that before?)
automation-as-colonization-wave dynamics like Shirky’s observations about blogs taking a long time to show up after they were feasible. How long did it take to get brand-new killer apps for ‘electricity’?
Hanson uses the metaphor of a ‘rising tide’; DL can be racing up the spectrum from random to superhuman, but it may not have any noticeable effects until it hits a certain point. Below a certain error rate, things like machine translation or OCR or TTS just aren’t worth bothering with, no matter how impressive they are otherwise or how much progress they represent or how fast they are improving. AlphaGo Fan Hui vs AlphaGo Lee Sedol, GPT-2 vs GPT-3, DALL-E 1 vs DALL-E 2... (A toy calculation at the end of this comment illustrates how sharp that threshold effect is.)
Most places are still trying to integrate and invent uses for spreadsheets. Check back in 50 years for a final list of applications of today’s SOTA.
the limitations of tool AI designs: “tool AIs want to be agent AIs” because tools lose a lot of performance and need to go through human bottlenecks, and are inherently molded to existing niches, like hooking an automobile engine up to a buggy. It’ll pull the buggy, sure, but you aren’t going to discover all the other things it could be doing, and it’ll just be a horse which doesn’t poop as much.
exogenous events like
GPU shortages (we would be seeing way more cool applications of just existing models if hobbyists didn’t have to sell a kidney to get a decent Nvidia GPU), which probably lets Nvidia keep prices up (killing tons of DL uses on the margin) and hold back compute progress in favor of dripfeeding
strategic missteps (Intel’s everything, AMD’s decision to ignore Nvidia building up a software ecosystem monopoly & rendering themselves irrelevant to DL, various research orgs ignoring scaling hypothesis work until relatively recently, losing lots of time for R&D cycles)
basic commercial dynamics (hiding stuff behind an API is a good business model, but otherwise massively holds back progress),
Marginal cost: We can also note that general tech commercial dynamics like commoditize-your-complement lead to weird, perverse effects because of the valley of death between extremely high-priced services and free services. Like, Google Translate couldn’t roll out NMT using RNNs until they got TPUs. Why? Because a translation has to be almost free before Google can offer it effectively at global scale; and yet, it’s also not worth Google’s time to really try to offer paid APIs because people just don’t want to use them (‘free is different’), it captures little of the value, and Google profits most by creating an integrated ecosystem of services and it’s just not worth bothering doing. And because Google has created ‘a desert of profitability’ around it, it’s hard for any pure-NMT play to work. So you have the very weird ‘overhang’ of NMT in the labs for a long time with ~$0 economic value despite being much better, until suddenly it’s rolled out, but charging $0 each.
Risk aversion/censorship: putting stuff behind an API enables risk aversion and censorship to avoid any PR problems. How ridiculous that you can’t generate faces with DALL-E 2! Or anime!
Have a cool use for LaMDA, Chinchilla/Flamingo, Gopher, or PaLM? Too bad! And big corps can afford the opportunity cost because after all they make so much money already. They’re not going to go bankrupt or anything… So we regularly see researchers leaving GB, OA, or DM (most recently, Adept AI Labs, with incidentally a really horrifying mission from the perspective of AI safety), and scuttlebutt has it, like Jang reports, that this is often because it’s just such a pain in the ass to get big corps to approve any public use of the most awesome models, that it’s easier to leave for a startup to recreate it from scratch and then deploy it. Or consider AI Dungeon: it used to be one of the best examples of something you just couldn’t do with earlier approaches, but has gone through so many wild changes in quality, apparently due to the backend and OA issues, that I’m too embarrassed to mention it much these days because I have no idea if it’s lobotomized this month or not.
(I have also read repeatedly that exciting new Google projects like Duplex or a Google credit card have been killed by management afraid of any kind of backlash or criticism; in the case of the credit card, apparently DEI advocates brought up the risk of it ‘exacerbating economic inequality’ or something. Plus, remember that whole thing where for like half a year Googlers weren’t allowed to mention the name “LaMDA” even as they were posting half a dozen papers on Arxiv all about it?)
bottlenecks in compute (even ignoring the GPU shortage part) where our reach exceeds our grant-making grasp (we know that much bigger models would do so many cool things, but the big science money continues to flow to things like ITER or LHC)
and in developers/researchers capable of applying DL to all the domains it could be applied to.
(People the other day were getting excited over a new GNN weather-forecaster which apparently beats the s-t out of standard weather forecasting models. Does it? I dunno; I know very little about weather forecasting models, what it might be doing wrong, or whether it’s being exaggerated. Could I believe that one dude did it as a hobby project? Absolutely: just how many DL experts do you think there are in weather-forecasting?)
general underdevelopment of approaches making them inefficient in many ways, so you can see the possibility long before the experience curve has cranked away enough times to democratize it (things like Chinchilla show how far even the basics are from being optimized, and are why DL has a steep experience curve)
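As a rough, back-of-the-envelope illustration of that Chinchilla point: the headline result is that, for a fixed training-compute budget, parameter count and training tokens should grow together, which under the common C ≈ 6·N·D approximation for training FLOPs works out to roughly 20 tokens per parameter. The sketch below only encodes that rule of thumb; the budgets are arbitrary, and the 175B-parameter / 300B-token comparison point is GPT-3’s published configuration, used here purely for contrast.

```python
# Back-of-the-envelope sketch of the Chinchilla 'compute-optimal' rule of thumb:
# training FLOPs C ~= 6 * N * D (N params, D tokens), with D/N ~= 20 at the optimum.
# The exact ratio varies by setup; this only shows how far a model trained the 'old'
# way (huge N, relatively few tokens) sits from the compute-optimal point.

def compute_optimal(C: float, tokens_per_param: float = 20.0):
    """Given a FLOP budget C, return (params N, tokens D) with C = 6*N*D and D = r*N."""
    N = (C / (6.0 * tokens_per_param)) ** 0.5
    return N, tokens_per_param * N

for C in (1e21, 1e23, 1e25):  # arbitrary example budgets
    N, D = compute_optimal(C)
    print(f"C={C:.0e} FLOPs  ->  ~{N:.2e} params trained on ~{D:.2e} tokens")

# A 175e9-parameter model trained on only 300e9 tokens spends
# C = 6 * 175e9 * 300e9 ~= 3.2e23 FLOPs; at that budget the rule of thumb
# instead suggests roughly compute_optimal(3.2e23) = (~5e10 params, ~1e12 tokens).
print(compute_optimal(6 * 175e9 * 300e9))
```

Running it says, roughly, that a ~3e23 FLOP budget “wants” a ~50B-parameter model on ~1T tokens rather than 175B parameters on 300B tokens, which is the sense in which even the basics were far from optimized.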
Applications are a flywheel, and our DL flywheel has an incredible amount of friction in it right now in terms of getting out to a wider world and into the hands of more people empowered to find new uses, rather than passively consuming souped-up services.
To continue the analogy, it’s like if there was a black cab monopoly on buggies which was rich off fares & deliveries and worried about criticism in the London Times for running over old ladies, and automobile engines were still being hand-made one at a time by skilled mechanicks and all the iron & oil was being diverted to manufacture dreadnoughts, so they were slowly replacing horses one at a time with the new iron horses, but only eccentric aristocrats could afford to buy any to try to use elsewhere, which keeps demand low for engines, keeping them expensive and scarce, keeping mechanicks scarce… etc.
The worst part is, for most of these, time lost is gone forever. It’s just a slowdown. Like the Thai floods simply permanently set back hard drive progress and made them expensive for a long time, there was never any ‘catchup growth’ or ‘overhang’ from it. You might hope that stuff like the GPU shortages would lead to so much capital investment and R&D that we’d enjoy a GPU boom in 2023, given historical semiconductor boom-and-bust dynamics, but I’ve yet to see anything hopeful in that vein.
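One way to make the ‘rising tide’ threshold point above concrete: for things like OCR or transcription, what matters to a user is often not per-word accuracy but the chance that a whole page comes out clean enough to skip proofreading, and that quantity improves very non-linearly. A toy calculation, with all numbers invented purely for illustration:

```python
# Toy illustration of the threshold / 'rising tide' effect: per-unit accuracy improves
# smoothly, but the chance that an entire page comes out clean (so a human never has
# to proofread it) improves very non-linearly. Numbers are made up.

def p_clean(per_unit_accuracy: float, units: int) -> float:
    """Probability an output of `units` independent units (words, characters, ...)
    contains zero errors."""
    return per_unit_accuracy ** units

for acc in (0.95, 0.98, 0.99, 0.995, 0.999):
    # e.g. a 250-word page of OCR or transcription
    print(f"per-word accuracy {acc:.3f} -> clean-page probability {p_clean(acc, 250):.3f}")

# 95% per-word accuracy sounds respectable but yields essentially zero clean pages
# (0.95**250 ~= 3e-6); at 99.9% roughly four pages in five come out clean. The same
# steady per-word progress, wildly different practical usefulness.
```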
The worst part is, for most of these, time lost is gone forever. It’s just a slowdown.
Gwern, aren’t you in the set that’s aware there’s no plan and this is just going to kill us? Are you that eager to get this over with? Somewhat confused here.
I too am confused.
The worst part is, for most of these, time lost is gone forever. It’s just a slowdown. Like the Thai floods simply permanently set back hard drive progress and made them expensive for a long time, there was never any ‘catchup growth’ or ‘overhang’ from it.
Isn’t this great news for AI safety due to giving us longer timelines?
This is a brilliant comment for understanding the current deployment of DL. Deserves its own post.
Risk aversion/censorship: putting stuff behind an API enables risk aversion and censorship to avoid any PR problems. How ridiculous that you can’t generate faces with DALL-E 2! Or anime!
This is the rather disappointing part.