I deny the premise. It’s publicized, you’re just not paying attention to the water in which you swim. Companies like Google and even Apple talk a great deal about how they increasingly employ DL at every layer of the stack. Just for smartphones: pull your smartphone out of your pocket. This is how DL generates economic value: DL affects the chip design, is an increasing fraction of the chips on the SoC, is most of what your camera does, detects your face to unlock it, powers the recommendations of every single thing on it whether Youtube or app store or streaming service (including the news articles and notifications shown to you as you unlock), powers the features like transcripts of calls or machine translation of pages or spam detection that you take for granted, powers the ads which monetize you in the search engine results which they also power, the anti-hacking and anti-abuse measures which keep you safe (and also censor hatespeech etc on streams or social media), the voice synthesis you hear when you talk to it, the voice transcription when you talk to it or have your Zoom/Google videoconference sessions during the pandemic, the wake words, the predictive text when you prefer to type rather than talk and the email suggestions (the whole email, or just the spelling/grammar suggestions), the GNN traffic forecasts changing your Google Maps route to the meeting you emailed about, the cooling systems of the data centers running all of this (not to mention optimizing the placement of the programs within the data centers both spatially in solving the placement problem and temporally in forecasting)...
This all is, of course, in addition to the standard adoption curves & colonization wave dynamics, and merely how far it’s gotten so far.
I think the conclusion here is probably right, but a lot of the examples seem to exaggerate the role of DL. Like, if I thought all of the obvious-hype-bullshit put forward by big companies about DL were completely true, then it would look like this answer.
Starting from the top:
Companies like Google and even Apple talk a great deal about how they increasingly employ DL at every layer of the stack.
So, a few years back Google was pushing the idea of “AI first design” internally—i.e. design apps around the AI use-cases. By all reports from the developers I know at Google, this whole approach crashed and burned. Most ML applications didn’t generalize well beyond their training data. Also they were extremely unreliable so they always needed to either be non-crucial or to have non-ML fallbacks. (One unusually public example: that scandal where black people were auto-labelled as gorillas.) I hear that the whole “AI first” approach has basically been abandoned since then.
Of course, Google still talks about how they increasingly employ DL at every layer of the stack. It’s great hype.
DL affects the chip design...
I mean, maybe it’s used somewhere in the design loop, but I doubt it’s particularly central. My guess would be it’s used in one or two tools somewhere which are in practice not-importantly-better than the corresponding non-DL version, but someone stuck a net in there somewhere just so that they could tell a clueless middle manager “it uses deep learning!” and the clueless middle manager would buy this mediocre piece of software.
is most of what your camera does...
Misleading. Yeah, computational photography techniques have exploded, but the core tricks are not deep learning at all.
detects your face to unlock it...
This one I think I basically buy, although I don’t know much about how face detection is done today.
powers the recommendations of every single thing on it whether Youtube or app store or streaming service...
powers the ads which monetize you in the search engine results which they also power...
Misleading. Those recommenders presumably aren’t using end-to-end DL; they’re mixing it in in a few specific places. It’s a marginal value add within a larger system, not the backbone of the system.
powers the features like transcripts of calls or machine translation of pages or spam detection that you take for granted...
I basically buy the transcripts and translation examples, and basically don’t buy the spam example—we already had basically-viable spam detection before DL; the value-add there has been marginal at best.
the anti-hacking and anti-abuse measures which keep you safe (and also censor hatespeech etc on streams or social media)...
I hear companies wish they could get ML to do this well, but in practice most things still need to loop through humans. That’s epistemic status: hearsay, so not confident, but it matches my priors.
the voice synthesis you hear when you talk to it, the voice transcription when you talk to it or have your Zoom/Google videoconference sessions during the pandemic, the wake words...
These examples I basically buy.
the predictive text when you prefer to type rather than talk and the email suggestions...
These seem to work in practice exactly when they’re doing the same thing an n-gram predictor would do, and not work whenever they try to predict anything more ambitious than that.
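For concreteness, here is a minimal sketch of the sort of n-gram baseline being invoked (plain Python; the toy corpus and function names are purely illustrative, not anything a phone keyboard actually ships). It just suggests whichever words most often followed the previous word in its training text:

```python
# Toy bigram next-word suggester: the kind of non-DL baseline that
# predictive text is being compared against here. Corpus is illustrative only.
from collections import Counter, defaultdict

def train_bigram(corpus: str) -> dict:
    """Count which word follows which in the training text."""
    words = corpus.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def suggest(counts: dict, prev_word: str, k: int = 3) -> list:
    """Suggest the k words seen most often after prev_word."""
    return [w for w, _ in counts.get(prev_word.lower(), Counter()).most_common(k)]

model = train_bigram("thanks for the update . thanks for the reminder . "
                     "see you for lunch . see you tomorrow .")
print(suggest(model, "for"))  # ['the', 'lunch']
print(suggest(model, "see"))  # ['you']
```

Anything at that level of sophistication needs no deep learning; the question is whether the shipped feature visibly does more than this.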
the GNN traffic forecasts changing your Google Maps route to the meeting you emailed about...
I would be surprised if DL were doing most of the heavy lifting in Maps’ traffic forecast at this point, although I would not be surprised if it were sprinkled in and hyped up. That use-case should work really well for non-DL machine learning systems (or so I’d guess), which are a lot more transparent to the designers and developers.
the cooling systems of the data centers running all of this (not to mention optimizing the placement of the programs within the data centers both spatially in solving the placement problem and temporally in forecasting)...
Another two places where I doubt that DL is the main backbone, although it may be sprinkled in here and there and hyped up a lot. I doubt that the marginal value-add from DL is all that high in either of these use-cases, since non-DL machine learning should already be pretty good at these sorts of problems.
Stellar breakdown of hype vs. reality. Just wanted to share some news from today that Google has fired an ML scientist for challenging their paper on DL for chip placement.
From Engadget (ungated):
The New York Times has learned Google fired machine learning scientist Satrajit Chatterjee in March, soon after it refused to publish a paper Chatterjee and others wrote challenging earlier findings that computers could design some chip components more effectively than humans. The scientist was reportedly allowed to collaborate on a paper disputing those claims after he and fellow authors expressed reservations, but was dismissed after a resolution committee rejected the paper and the researchers hoped to bring the issue to CEO Sundar Pichai and Alphabet’s board of directors.
The company hasn’t detailed why it fired Chatterjee, but told the Times he’d been “terminated with cause.” It also maintained that the original paper had been “thoroughly vetted” and peer-reviewed, and that the study challenging the claims “did not meet our standards.”
Sounds like challenging the hype is a terminable offense. But see gwern’s context for the article below.
Sounds like challenging the hype is a terminable offense.
“One story is good until another is told”. The chip design work has apparently been replicated, and Metz’s* writeup there has several red flags: in describing Gebru’s departure, he omits any mention of her ultimatum and list of demands, so he’s not above leaving out extremely important context in these departures in trying to build up a narrative of ‘Google fires researchers for criticizing research’; he explicitly notes that Chatterjee was fired ‘for cause’ which is rather eyebrow-raising when usually senior people ‘resign to spend time with their families’ (said nonfirings typically involving things like keeping their stock options while senior people are only ‘fired for cause’ when they’ve really screwed up—like, say, harassment of an attractive young woman) but he doesn’t give what that ‘cause’ was (does he really not know after presumably talking to people?) or wonder why both Chatterjee and Google are withholding it; and he uninterestedly throws in a very brief and selective quote from a presumably much longer statement by a woman involved which should be raising your other eyebrow:
Ms. Goldie said that Dr. Chatterjee had asked to manage their project in 2019 and that they had declined. When he later criticized it, she said, he could not substantiate his complaints and ignored the evidence they presented in response.
“Sat Chatterjee has waged a campaign of misinformation against me and Azalia for over two years now,” Ms. Anna Goldie said in a written statement.
She said the work had been peer-reviewed by Nature, one of the most prestigious scientific publications. And she added that Google had used their methods to build new chips and that these chips were currently used in Google’s computer data centers.
(I note that this is put at the end, which in the NYT house style, is where they bury the inconvenient facts that they can’t in good journalist conscience leave out entirely, and that makes me suspect there is more to this part than is given.)
So, we’ll see. EDIT: Timnit Gebru, perhaps surprisingly, denies any parallel and seems to say Chatterjee deserved to be fired, saying:
...But I had heard about the person from many ppl. To the extent the story is connected to mine, it’s ONLY the pattern of action on toxic men taken too late while ppl like me are retaliated against. This is NOT a story about censorship. Its a story about a toxic person who was able to stay for a long time even though many ppl knew of said toxicity. And now, they’re somehow connecting it to my story of discrimination, speaking up day in & day out & being retaliated against?
Wired has a followup article with more detailed timeline and discussion. It edges much closer to the misogyny narrative than the evil-corporate-censorship narrative.
* yes, the SSC Metz.
Fair enough! Great context, thanks.
In my experience, not enough people on here publicly realise their errors and thank the corrector. Nice to see it happen here.
I don’t think Alex is saying deep learning is valueless, he’s saying the new value generated doesn’t seem commensurate with the scale of the research achievements. Everyone is using algorithmic recommendations, but they don’t feel better than Netflix or Amazon could do 10 years ago. Speech to text is better than it was, but not groundbreakingly so. Predictive text may add value to my life one day, but currently it’s an annoyance.
Maybe the more hidden applications have undergone bigger shifts. I’d love to hear more about deep learning for chip or data center design. But right now the consumer uses feel like modest improvements compounding over time, and I’m constantly frustrated by how unconfigurable tools are becoming.
I don’t know what you’re talking about. Speech to text actually works now! It was completely unusable just 12 years ago.
Agreed. I distinctly remember it becoming worth using in 2015, and was using that as my reference point. Since then it’s probably improved, but it’s been gradual enough that I haven’t noticed it happening. Everything Alex cites came after 2015, so I wasn’t counting that as “had major discontinuities in line with the research discontinuities”.
However, I think foreign language translation has experienced such a discontinuity, and it’s of comparable magnitude to the wishlist.
Was circa 2015 speech-to-text using deep learning? If not, how did it work?
Prior to DL, text-to-speech used hidden Markov models. Those were replaced with LSTMs relatively early in the DL revolution (random 2014 paper). In 2015 there were likely still many HMM-based models around, but apparently at least Google already used DL-based text-to-speech.
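For illustration, here is a minimal sketch of the LSTM-style frame model that displaced those HMM pipelines (PyTorch; the feature counts and layer sizes are placeholders of mine, not the models Google actually ran). Per-frame features go in, the recurrent layer carries context across time, and a linear head makes a prediction for every frame:

```python
# Minimal LSTM sequence model of the kind that displaced HMM-based
# speech pipelines: per-frame features in, per-frame predictions out.
# All dimensions here are illustrative placeholders.
import torch
import torch.nn as nn

class FrameLSTM(nn.Module):
    def __init__(self, n_features=40, hidden=128, n_outputs=60):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_outputs)

    def forward(self, frames):          # frames: (batch, time, n_features)
        states, _ = self.lstm(frames)   # (batch, time, hidden)
        return self.head(states)        # per-frame logits

model = FrameLSTM()
dummy = torch.randn(2, 300, 40)         # 2 utterances, 300 frames, 40 features
print(model(dummy).shape)               # torch.Size([2, 300, 60])
```

An HMM pipeline instead models the sequence with discrete hidden states and hand-specified transition/emission structure; the recurrent network learns that temporal structure directly from data.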
he’s saying the new value generated doesn’t seem commensurate with the scale of the research achievements.
I would point out that the tech sector is the single most lucrative sector to have invested in in the past decade, despite endless predictions that the tech bubble is finally going to pop, and this techlash or that recession will definitely do it real soon now.
What would the world look like if there were extensive quality improvements in integrated bundles of services behind APIs and SaaS and smartphones driven by, among other things, DL? I submit that it would look like ours looks.
Consumer-obvious stuff is just a small chunk of the economy.
Everyone is using algorithmic recommendations, but they don’t feel better than Netflix or Amazon could do 10 years ago.
How would you know that? You aren’t Amazon. And when corporations do report lift, the implied revenue gains are pretty big. Even back in 2014 or so, Google could make a business case for dropping $130m on an order of Nvidia GPUs (ch8, Genius Makers), much more for DeepMind, and that was back when DL was mostly ‘just’ image stuff & NMT looking inevitable, well before it began eating the rest of the stack and modalities.
On tech sector out-performance, I think the more appropriate lookback period started around 2016 when AlphaGo became famous.
On predictions, there were also countless predictions that tech would take over the world. Abundance of predictions for boom or bust is a constant feature of capital markets, and should be given no weight.
On causal attribution, note that there have been many other advances in the tech sector, such as cloud computing, mobile computing, industry digitization, Moore’s law, etc. It’s unclear how much of the value added is driven by DL.
On tech sector out-performance, I think the more appropriate lookback period started around 2016 when AlphaGo became famous.
I disagree. Major investments in DL by big tech like FB, Baidu, and Google started well before 2016. I cited that purchase by Google partially to ward off exactly this sort of goalpost moving. And stock markets are forward-looking, so I see no reason to restrict it to AlphaGo (?) actually winning.
On predictions, there were also countless predictions that tech would take over the world.
Who cares about predictions? Talk is cheap. I’m talking about returns. Stock markets are forward-looking, so if that were really the consensus, they wouldn’t’ve outperformed.
It’s unclear how much of the value added is driven by DL.
And yet, in worlds where DL delivers huge economic value in consumer-opaque ways all throughout the stack, they look like our world looks.
Consumer-obvious stuff is just a small chunk of the economy.
Consumer-obvious stuff (“final goods and services”) is what is measured by GDP, which I think is the obvious go-to metric when considering “economic value.” The items on Alex’s list strike me as final goods, while the applications of DL you’ve mentioned are mostly intermediate goods. Alex wasn’t super clear on this, but he seems to be gesturing at the question of why we haven’t seen more new types of final goods and services, or “paradigmatically better” ones.
So while I think you are correct that DL is finding applications in making small improvements to consumer goods and research, design, and manufacturing processes, I think Alex is correct in pointing out that this has yet to introduce a whole new aisle of product types at Target.
I didn’t say ‘final goods or services’. Obviously yes, in the end, everything in the economy exists for the sake of human consumers, there being no one else who it could be for yet (as we don’t care about animals or whatever). I said ‘consumer-obvious’ to refer to what is obvious to consumers, like OP’s complaint.
This is not quite as simple as ‘final’ vs ‘intermediate’ goods. Many of the examples I gave often are final goods, like machine translation. (You, the consumer, punch in a foreign text, get its translation, and go on your merry way.) It’s just that they are upgrades to final goods, which the consumer doesn’t see. If you were paying attention, the rollout of Google Translate from n-grams statistical models to neural machine translation was such a quality jump that people noticed it had happened before Google happened to officially announce it. But if you weren’t paying attention at that particular time in November 2015 or whenever it was, well, Google Translate doesn’t, like, show you little animations of brains chugging away inside TPUs; so you, consumer, stand around like OP going “but why DL???” even as you use Google Translate on a regular basis.
Consumers either never realize these quality improvements happen (perhaps you started using GT after 2015), or they just forget about the pain points they used to endure (cf. my Ordinary Life Improvements essay which is all about that), or they take for granted that ‘line on graph go up’ where everything gets 2% better per year and they never think about the stacked sigmoids and radical underlying changes it must take to keep that steady improvement going.
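As a toy illustration of the ‘stacked sigmoids’ point (all numbers invented, nothing measured): each individual technique follows an S-curve and saturates, but staggered new curves keep the total looking like the boring steady improvement consumers take for granted.

```python
# Toy 'stacked sigmoids' demo: individually saturating S-curves, started a
# couple of years apart, sum to roughly steady year-over-year improvement.
# All numbers are invented for illustration.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def total_quality(year, n_waves=10, spacing=2.0):
    # Each wave is its own S-curve, kicking off `spacing` years after the last.
    return sum(sigmoid(year - spacing * k) for k in range(n_waves))

prev = total_quality(0)
for year in range(1, 11):
    now = total_quality(year)
    print(f"year {year:2d}: quality {now:5.2f} (+{now - prev:.2f})")
    prev = now
# After the first couple of years the yearly gain settles near a constant
# ~0.5, even though every underlying curve is flattening out.
```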
but he seems to be gesturing at the question of why we haven’t seen more new types of final goods and services, or “paradigmatically better” ones.
Yes, I can agree with this. OP is wrong about DL not translating into huge amounts of economic value in excess of the amount invested & yielding profits, because it does, all through the stack, and part of his mistake is in not knowing how many existing things now rely on or plug in DL in some way; but the other part of the mistake is the valid question of “why don’t I see completely brand-new, highly-economically-valuable, things which are blatantly DL, which would satisfy me at a gut level about DL being a revolution?”
So, why don’t we? I don’t think it’s necessarily any one thing, but a mix of factors that mean it would always be slow to produce these sorts of brand new categories, and others which delay by relatively small time periods and mean that the cool applications we should’ve seen this year got delayed to 2025, say. I would appeal to a mix of:
the future is already here, just unevenly distributed: unfamiliarity with all the things that already do exist (does OP know about DALL-E 2 or 15.ai? OK, fine, does he know about Purplesmart.ai where you could chat with Twilight Sparkle, using face, voice, & text synthesis? Where did you do that before?)
automation-as-colonization-wave dynamics like Shirky’s observations about blogs taking a long time to show up after they were feasible. How long did it take to get brand-new killer apps for ‘electricity’?
Hanson uses the metaphor of a ‘rising tide’; DL can be racing up the spectrum from random to superhuman, but it may not have any noticeable effects until it hits a certain point. Below a certain error rate, things like machine translation or OCR or TTS just aren’t worth bothering with, no matter how impressive they are otherwise or how much progress they represent or how fast they are improving (a toy arithmetic sketch of this threshold effect follows this list). AlphaGo Fan Hui vs AlphaGo Lee Sedol, GPT-2 vs GPT-3, DALL-E 1 vs DALL-E 2...
Most places are still trying to integrate and invent uses for spreadsheets. Check back in 50 years for a final list of applications of today’s SOTA.
the limitations of tool AI designs: “tool AIs want to be agent AIs” because tools lose a lot of performance and need to go through human bottlenecks, and are inherently molded to existing niches, like hooking an automobile engine up to a buggy. It’ll pull the buggy, sure, but you aren’t going to discover all the other things it could be doing, and it’ll just be a horse which doesn’t poop as much.
exogenous events like
GPU shortages (we would be seeing way more cool applications of just existing models if hobbyists didn’t have to sell a kidney to get a decent Nvidia GPU), which probably lets Nvidia keep prices up (killing tons of DL uses on the margin) and hold back compute progress in favor of dripfeeding
strategic missteps (Intel’s everything, AMD’s decision to ignore Nvidia building up a software ecosystem monopoly & rendering themselves irrelevant to DL, various research orgs ignoring scaling hypothesis work until relatively recently, losing lots of time for R&D cycles)
basic commercial dynamics (hiding stuff behind an API is a good business model, but otherwise massively holds back progress),
Marginal cost: We can also note that general tech commercial dynamics like commoditize-your-complement lead to weird, perverse effects because of the valley of death between extremely high-priced services and free services. Like, Google Translate couldn’t roll out NMT using RNNs until they got TPUs. Why? Because a translation has to be almost free before Google can offer it effectively at global scale; and yet, it’s also not worth Google’s time to really try to offer paid APIs because people just don’t want to use them (‘free is different’), it captures little of the value, and Google profits most by creating an integrated ecosystem of services and it’s just not worth bothering doing. And because Google has created ‘a desert of profitability’ around it, it’s hard for any pure-NMT play to work. So you have the very weird ‘overhang’ of NMT in the labs for a long time with ~$0 economic value despite being much better, until suddenly it’s rolled out, but charging $0 each.
Risk aversion/censorship: putting stuff behind an API enables risk aversion and censorship to avoid any PR problems. How ridiculous that you can’t generate faces with DALL-E 2! Or anime!
Have a cool use for LaMDA, Chinchilla/Flamingo, Gopher, or PaLM? Too bad! And big corps can afford the opportunity cost because after all they make so much money already. They’re not going to go bankrupt or anything… So we regularly see researchers leaving GB, OA, or DM (most recently, Adept AI Labs, with incidentally a really horrifying mission from the perspective of AI safety), and scuttlebutt has it, like Jang reports, that this is often because it’s just such a pain in the ass to get big corps to approve any public use of the most awesome models, that it’s easier to leave for a startup to recreate it from scratch and then deploy it. Or consider AI Dungeon: it used to be one of the best examples of something you just couldn’t do with earlier approaches, but has gone through so many wild changes in quality apparently due to the backend and OA issues that I’m too embarrassed to mention it much these days because I have no idea if it’s lobotomized this month or not.
(I have also read repeatedly that exciting new Google projects like Duplex or a Google credit card have been killed by management afraid of any kind of backlash or criticism; in the case of the credit card, apparently DEI advocates brought up the risk of it ‘exacerbating economic inequality’ or something. Plus, remember that whole thing where for like half a year Googlers weren’t allowed to mention the name “LaMDA” even as they were posting half a dozen papers on Arxiv all about it?)
bottlenecks in compute (even ignoring the GPU shortage part) where our reach exceeds our grant-making grasp (we know that much bigger models would do so many cool things, but the big science money continues to flow to things like ITER or LHC)
and in developers/researchers capable of applying DL to all the domains it could be applied to.
(People the other day were getting excited over a new GNN weather-forecaster which apparently beats the s-t out of standard weather forecasting models. Does it? I dunno, I know very little about weather forecasting models and what it might be doing wrong or being exaggerated. Could I believe that one dude did so as a hobby? Absolutely—just how many DL experts do you think there are in weather-forecasting?)
general underdevelopment of approaches making them inefficient in many ways, so you can see the possibility long before the experience curve has cranked away enough times to democratize it (things like Chinchilla show how far even the basics are from being optimized, and are why DL has a steep experience curve)
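To put toy numbers on the ‘rising tide’ threshold point above (all figures invented for illustration): per-unit error rates compound over a whole output, so a system can improve enormously while still being useless, then cross a point where entire outputs abruptly start coming out clean.

```python
# Toy illustration of the error-rate threshold: the chance a 20-word
# sentence comes out with zero errors, as the per-word error rate falls.
# Numbers are invented for illustration only.
sentence_len = 20
for per_word_error in (0.20, 0.10, 0.05, 0.02, 0.01, 0.005):
    p_clean = (1 - per_word_error) ** sentence_len
    print(f"per-word error {per_word_error:>5.1%} -> clean sentence {p_clean:5.1%}")
# per-word error 20.0% -> clean sentence  1.2%
# per-word error 10.0% -> clean sentence 12.2%
# per-word error  5.0% -> clean sentence 35.8%
# per-word error  2.0% -> clean sentence 66.8%
# per-word error  1.0% -> clean sentence 81.8%
# per-word error  0.5% -> clean sentence 90.5%
```

The same arithmetic is why a translation or transcription system can cross from novelty to product over what looks, on the benchmark curves, like a modest improvement.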
Applications are a flywheel, and our DL flywheel has an incredible amount of friction in it right now in terms of getting out to a wider world and into the hands of more people empowered to find new uses, rather than passively consuming souped-up services.
To continue the analogy, it’s like if there was a black cab monopoly on buggies which was rich off fares & deliveries and worried about criticism in the London Times for running over old ladies, and automobile engines were still being hand-made one at a time by skilled mechanicks and all the iron & oil was being diverted to manufacture dreadnoughts, so they were slowly replacing horses one at a time with the new iron horses, but only eccentric aristocrats could afford to buy any to try to use elsewhere, which keeps demand low for engines, keeping them expensive and scarce, keeping mechanicks scarce… etc.
The worst part is, for most of these, time lost is gone forever. It’s just a slowdown. Like the Thai floods simply permanently set back hard drive progress and made them expensive for a long time, there was never any ‘catchup growth’ or ‘overhang’ from it. You might hope that stuff like the GPU shortages would lead to so much capital investment and R&D that we’d enjoy a GPU boom in 2023, given historical semiconductor boom-and-bust dynamics, but I’ve yet to see anything hopeful in that vein.
The worst part is, for most of these, time lost is gone forever. It’s just a slowdown.
Gwern, aren’t you in the set that’s aware there’s no plan and this is just going to kill us? Are you that eager to get this over with? Somewhat confused here.
I too am confused.
The worst part is, for most of these, time lost is gone forever. It’s just a slowdown. Like the Thai floods simply permanently set back hard drive progress and made them expensive for a long time, there was never any ‘catchup growth’ or ‘overhang’ from it.
Isn’t this great news for AI safety due to giving us longer timelines?
This is a brilliant comment for understanding the current deployment of DL. Deserves its own post.
This is the rather disappointing part.
(I moved this to answers, since while it isn’t technically an answer, I think it still functions better as an answer than as a comment)
[I generally approve of mods moving comments to answers.]
Datapoint in favor, Patrick Collison of Stripe says ML has made them $1 billion: https://mobile.twitter.com/patrickc/status/1188890271854915586?lang=en-GB
Well, merchant revenue, not Stripe profit, so not quite as impressive as it sounds, but it’s a good example of the sort of nitty-gritty DL applications you will never ever hear about unless you are deep into that exact niche and probably an employee; so a good Bayesian will remember that where there is smoke, there is fire and adjust for the fact that you’ll never hear of 99% of uses.
How are you distinguishing “new DL was instrumental in this process” from “they finally got enough data that existing data janitor techniques worked” or “DL was marginally involved and overall used up more time than it saved, but CEOs are incentivized to give it excess credit”?
It’s totally possible my world is constantly being made more magical in imperceptible ways by deep learning. It’s also possible that magic is improving at a pretty constant rate, disconnected from the flashy research successes, and PR is lying to me about its role.
Does anybody know what “optimize the bitfields of card network requests” actually means?
The above answer, partially as bulleted lists.
Just for smartphones: pull your smartphone out of your pocket. This is how DL generates economic value: DL
affects the chip design, is an increasing fraction of the chips on the SoC,
is most of what your camera does,
detects your face to unlock it,
powers the recommendations of every single thing on it whether Youtube or app store or streaming service (including the news articles and notifications shown to you as you unlock),
powers the features like
transcripts of calls or
machine translation of pages or
spam detection that you take for granted,
powers the ads which monetize you in the search engine
[search engine] results which they also power,
the anti-hacking and anti-abuse measures which keep you safe
(and also censor hatespeech etc on streams or social media),
the voice synthesis you hear when you talk to it,
the voice transcription when you talk to it or have your Zoom/Google videoconference sessions during the pandemic,
the wake words,
the predictive text when you prefer to type rather than talk
and the email suggestions (the whole email, or just the spelling/grammar suggestions),
the GNN traffic forecasts changing your Google Maps route to the meeting you emailed about,
the cooling systems of the data centers running all of this
(not to mention optimizing the placement of the programs within the data centers both spatially in solving the placement problem and temporally in forecasting)...
Recently I learned that Pixel phones actually contain TPUs. This is a good indicator of how much deep learning is being used (particularly by the camera, I think).