I think one has to admit that smartphones with limited-attention-space are the revealed modal preference of consumers. It’s not at all clear that this is an inadequate equilibrium to shift, so much as a thing that many consumers actively want.
I doubt it’ll ever be mostly voice interface—there is no current solution to use voice in public without bothering others. Audio is also MUCH lower bandwidth than visual displays. It will very likely be hybrid/multi-modal, with different sets of modalities for different users/contexts.
I do suspect that it won’t be long before LLM-intermediated “browsing” becomes common, where a lot of information-centric websites see more MCP traffic than HTML (render-able) requests. There’ll be a horrible mix of “thin apps”, which are just a captive LLM search/summarize/render engine, and “AI browsers”, which try to do this generically for many sources. Eventually, some standards will evolve for semantic encoding that these tools can best use, and for visual hints that make content easier to display usefully.
To the curmudgeons among us, this will feel like reinventing HTML and CSS, badly. I hope we’ll be wrong, and it does actually lead to personalized/customized views and usage of many current semi-static site designs.
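As a rough illustration of what such LLM-intermediated requests could look like, here’s a minimal sketch of an MCP-style resource fetch. MCP messages are JSON-RPC 2.0, and “resources/read” is a real MCP method, but the URI scheme, payload contents, and the display-hint field below are illustrative assumptions, not any real site’s API:

```python
import json

# Minimal sketch: what a "thin app" or AI browser might send to an
# information-centric site that serves MCP instead of rendered HTML.
# MCP requests are JSON-RPC 2.0 messages; "resources/read" is the
# standard MCP method for fetching a resource's contents.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "resources/read",
    "params": {"uri": "example://frontpage"},  # hypothetical URI scheme
}
print(json.dumps(request, indent=2))

# Hypothetical response: semantic content plus a rendering hint of the
# sort a future standard might define ("displayHint" is an assumption,
# not part of the current MCP spec).
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "contents": [{
            "uri": "example://frontpage",
            "mimeType": "text/markdown",
            "text": "# Headlines\n- ...",
            "displayHint": "headline-list",
        }]
    },
}
```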
I think one has to admit that smartphones with limited-attention-space are the revealed modal preference of consumers. It’s not at all clear that this is an inadequate equilibrium to shift, so much as a thing that many consumers actively want.
I do totally agree: this is what the people want. And I do concretely say “yep, and the people are wrong.” But I think the solution is not “ban cell phones” or similar; it’s “can we invent a technology that gives people the thing they want out of smartphones, but with fewer bad side effects?”
I doubt it’ll ever be mostly voice interface—there is no current solution to use voice in public without bothering others. It will very likely be hybrid/multi-modal, with different sets of modalities for different users/contexts.
Oh ye of little faith about how fast technology is about to change. (I think it’s already pretty easy to do almost-subvocalized messages. I guess this conversation is sort of predicated on it being pre-uploads and maybe pre-ubiquitous neuralink-ish things)
Oh ye of little faith about how fast technology is about to change. (I think it’s already pretty easy to do almost-subvocalized messages. I guess this conversation is sort of predicated on it being pre-uploads and maybe pre-ubiquitous neuralink-ish things)
Subvocal mikes have been theoretically possible (and even demo’d) for decades, and despite being highly desired they’re still not actually feasible for public consumer use, which to me is strong evidence that it’s a Hard Problem. Neuralink or less-invasive brain interfaces even more so.
There are a lot of AI and tech bets I won’t take—pure software can change REALLY fast. However, I’d be interested to operationalize this disagreement about hardware/wetware interfaces and timelines. I’d probably lay 3:1 against either voice-interface-usable-on-a-crowded-train or non-touch input and non-visual output via a brain link becoming common (say, 1% of smartphone users) by end of 2027, or 1:1 against for end of 2029.
Of the two, I give most weight to my losing this bet via subvocal interfaces that LLMs can be trained to interpret, with only a little bit of training/effort on the part of the user. That’ll be cool, but it’s still very physical and I predict it won’t quickly work.
Part of the generator was “I’ve seen a demo of Apple AirPods basically working for this right now” (it’s not, like, 100% silent; you have to speak at a whisper, but it seemed fine for a room with some background noise).
I’d probably lay 3:1 against either voice-interface-usable-on-a-crowded-train or non-touch input and non-visual output via a brain link becoming common (say, 1% of smartphone users) by end of 2027, or 1:1 against for end of 2029.
These do not seem like conservative estimates. For a technology like this I think a spread to almost everyone (with a smartphone) is pretty likely given a spread to 1% of users. At least, from a technological perspective (which seems to be what your comment is arguing from), spreading to 1% of users seems like the real hard part here.
They’re not intended to be conservative; they’re an attempt to operationalize my current beliefs. Offering 3:1 means I give a very significant probability (up to 25%) to the other side. That’s pretty huge for such a large change in software-interaction modality.
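For concreteness, the odds-to-probability arithmetic behind those numbers (standard conversion, spelled out here rather than in the original comment): laying odds of a:1 against an event prices its probability at no more than

$$p_{\max} = \frac{1}{a+1}, \qquad 3{:}1 \Rightarrow p_{\max} = \tfrac{1}{4} = 25\%, \qquad 1{:}1 \Rightarrow p_{\max} = \tfrac{1}{2} = 50\%.$$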
Agreed that being usable enough that 1% of users prefer it for at least some of their daily use is the hard part. Once it’s well-known and good enough for early adopters, making it the standard/default is just a matter of time—the technology can be predicted to win once it gets there.
I don’t honestly know how much Raemon’s (or your) beliefs differ from mine, in terms of timeline and likelihood. I didn’t intend to fully contradict anything he said, just to acknowledge that I think the most likely major change is still pretty iffy.
Ok, I guess I got confused by your calling it a “Hard Problem”.