Great post! A few things I would add, having thought about crossing the street for a while:
When looking left and right, let your eyes briefly wander to the horizon on each side, or as far as possible until your view is obstructed. If there are blind spots (such as those created by parked cars), or road curvature that makes it difficult to see past some distance, it’s better to consciously note that and adjust strategy accordingly, rather than just assuming that if you don’t see a car coming, it isn’t there. (There’s also something really satisfying about looking at the horizon. I’ve also heard it’s good for the eyes.)
Related to the above, if your view is obstructed, you want to edge into the crossing slowly and take another look once you are past the obstruction.
It’s good to make use of both sound and peripheral vision as you approach a crossing. In most cases, even before you get to the curb, you should have a sense of the oncoming traffic, based on the sound, though you may not have a precise sense of its speed and direction. Starting to pay attention a few feet before you are at the crossing is a good idea.
When looking at oncoming traffic, you should be able to get a general sense of both how fast it’s going and whether it’s slowing down for the crossing or for you (if pedestrians have the right of way at the crossing). Generally, if a car isn’t showing signs of slowing down as expected, I will wait at the crossing until it slows down or passes. I only cross if I’m confident that the car is moving slowly enough that I can get across with a large margin even if the driver doesn’t slow down, or if it looks like their foot is already on the brake.
Many of these habits are also useful for driving, and are probably more important there given the greater speeds involved, so it’s good practice to start building this mental model early on.
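The rule above about only crossing when you could get across with a large margin, even if the driver never slows down, comes down to comparing two times. Here is a rough sketch of that comparison. The function names, the default walking speed of 1.4 m/s, and the safety factor of 2 are all assumptions chosen for illustration, not figures from the post, and in practice you estimate these by eye rather than measure them.

```python
def time_to_arrival(distance_m: float, speed_mps: float) -> float:
    """Seconds until the car reaches the crossing if it never slows down."""
    if speed_mps <= 0:
        # A stopped car never arrives under the constant-speed assumption.
        return float("inf")
    return distance_m / speed_mps


def safe_to_cross(
    distance_m: float,
    speed_mps: float,
    crossing_width_m: float,
    walking_speed_mps: float = 1.4,  # assumed typical walking pace
    margin: float = 2.0,             # assumed safety factor ("large margin")
) -> bool:
    """True if the car needs at least `margin` times as long to arrive
    as the pedestrian needs to get all the way across."""
    crossing_time = crossing_width_m / walking_speed_mps
    return time_to_arrival(distance_m, speed_mps) >= margin * crossing_time


# A car 150 m away at 10 m/s arrives in 15 s; a 7 m crossing takes about 5 s.
print(safe_to_cross(150, 10, 7))
# A car 50 m away at 15 m/s arrives in about 3.3 s: wait for it to pass.
print(safe_to_cross(50, 15, 7))
```

The margin is there because estimates of speed and distance made by eye are noisy, and because the constant-speed assumption ignores a driver who speeds up.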
Here’s Claude’s review of the article in light of changes over the past 12 years:
This is a rich piece and holds up remarkably well in its core claims. Let me give you a structured assessment.
What Has Stood the Test of Time
The central thesis — that text would dominate over video as the marginal medium of the internet — was basically correct for the period 2014–2022. Text did continue to dominate information production, search infrastructure, professional work, and knowledge accumulation. The argument about production costs was prescient: the explosion of Twitter, Reddit, newsletters, Discord, Slack, and (ironically) LLM chatbots all represent text continuing to dominate the high-value-density layer of the internet.
The “augmented text” framing aged extremely well. Rich text environments — Notion, Substack, Obsidian, Linear, GitHub — are exactly the evolution you described: not plain ASCII, but semantically dense text augmented with structure, embeds, formatting, and links. The vision of text getting progressively richer without jumping to video was accurate.
The meme/GIF point was spot-on. The visual internet that actually exploded was image macros, reaction GIFs, and short-form images: scaffolding for text, not a replacement for it.
The 3D video skepticism was well-calibrated. You said “at least 20 years, probably 30.” Oculus/Meta has sunk tens of billions into VR and AR with essentially zero mass-market traction for communication or content consumption. The Quest 3 is a technical marvel that almost nobody uses daily. Apple Vision Pro is a $3500 curiosity. The intermediate milestones haven’t been rewarding enough, exactly as you predicted.
The machine learning section is one of the best-aged passages in the piece. You specifically argued that ML’s value would be mostly intermediated through improvements to existing text interfaces rather than through flashy new modalities. This turned out to be deeply right — for nearly a decade, the biggest ML wins were ranking, recommendations, autocomplete, spam filtering, and search quality. The LLM era is almost an extreme version of this thesis.
The flow-through effects argument (text generating more downstream production) has been validated. Text on the internet has compounded — every piece of text is linkable, quotable, indexable, trainable-on. Video mostly doesn’t compound in the same way.
What Has Been Proven False or Significantly Complicated
The strongest refutation: short-form video. TikTok is the clearest counterexample to your thesis. You acknowledged video’s role in entertainment/movies but treated it as substitution rather than paradigm shift. TikTok represents something genuinely new: video as the primary discovery and communication medium for a large population cohort, replacing not just TV but search, social feeds, and even text tutorials. The “democratization of production” you cited as a text advantage turned out to apply to video too — partly because smartphones commoditized decent video production, and partly because TikTok’s algorithm is forgiving enough of rough production values that the inhibition barrier dropped substantially. YouTube Shorts, Instagram Reels, and now even LinkedIn video are real. For Gen Z, the video-first internet is the default.
Podcasts grew much larger than your framing suggested. You mention audio briefly and essentially dismiss it. But podcasts became a multi-billion dollar industry, became the dominant long-form interview format, and arguably became more culturally influential than comparable text media. The asynchronous mobility advantage (listening while exercising, driving, cooking) turned out to be a stronger force than you credited.
The inhibition argument partially inverted for video. You argued people would be more inhibited about video than text, reducing video production. But the smartphone + front camera + selfie culture + TikTok’s ephemeral/low-stakes aesthetic created a generation that is less inhibited about video than text. For many young people, typing out a thought feels more labored than filming themselves saying it.
Image platforms became more sophisticated than the meme-scaffold model. Instagram became a major driver of culture, commerce, and communication in ways that went beyond images-as-text-scaffolding. The visual identity economy (influencer culture, brand aesthetics) is genuinely image/video-native in ways that don’t reduce to text.
How LLMs Change the Text vs. Video Equation
This is where it gets most interesting, and I think the LLM era dramatically amplifies your core thesis while also introducing some genuinely new wrinkles.
LLMs are the apotheosis of text supremacy. The entire edifice of foundation models is built on text. The reason LLMs are so capable is precisely that the internet’s accumulated text was a dense, searchable, linkable, structured knowledge base — exactly the properties you argued for. If the internet had been primarily video since 2000, we would not have GPT-4. In a sense, the world bet on text as the training substrate, and it paid off in a way that creates further lock-in to text.
Text generation is now nearly free. One of your core arguments was that text is cheaper to produce. LLMs make this even more true, to the point of near-zero marginal cost for competent text. This should further shift the balance toward text: making a good video still takes effort, while the cost of producing competent text has collapsed toward zero.
LLMs as text-to-everything interfaces. Rather than video replacing text for search and information retrieval, we’re now seeing text interfaces that generate code, images, analysis, and even video (Sora, Runway, etc.). Text has become the universal command layer — you prompt in text and get any modality back. This is a stronger form of text dominance than you argued for: text isn’t just winning the communication war, it’s becoming the control plane for all modalities.
But: video-to-text pipelines are getting very good. Auto-transcription (Whisper-quality and above), video summarization, multimodal models that can “read” video — these reduce one of text’s competitive advantages (searchability). If every YouTube video is automatically transcribed, indexed, and queryable, the search advantage of text partially transfers to video. This is a genuine new development that could modestly weaken your case.
The LLM UI is pure augmented text. Claude, ChatGPT, Perplexity — these are the dominant new computing interfaces of the 2020s, and they are quintessentially text-native. The “augmented text” future you described arrived in the form of conversational AI rather than the richer Facebook-style social annotations you imagined, but the basic prediction was correct.
A subtle reversal for LLM-generated content: one concern is that as text generation becomes cheap and ubiquitous, human-produced video may become a stronger authenticity signal. If you can’t tell whether a text was written by a person or an LLM, the value of video as proof-of-person might actually increase. This is speculative but worth watching.
Overall verdict: I’d say roughly 70% of the piece has aged well, with the main blind spot being the TikTok-style video revolution and the underestimation of podcast culture. The LLM era has, if anything, been a late-stage vindication of the core thesis — text dominated long enough and densely enough to become the training substrate for the most transformative technology in decades, and that technology is now making text even more dominant as a control interface. Your machine learning passage in particular reads almost prophetically given where things ended up.