I was a relatively late adopter of the smartphone. I was still using a flip phone until around 2015 or 2016 ish. From 2013 to early 2015, I worked as a data scientist at a startup whose product was a mobile social media app; my determination to avoid smartphones became somewhat of a joke there.
Even back then, developers talked about UI design for smartphones in terms of attention. Like, the core “advantages” of the smartphone were the “ability to present timely information” (i.e. interrupt/distract you) and always being on hand. Also it was small, so anything too complicated to fit in like three words and one icon was not going to fly.
… and, like, man, that sure did not make me want to buy a smartphone. Even today, I view my phone as a demon which will try to suck away my attention if I let my guard down. I have zero social media apps on there, and no app ever gets push notif permissions when not open except vanilla phone calls and SMS.
People would sometimes say something like “John, you should really get a smartphone, you’ll fall behind without one” and my gut response was roughly “No, I’m staying in place, and the rest of you are moving backwards”.
And in hindsight, boy howdy do I endorse that attitude! Past John’s gut was right on the money with that one.
I notice that I have an extremely similar gut feeling about LLMs today. Like, when I look at the people who are relatively early adopters, making relatively heavy use of LLMs… I do not feel like I’ll fall behind if I don’t leverage them more. I feel like the people using them a lot are mostly moving backwards, and I’m staying in place.
I’ve updated marginally towards this (as a guy pretty focused on LLM-augmentation). I anticipated LLM brain rot, but it was still more pernicious/fast than I expected.
I do still think some-manner-of-AI-integration is going to be an important part of “moving forward” but probably not whatever capitalism serves up.
I have tried out using them pretty extensively for coding. The speedup is real, and I expect it to get more real. Right now it’s like a pretty junior employee that I get to infinitely micromanage. But it definitely does lull me into a lower-agency state where, instead of trying to solve problems myself, I’m handing them off to LLMs much of the time to see if they can handle them.
During work hours, I try to actively override this, i.e. have the habit of “send the LLM off, then go back to thinking about some kind of concrete thing (though often a higher-level strategy).” But this becomes harder to do as it gets later in the day and I get more tired.
One of the benefits of LLMs is that you can do moderately complex cognitive work* while tired (*that a junior engineer could do). But, that means by default a bunch of time is spent specifically training the habit of using LLMs in a stupid way.
(I feel sort of confused about how people who don’t use it for coding are doing. With coding, I can feel the beginnings of a serious exoskeleton that can build structures around me with thought. Outside of that, I don’t know of it being more than a somewhat better google).
I currently mostly avoid interactions that treat the AI like a person I’m talking to. That mode seems the most madness-inducing.
(Disclaimer: only partially relevant rant.)
I’ve recently tried heavily leveraging o3 as part of a math-research loop.
I have never been more bearish on LLMs automating any kind of research than I am now.
And I’ve tried lots of ways to make it work. I’ve tried telling it to solve the problem without any further directions, I’ve tried telling it to analyze the problem instead of attempting to solve it, I’ve tried dumping my own analysis of the problem into its context window, I’ve tried getting it to search for relevant lemmas/proofs in math literature instead of attempting to solve it, I’ve tried picking out a subproblem and telling it to focus on that, I’ve tried giving it directions/proof sketches, I’ve tried various power-user system prompts, I’ve tried resampling the output thrice and picking the best one. None of this made it particularly helpful, and the bulk of the time was spent trying to spot where it’s lying or confabulating to me in its arguments or proofs (which it ~always did).
It was kind of okay for tasks like “here’s a toy setup, use a well-known formula to compute the relationships between A and B”, or “try to rearrange this expression into a specific form using well-known identities”, which are relatively menial and freed up my working memory for more complicated tasks. But the usefulness is pretty minor (and you have to re-check the outputs for errors anyway).
I assume there are math problems at which they do okay, but that capability sure is brittle. I don’t want to overupdate here, but geez, getting LLMs from here to the Singularity in 2-3 years just doesn’t feel plausible.
Nod.
[disclaimer, not a math guy, only barely knows what he’s talking about, if this next thought is stupid I’m interested to learn more]
I don’t expect this to fix it right now, but, one thing I don’t think you listed is doing the work in lean or some other proof assistant that lets you check results immediately? I expect LLMs to first be able to do math in that format because it’s the format you can actually do a lot of training in. And it’d mean you can verify results more quickly.
My current vague understanding is that Lean is normally too cumbersome to be reasonable to work in, but that’s the sort of thing that could change with LLMs in the mix.
I agree that it’s a promising direction.
I did actually try a bit of that back in the o1 days. What I’ve found is that getting LLMs to output formal Lean proofs is pretty difficult: they really don’t want to do that. When they’re not making mistakes, they use informal language as connective tissue between Lean snippets, they put in “sorry”s (a placeholder that makes a lemma evaluate as proven), and otherwise try to weasel out of it.
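(For concreteness, the kind of escape hatch I mean; a minimal Lean 4 snippet of my own, which type-checks with only a warning despite proving nothing:)

```lean
-- `sorry` closes any remaining goal, so this "proof" compiles with only a
-- warning ("declaration uses 'sorry'"), despite containing no actual argument.
theorem my_lemma (a b : Nat) : a + b = b + a := by
  sorry
```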
This is something that should be solvable by fine-tuning, but at the time, there weren’t any publicly available decent models fine-tuned for that.
We do have DeepSeek-Prover-V2 now, though. I should look into it at some point. But I am not optimistic; it sounds like it’s doing the same stuff, just more cleverly.
Relevant: Terence Tao does find them helpful for some Lean-related applications.
yeah, it’s less that I’d bet it works now, just, whenever it DOES start working, it seems likely it’d be through this mechanism.
⚖ If Thane Ruthenis thinks there are AI tools that can meaningfully help with Math by this point, did they first have a noticeable period (> 1 month) where it was easier to get work out of them via working in lean-or-similar? (Raymond Arnold: 25% & 60%)
(I had a bit of an epistemic rollercoaster making this prediction. I updated to “by the time someone makes an actually worthwhile Math AI, even if Lean was an important part of its training process, it’s probably not that hard to do additional fine-tuning that gets it to output stuff in a more standard mathy format.” But then it seemed like it was still going to be important to quickly check that it wasn’t blatantly broken as part of the process.)
There’s common ways I currently use (the free version of) ChatGPT that are partially categorizable as “somewhat better search engine”, but where I feel like that’s not representative of the real differences. A lot of this is coding-related, but not all, and the reasons I use it for coding-related and non-coding-related tasks feel similar. When it is coding-related, it’s generally not of the form of asking it to write code for me that I’ll then actually put into a project, though occasionally I will ask for example snippets which I can use to integrate the information better mentally before writing what I actually want.
The biggest difference in feel is that a chat-style interface is predictable and compact and avoids pushing a full-sized mental stack frame and having to spill all the context of whatever I was doing before. (The name of the website Stack Exchange is actually pretty on point here, insofar as they were trying to provide something similar from crowdsourcing!) This is something I can see being a source of creeping mental laziness—but it depends on the size and nature of the rest of the stack: if you were already under high context-retention load relative to your capabilities, and you’re already task-saturated enough, and you use a chatbot for leaf calls that would otherwise cause you to have to do a lot of inefficient working-memory juggling, then it seems like you’re already getting a lot of the actually-useful mental exercise at the other levels and you won’t be eliminating much of it, just getting some probabilistic task speedups.
In roughly descending order of “qualitatively different from a search engine” (which is not the same as “most impactful to me in practice”):
Some queries are reverse concept search, which to me is probably the biggest and hardest-to-replicate advantage over traditional search engine: I often have the shape of a concept that seems useful, but because I synthesized it myself rather than drawing from popular existing uses, I don’t know what it’s called. This can be checked for accuracy using a traditional search engine in the forward direction once I have the candidate term.
Some queries are for babble purposes: “list a bunch of X” and I’ll throw out 90% of them for actual use but use the distribution to help nudge my own imagination—generally I’ll do my own babble first and then augment it, to limit priming effects. There’s potential for memetic health issues here, but in my case most of these are isolated enough that I don’t expect them to combine to create larger problems. (In a qualitatively similar way but with a different impact, some of it is pure silliness. “Suppose the protagonists of Final Fantasy XIII had Geese powers. What kind of powers might they have?”)
Synthesis and shaping of information is way different from search engine capabilities. This includes asking for results tailored along specific axes I care about where it’s much less likely an existing webpage author has used that as a focus, small leaps of connective reasoning that would take processing and filtering through multiple large pages to do via search engine, and comparisons between popular instances of a class (in coding contexts, often software components) where sometimes someone’s actually written up the comparison and sometimes not. Being able to fluently ask followups that move from a topic to a subtopic or related topic without losing all the context is also very useful. “Tell me about the main differences between X1 and X2.” → “This new thing introduced in X2, is that because of Y?” (but beware of sycophancy biases if you use leading questions like that)
(Beyond this point we get closer to “basically a search engine”.)
Avoiding the rise in Web annoyances is a big one in practice—which ties into the weird tension of social contracts around Internet publishing being kind of broken right now, but from an information-consumer perspective, the reprocessed version is often superior. If a very common result is that a search engine will turn up six plausible results, and three of them are entirely blog slop (often of a pre-LLM type!) which is too vague to be useful for me, two of them ask me to sign up for a ‘free’ account to continue but only after I’ve started reading the useless intro text, and one of them contains the information I need in theory but I have to be prepared to click the “reject cookies” button, and click the close button on the “get these delivered to your inbox” on-scroll popup, and hope it doesn’t load another ten-megabyte hero image that I don’t care about and chew through my cellular quota in the process, and if I try to use browser extensions to combat this then the text doesn’t load, and so on and so on… then obviously I will switch to asking the chatbot first! “most of the content is buried in hour-long videos” is skew to this but results in the same for me.
In domains like “how would I get started learning skill X”, where there’s enough people who can get a commercial advantage through SEO’ing that into “well, take our course or buy our starter kit” (but usually subtler than that), those results seem (and I think for now probably are) less trustworthy than chatbot output that goes directly to concrete aspects that can be checked more cleanly, and tend to disguise themselves to be hard to filter out without reading a lot of the way through. Of course, there’s obvious ways for this not to last, either as SEO morphs into AIO or if the chatbot providers start selling the equivalent of product placement behind the scenes.
(fwiw, I never felt like phones offered any real “you need them to not fall behind”. They are kinda a nice-to-have in some situations. I do need them for uber/lyft and maps, and I use them for other things which have some benefits and costs; this post is upweighting “completely block the internet on my phone.” I don’t have any social media apps on my phone but it doesn’t matter much, I just use the web browser)
I imagine this differs a lot based on what social position you’re already in and where you’re likely to get your needs met. When assumptions like “everyone has a smartphone” become sufficiently widespread, you can be blocked off from things unpredictably when you don’t meet them. You often can’t tell which things these are in advance: simplification pressure causes a phase transition from “communicated request” to “implicit assumption”, and there’s too many widely-distributed ways for the assumption to become relevant, so doing your own modeling will produce a “reliably don’t need” result so infrequently as to be effectively useless. Then, if making the transition to conformity when you notice a potential opportunity is too slow or is blocked by e.g. resource constraints or value differences, a lot of instant-lose faces get added to the social dice you roll. If your anticipated social set is already stable and well-adapted to you, you may not be rolling many dice, but if you’re precarious, or searching for breakthrough opportunities, or just have a role with more wide-ranging and unpredictable requirements on which interactions you need to succeed at, it’s a huge penalty. Other technologies this often happens with in the USA, again depending on your social class and milieu, include cars, credit cards, and Facebook accounts.
(It feels like there has to already be an explainer for this somewhere in the LW-sphere, right? I didn’t see an obvious one, though…)
yeah a friend of mine gave in because she was getting so much attitude about needing people to give her directions.
You’ve reminded me of a perspective I was meaning to include but then forgot to, actually. From the perspective of an equilibrium in which everyone’s implicitly expected to bring certain resources/capabilities as table stakes, making a personal decision that makes your life better but reduces your contribution to the pool can be seen as defection—and on a short time horizon or where you’re otherwise forced to take the equilibrium for granted, it seems hard to refute! (ObXkcd: “valuing unit standardization over being helpful possibly makes me a bad friend” if we take the protagonist as seeing “US customary units” as an awkward equilibrium.) Some offshoots of this which I’m not sure what to make of:
1. If the decision would lead to a better society if everyone did it, and leads to an improvement for you if only you do it, but requires the rest of a more localized group to spend more energy to compensate for you if you do it and they don’t, we have a sort of “incentive misalignment sandwich” going on. In practice I think there’s usually enough disagreement about the first point that this isn’t clear-cut, but it’s interesting to notice.
2. In the face of technological advances, what continues to count as table stakes tends to get set by Moloch and mimetic feedback loops rather than intentionally. In a way, people complaining vociferously about having to adopt new things are arguably acting in a counter-Moloch role here, but in the places I’ve seen that happen, it’s either been ineffective or led to a stressful and oppressive atmosphere of its own (or, most commonly and unfortunately, both).
3. I think intuitive recognition of (2) is a big motivator behind attacking adopters of new technology that might fall into this pattern, in a way that often gets poorly expressed in a “tech companies ruin everything” type of way. Personally taking up smartphones, or cars, or—nowadays the big one that I see in my other circles—generative AI, even if you don’t yourself look down on or otherwise directly negatively impact non-users, can be seen as playing into a new potential equilibrium where if you can, you ‘must’, or else you’re not putting in as much as everyone else, and so everyone else will gradually find that they get boxed in and any negative secondary effects on them are irrelevant compared to the phase transition energy. A comparison that comes to mind is actually labor unions; that’s another case where restraining individually expressed capabilities in order to retain a better collective bargaining position for others comes into play, isn’t it?
(Now much more tangentially:)
… hmm, come to think of it, maybe part of conformity-pressure in general can be seen as a special case of this where the pool resource is more purely “cognition and attention spent dealing with non-default things” and the nonconformity by default has more of a purely negative impact on that axis, whereas conformity-pressure over technology with specific capabilities causes the nature of the pool resource to be pulled in the direction of what the technology is providing and there’s an active positive thing going on that becomes the baseline… I wonder if anything useful can be derived from thinking about those two cases as denoting an axis of variation.
And when the conformity is to a new norm that may be more difficult to understand but produces relative positive externalities in some way, is that similar to treating the new norm as a required table stakes cognitive technology?
I mostly use it for syntax, formatting/modifying docs, and giving me quick visual designs...
I found LLMs to be very useful for literature research. They can find relevant prior work that you can’t find with a search engine because you don’t know the right keywords. This can be a significant force multiplier.
They also seem potentially useful for quickly producing code for numerical tests of conjectures, but I only started experimenting with that.
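(As a minimal sketch of the kind of numerical test I have in mind, using a known inequality, AM-GM, as a stand-in “conjecture”:)

```python
# Minimal sketch of a numerical conjecture check (illustrative only).
# "Conjecture" (actually the AM-GM inequality): sqrt(a*b) <= (a+b)/2 for a, b >= 0.
import random

def find_counterexamples(trials: int = 100_000, tol: float = 1e-9) -> list[tuple[float, float]]:
    """Sample random pairs and return any that violate the inequality."""
    violations = []
    for _ in range(trials):
        a, b = random.uniform(0, 1e6), random.uniform(0, 1e6)
        if (a * b) ** 0.5 > (a + b) / 2 + tol:
            violations.append((a, b))
    return violations

print(find_counterexamples())  # expect [], since AM-GM is in fact a theorem
```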
Other use cases where I found LLMs beneficial:
Taking a photo of a menu in French (or providing a link to it) and asking it which dishes are vegan.
Recommending movies (I am a little wary of some kind of meme poisoning, but I don’t watch movies very often, so seems ok).
That said, I do agree that early adopters seem like they’re overeager and maybe even harming themselves in some way.
I’ve been trying to use Deep Research tools as a way to find hyper-specific fiction recommendations as well. The results have been mixed. They don’t seem to be very good at grokking the hyper-specificness of what you’re looking for, usually they have a heavy bias towards the popular stuff that outweighs what you actually requested[1], and if you ask them to look for obscure works, they tend to output garbage instead of hidden gems (because no taste).
It did produce good results a few times, though, and is only slightly worse than asking for recommendations on r/rational. Possibly if I iterate on the prompt a few times (e.g., explicitly point out the above issues?), it’ll actually become good.
Like, suppose I’m looking for some narrative property X. I want to find fiction with a lot of X. But what the LLM does is multiply the amount of X in a work by the work’s popularity, so works that are low in X but very popular end up in its selection.
I sometimes have luck with concrete analogies. For example, I asked for the equivalent of Tonedeff (his Polymer album is my favorite album) in other genres and it recommended Venetian Snares. I then listened to some of his songs, and it seemed like the kind of experimental stuff where I might find something interesting. Venetian Snares has 80k monthly listeners while Tonedeff has 14k, so there might be some weighting towards popularity, but that seems mild.
I can think of reasons why some would be wary, and am wary of something which could be called “meme poisoning” myself when I watch movies, but am curious what kind of meme poisoning you have in mind here.
I am perhaps an interesting corner case. I make extremely heavy use of LLMs, largely via APIs for repetitive tasks. I sometimes run a quarter million queries in a day, all of which produce structured output. Incorrect output happens, but I design the surrounding systems to handle that.
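(Roughly this pattern, as a minimal sketch; `call_llm` here is a stand-in for whatever API client is actually used, not a real library call:)

```python
# Minimal sketch of the "surrounding system" pattern: validate structured
# output and retry, rather than trusting any single response.
import json

def call_llm(prompt: str) -> str:
    # Stand-in for your actual API client; not a real library call.
    raise NotImplementedError

REQUIRED_KEYS = {"label", "confidence"}

def query_structured(prompt: str, max_retries: int = 3) -> dict | None:
    """Ask for JSON, validate it, retry a few times, and fail safe to None."""
    for _ in range(max_retries):
        raw = call_llm(prompt + "\nRespond with a JSON object with keys: label, confidence.")
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output: just try again
        if isinstance(parsed, dict) and REQUIRED_KEYS.issubset(parsed):
            return parsed
    return None  # callers treat None as "model failed" rather than trusting garbage
```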
A few times a week, I might ask a concrete question and get a response, which I treat with extreme skepticism.
But I don’t talk to the damn things. That feels increasingly weird and unwise.
Agree about phones (in fact I am seriously considering switching to a flip phone and using my iPhone only for things like navigation).
Not so sure about LLMs. I had your attitude initially, and I still consider them an incredibly dangerous mental augmentation. But I do think that conservatively throwing a question at them to find searchable keywords is helpful, if you maintain the attitude that they are actively trying to take over your brain and therefore remain vigilant.
Why do you think LLMs are moving people backwards? With phones, it was their attention-sucking nature. What is it with LLMs?
Not speaking for John, but I think LLMs can cause a lack of gears-level understanding, more vibe coding, less mental flexibility due to lack of deliberate thought, and more dependency on them for thinking in general. A lot of my friends will most likely never learn coding properly and rely solely on ChatGPT; it would be similar to calculators, which reduced people’s ability to do mental maths, but for thinking.
An LLM’s danger lies in its ability to solve the majority of simple problems. This reduces opportunities to learn skills or benefit from the training these tasks provide. It allows for a level of mental stagnation, or even degradation, depending on how frequently you use LLMs to handle problems. In other words, it induces mental laziness. This is one way it’s not moving people forward, and in more severe cases it’s moving them backward.
As a side note, it is also harmful to the majority of current education institutions, as it can solve most academic problems. I have personally seen people use it to do homework, write essays, or even write term papers. Some of the more crafty students manage to cheat with it on exams. This creates a very shallow education, which is bad for many reasons.
Setting aside cheating, do you think LLMs are diminishing opportunities for thought, or redistributing them to other topics? And why?
Yes, I do think that. They don’t actively diminish thought; after all, it’s a tool you decide to use. But when you use it to handle a problem, you lose the thoughts and the growth you could’ve had by solving it yourself. It could be argued, however, that if you are experienced enough in solving such problems, there isn’t much to lose, and you gain time to pursue other issues.
But as to why I think this way: people already don’t learn skills because chatGPT can do it for them, as lesswronguser123 said “A lot of my friends will most likely never learn coding properly and rely solely on ChatGPT”, and not just his friends use it this way. Such people, at the very least, lose the opportunity to adopt a programming mindset, which is useful beyond programming.
Outside of people not learning skills, I also believe there is a lot of potential to delegate almost all of your thinking to ChatGPT. For example: I could have used it to write this response, decide what to eat for breakfast, tell me what I should do in the future, etc. It can tell you what to do on almost every day-to-day decision. Some use it to a lesser extent, some to a greater, but you do think less if you use it this way.
Does it redistribute thinking to another topic? I believe it depends on the person in question: some use it to have more time to solve a more complex problem, others to have more time for entertainment.
I think that these are genuinely hard questions to answer in a scientific way. My own speculation is that using AI to solve problems is a skill of its own, along with recognizing which problems they are currently not good for. Some use of LLMs teaches these skills, which is useful.
I think a potential failure mode for AI might be when people systematically choose to work on lower-impact problems that AI can be used to solve, rather than higher-impact problems that AI is less useful for but that can be solved in other ways. Of course, AI can also increase people’s ambitions by unlocking the ability to pursue higher-impact goals they would not have been able to otherwise achieve. Whether or not AI increases or decreases human ambition on net seems like a key question.
In my world, I see limited use of AI except as a complement to traditional internet search, a coding assistant for competent programmers, a sort of Grammarly on steroids, an OK-at-best tutor that’s cheap and always available on any topic, and a way to get meaningless paperwork done faster. These use cases all seem basically ambition-enhancing to me. That’s the reason I asked John why he’s worried about this version of AI. My experience is that once I gained some familiarity with the limitations of AI, it’s been a straightforwardly useful tool, with none of the serious downsides I have experienced from social media and smartphones.
The issues I’ve seen seem to have to do with using AI to deepfake political policy proposals, homework, blog posts, and job applications. These are genuine and serious problems, but mainly have to do with adding a tremendous amount of noise to collective discourse rather than the self-sabotage enabled by smartphones and social media. So I’m wondering if John’s more concerned about those social issues or by some sort of self-sabotage capacity from AI that I’m not seeing. Using AI to do your homework is obviously self-sabotage, but given the context I’m assuming that’s not what John’s talking about.
I mean, they’re great as search engines or code-snippet writers (basically, search engine for standard functions). If someone thinks that gippities know stuff or can think or write well, that could be brainrotting.
Agreed, that’s basically how I use them.
...but you are using a phone now. Are you using LLMs? Maybe in both cases it is about using the tool in the way that benefits most?
From my perspective, good things about smartphones:
phone, camera, and navigation in the same device
very rarely, check something online
buy tickets for mass transit
my contacts are backed up in the cloud
Bad things:
notifications
The advantages outweigh the disadvantages, but it requires discipline about what you install.
(Food for thought: if only I had the same discipline about which web services I create accounts for and bookmark on my PC.)
Similar here, but that’s because no one could give me a good use case. (I don’t consider social networks on smartphone to be good.)
And it’s probably similar with LLMs, depends on how specifically you use them. I use them to ask questions (like a smarter version of Google) that I try to verify e.g. on Wikipedia afterwards, and sometimes to write code. Those seem like good things to me. There are probably bad ways to use them, but that is not what I would typically do.
My main concern with heavy LLM usage is what Paul Graham discusses in Writes and Write-Nots. His argument is basically that writing is thinking, and that if you use LLMs to do your writing for you, well, your ability to think will erode.
I’m similar, for both smart phones and LLM usage.
For smart phones there was one argument that moved me a moderate amount. I’m a web developer and startup founder. I was talking to my cousin’s boyfriend who is also in tech. He made the argument to me that if I don’t actively use smart phones I won’t be able to empathize as much with smart phone users, which is important because to a meaningful extent, that’s who I’m building for.
I didn’t think the empathy point was as strong as my cousin’s boyfriend thought it was. Like, he seemed to think it was pretty essential and that if I don’t use smart phones I just wouldn’t be able to develop enough empathy to build a good product. I, on the other hand, saw it as something “useful” but not “essential”. Looking back, I think I’d downgrade it to something like “a little useful” instead of “useful”.
I’m not sure where I’m going with this, exactly. Just kinda reflecting and thinking out loud.
Conditional on LLMs scaling to AGI, I feel like it’s a contradiction to say that “LLMs offer little or negative utility AND it’s going to stay this way”. My model is that either we are dying in a couple of years to LLMs getting us to AGI, and we are going to have a year or two of AIs that can provide incredible utility, or we are not dying to LLMs and the timelines are longer.
I think I read somewhere that you don’t believe LLMs will get us to AGI, so this might already be implicit in your model? I personally am putting at least some credence on the ai-2027 model, which predicts superhuman coders in the near future. (Not saying that I believe this is the most probable future, just that I find it convincing enough that I want to be prepared for it.)
Up until recently I was in the “LLMs offer zero utility” camp (for coding), but now at work we have a Cursor plan (I still would probably not pay for it for personal use), and with a lot of trial and error I feel like I am finding the kinds of tasks where AIs can offer a bit of utility, and I am slowly moving towards the “marginal utility” camp.
One kind of thing I like using it for is small scripts to automate bits of my workflow. E.g. I have an idea for a script, I know it would take me 30m-1h to implement it, but it’s not worth it because e.g. it would only save me a few seconds each time. But if I can reduce the time investment to only a few minutes by giving the task to the LLM, it can suddenly be worth it.
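(The break-even arithmetic, with made-up numbers purely to illustrate the trade-off:)

```python
# Back-of-the-envelope check on when a tiny automation is worth it
# (all numbers are made up for illustration).
uses_per_day = 10
seconds_saved_per_use = 5
working_days_per_year = 250

hours_saved_per_year = uses_per_day * seconds_saved_per_use * working_days_per_year / 3600
print(f"~{hours_saved_per_year:.1f} hours saved per year")  # ~3.5 hours

# Writing the script by hand: ~45 minutes of focused time -> marginal payoff.
# Handing it to an LLM: ~5 minutes of my attention -> clearly worth it.
```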
I would be interested in other people’s experiences with the negative side effects of LLM use. What are the symptoms/warning signs of “LLM brain rot”? I feel like with my current use I am relatively well equipped to avoid that:
I only ask things from LLMs that I know I could solve in a few hours tops.
I code review the result, tell it if it did something stupid.
90% of my job is stuff that is currently not close to being LLM automatable anyway.