Well, depends on the job, I suppose. I did read your post on the topic, and I’m guessing it indeed makes much more sense in the context of automating parts of a company, with lots of time-consuming but boilerplate-y tasks.
As someone doing math/conceptual research, I don’t currently see much potential there. I can imagine stuff that would be useful for me, e.g.:
Systems that would reduce the time needed to assemble the context for getting LLMs’ help with research/brainstorming tasks.
Systems that would remove the friction in getting LLMs’ assistance with math proofs.
Pipelines for quickly extracting insights from papers en masse.
A custom analogue of OpenAI’s Pulse, where an LLM swarm’s context is updated with my latest thoughts on what I’m working on and it asynchronously searches the literature 24/7 for anything helpful.
Some sort of “exploratory medium for mathematics”.
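To make the Pulse-analogue idea concrete, here is a minimal sketch of one piece of it: turning a handful of search terms (the stand-in for whatever an LLM would actually distill from my latest notes) into a query against arXiv’s public Atom API, newest submissions first. The endpoint and its `search_query`/`sortBy`/`max_results` parameters are arXiv’s documented API; everything else is illustrative.

```python
import urllib.parse

# arXiv's public query endpoint (returns an Atom feed of matching papers).
ARXIV_API = "http://export.arxiv.org/api/query"

def arxiv_query_url(terms, max_results=20):
    """Build an arXiv API query URL for a list of search terms,
    sorted so the newest submissions come first."""
    # Quote each term so multi-word phrases are matched as phrases.
    query = " AND ".join(f'all:"{t}"' for t in terms)
    params = {
        "search_query": query,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": str(max_results),
    }
    return ARXIV_API + "?" + urllib.parse.urlencode(params)
```

A cron job (or the hypothetical swarm) would fetch this URL periodically, diff the results against what it has already seen, and surface anything new; the hard part the sketch leaves out is the LLM judging relevance.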
But none of this would be an equivalent of even a 10h/week productivity boost, I don’t think.
To clarify, being able to speed-read a paper with an LLM or do a literature review using a Deep Research feature is very helpful for me. But this is the “80% of the value that you can get just by using the out-of-the-box tools the obvious way” I was talking about. Stuff on top of that mostly isn’t worth it.
IMO, the correct approach for most people is more along the lines of “try to be passively aware that LLMs exist now, and be constantly on the lookout for things where they could be easily applied for significant benefits”, rather than “spend N hours/week integrating them into your workflows in nontrivial-to-implement ways”.
FWIW, inspired by Justis, I’ve been keeping a list of things I could usefully automate with Claude Code (or similar) for my own personal productivity, adding to it every time something pops into my head. Three weeks in, it’s a very underwhelming list! Here’s ~the whole thing:
Custom interface for composing tweet-threads, including Twitter’s funny formula for counting characters (I have some complaints about the built-in one, e.g. I usually also post the threads onto Bluesky)
Jeff’s “clipboard normalizer” (but I have a PC, not a Mac)
…And something similar for clipboard conversion from simple HTML into the abstruse Typst format I was using a few weeks ago for a particular project.
One-click way to move certain things to my Trello to-do list, e.g.
LessWrong notifications
Interesting-looking papers or links to read from social media (twitter, slack, discord)
Emails
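For what it’s worth, the Trello item above is probably the most mechanical one: Trello’s public REST API creates a card with a single POST to `/1/cards`. Here is a hedged sketch; the endpoint and its `idList`/`name`/`desc`/`key`/`token` parameters are Trello’s documented API, while the environment-variable names are placeholders I made up (you would fill them in from your own account).

```python
import json
import os
import urllib.parse
import urllib.request

# Trello's documented endpoint for creating a card.
TRELLO_API = "https://api.trello.com/1/cards"

def build_card_request(name, desc, list_id, key, token):
    """Return (url, params) for creating a Trello card. Split out from the
    network call so the payload can be inspected and tested offline."""
    params = {
        "idList": list_id,  # the target to-do list's ID
        "name": name,
        "desc": desc,
        "key": key,         # API key from your Trello account
        "token": token,     # API token authorizing the write
    }
    return TRELLO_API, params

def create_card(name, desc=""):
    """Send the POST; credentials come from placeholder env vars."""
    url, params = build_card_request(
        name, desc,
        list_id=os.environ["TRELLO_LIST_ID"],
        key=os.environ["TRELLO_KEY"],
        token=os.environ["TRELLO_TOKEN"],
    )
    data = urllib.parse.urlencode(params).encode()
    req = urllib.request.Request(url, data=data)  # POST because data is set
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

The “one-click” part is then just binding `create_card(...)` to a hotkey or bookmarklet that grabs the current selection or URL.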
Anyway, all of these seem like they would save me a pathetically small amount of time, and so I haven’t bothered to install Claude Code yet. But someday the list will be longer, or I will be bored and curious enough to do it regardless.
Meanwhile, I 80/20’d the second one (the clipboard normalizer) just using a normal LLM chat interface: Gemini one-shotted a nice HTML + JavaScript solution that I stored locally and bookmarked. It adds an extra couple of seconds compared to an app or Chrome extension, but whatever, I don’t use it that often anyway.
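The core of that kind of normalizer is just a tag-by-tag translation. Here is a minimal sketch in Python rather than the HTML + JavaScript version described above, using the standard library’s `html.parser`; the tag-to-Typst mapping covers only a few cases and is illustrative, not complete (Typst really does use `*strong*`, `_emph_`, and `#link("url")[text]`).

```python
from html.parser import HTMLParser

class HtmlToTypst(HTMLParser):
    """Convert a small subset of HTML (what rich-text copy tends to put
    on the clipboard) into Typst-flavoured markup."""

    # Typst wraps bold in *...* and italics in _..._
    WRAP = {"b": "*", "strong": "*", "i": "_", "em": "_"}

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.WRAP:
            self.out.append(self.WRAP[tag])
        elif tag == "a":
            href = dict(attrs).get("href", "")
            self.out.append(f'#link("{href}")[')
        elif tag in ("p", "div", "br"):
            self.out.append("\n")

    def handle_endtag(self, tag):
        if tag in self.WRAP:
            self.out.append(self.WRAP[tag])
        elif tag == "a":
            self.out.append("]")

    def handle_data(self, data):
        self.out.append(data)

def html_to_typst(html):
    parser = HtmlToTypst()
    parser.feed(html)
    return "".join(parser.out).strip()
```

Hooking this up to the clipboard (or serving it from a bookmarked local HTML page, as described above) is the remaining 20%.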
I’ll keep brainstorming, but I dunno, I really don’t seem to do much that can be automated at all and that I haven’t already automated years ago in the old-fashioned way (e.g. I’ve long had automatic file backups, automatic credit card payments, automatic bank transfers, automatic citation downloading, etc.).
But none of this would be an equivalent of even a 10h/week productivity boost, I don’t think.
To be clear, I think it’s worth spending 10h/week even if you expect to get less than 10h/week in productivity boost right now, because it does take a while to get good at using these systems, and my guess is there will be a future where these things will be very helpful for almost everyone, and skill will translate non-trivially.
spend N hours/week integrating them into your workflows in nontrivial-to-implement ways
I currently disagree. In my experience you do actually experience substantial downlift for a while, and it is worth getting good at having that not happen to you.
I think it’s worth spending 10h/week even if you expect to get less than 10h/week in productivity boost right now, because it does take a while to get good at using these systems
I am aware of this argument. Counterpoint: models get increasingly easier to use as they get more powerful – better at inferring your intent, not subject to entire classes of failure modes that plagued earlier generations, etc. – so the skills you learn by painstakingly wrangling current LLMs will end up obsoleted by subsequent generations.
Like, inasmuch as one buys that LLMs are on the trajectory to becoming absurdly powerful, one should not expect to need to develop intricate skillsets for squeezing value out of them. You’re not gonna need to prompt-engineer AGIs or invent custom scaffolds for them: they’ll build the scaffolds themselves, and your cleverest prompts will be no more effective than just talking to them the obvious way. (Same for ad-hoc continuous-memory setups, context-management hacks, et cetera: if the AGI labs crack architectural continuous learning, it’ll all be obsoleted overnight.)
On the other hand, inasmuch as you don’t believe that LLMs are going to be getting increasingly easier to use, you essentially don’t believe that they’re on the trajectory to become absurdly powerful AGIs. If so, you should downgrade your expectation of how much value their future generations will bring you, and accordingly downgrade how much you should be investing in them now.
Oh, by the way: I saw you saying that you’re observing much more software downstream of LLMs. Any chance you can elaborate on that, provide some examples? This is the sort of thing I’m very interested in tracking, and high-quality information sources are hard to come by.
It’s clear to me that the product velocity of things like Cursor, Claude Code and Codex is much higher than I’ve seen for basically any other product. This is what I meant by saying most of the software I’ve seen has been for software developers themselves.
We are now starting to see this trickle out. Internally at Lightcone more of my staff can now build software solutions to problems where they previously needed support from a software engineer (a random example of this is building Airtable automations with script blocks). My guess is if you surveyed Hacker News you would also see that more things on there are small applications that someone built that previously would have taken prohibitively long to build. This is a random example of one such project: https://www.ismypubfucked.com/
The improvements in the models’ thinking quality don’t address one of the main causes of downlift: the breaking up of deep work by regularly (and sometimes surprisingly) having 1–10 minute periods where you can’t do productive work because the LLM is executing a task, so you lose cognitive context and tend toward shallower decision-making. This continues to plague me, often costing me a lot of time (both in the individual chunks and when summing my decision-making over a day).
Not convinced this isn’t a temporary artefact of the current time horizons. Like, in the future, I think it’s plausible that the two categories of tasks you’d be delegating would be either (a) the sort of shallow tasks the future models would be able to complete instantly, or (b) the sort of deep tasks that’d take future models hours to complete.
Fair enough; maybe this counts, though. But is there really a rich suite of skills like that, and would they really take that long to learn by the time learning them becomes immediately net-positive?
I think it’s fairly likely I need to re-orient my entire workflow around constantly (and somewhat unpredictably) hitting heavy-tailed stretches of time during which I can’t do productive work on my main project. This is not a small deal. I suspect that many people will deal with it very differently.
Here are some possible responses:
Build a practice of having multiple parallel LLM projects you can work on simultaneously (I have not found this cognitively trivial)
Build up a backlog of simple low-context tasks you can do, and figure out how to turn your lower-importance work into that kind of task
Learn to identify tasks that aren’t worth delegating because of the downlift, even though you know an AI could do them
The first two really sound quite complex, and the third sounds genuinely hard. I suspect other people will find other solutions...