Model to track: You get 80% of the current max value LLMs could provide you from standard-issue chat models and any decent out-of-the-box coding agent, both prompted the obvious way. Trying to get the remaining 20% that are locked behind figuring out agent swarms, optimizing your prompts, setting up ad-hoc continuous-memory setups, doing comparative analyses of different frontier models’ performance on your tasks, inventing new galaxy-brained workflows, writing custom software, et cetera, would not be worth it: it would take too long for too little payoff.
There is an “LLMs for productivity!” memeplex that is trying to turn people into its hosts by fostering FOMO in those who are not investing tons of their time into tinkering with LLMs. You should ignore it. At best it would waste your time; at worst it would corrupt your priorities, convincing you that you should reorient your life around “optimizing your Claude Code setup” or writing productivity apps for yourself. LW regulars may be especially vulnerable to it: we know that AI is going to become absurdly powerful sooner or later, so it takes relatively little to sell us the idea that it already is absurdly powerful – which may or may not currently be exploited by analogues of crypto grifters.
(Not to say you mustn’t be tinkering with LLMs and vibe-coding custom software, especially if you’re having fun! But you should perhaps approach it in the spirit of a hobby, rather than the thing you should be doing.)
Well, at least, that’s my takeaway from watching the current ideatic ecosystem around LLMs and trying that stuff for myself (one, two, three). I do have tons of ideas about custom software that perhaps could 1.1x my productivity… but it’s too complex for the LLMs of today to vibe-code in a truly hands-off manner, and is not worth the time otherwise. Maybe in six more months.
Obviously “reverse any advice you hear” and “Thane has terminal skill issues and this post is sour grapes” may or may not apply. (Though, of course, “you have skill issues if you haven’t figured out how to 10x your productivity using LLMs, you must keep trying or you’ll be left behind in the permanent underclass!!!” is the standard recruitment pitch of the aforementioned memeplex.)
I think I directionally disagree with this for most people? My guess is the average person on LW should be spending around 10 hours a week trying to figure out how to automate themselves or other parts of their job using LLMs. It seems to me to be where most of the edge is in terms of increasing productivity and impact for most people (though of course not everyone).
Well, depends on the job, I suppose. I did read your post on the topic, and I’m guessing it indeed makes much more sense in the context of automating parts of a company, with lots of time-consuming but boilerplate-y tasks.
As someone doing math/conceptual research, I don’t currently see much potential there. I can imagine stuff that would be useful for me, e.g.:
Systems that would reduce the time needed to assemble the context for getting LLMs’ help with research/brainstorming tasks.
Systems that would remove the friction in getting LLMs’ assistance with math proofs.
Pipelines for quickly extracting insights from papers en masse.
A custom analogue of OpenAI’s Pulse, where an LLM swarm’s context is updated with my latest thoughts regarding what I’m working on, and it asynchronously searches the literature 24/7 for anything helpful.
Some sort of “exploratory medium for mathematics”.
But none of this would be an equivalent of even a 10h/week productivity boost, I don’t think.
To clarify, being able to speed-read a paper with an LLM or do a literature review using a Deep Research feature is very helpful for me. But this is the “80% of the value that you can get just by using the out-of-the-box tools the obvious way” I was talking about. Stuff on top of that mostly isn’t worth it.
IMO, the correct approach for most people is more along the lines of “try to be passively aware that LLMs exist now, and be constantly on the lookout for things where they could be easily applied for significant benefits”, rather than “spend N hours/week integrating them into your workflows in nontrivial-to-implement ways”.
FWIW, inspired by Justis, I’ve been keeping up a list of things that I could usefully automate with Claude Code (or similar) for my own personal productivity, adding to the list every time something pops into my head. I’ve been adding to the list for the past three weeks. But so far it’s a very underwhelming list! Here’s ~the whole thing:
Custom interface for composing tweet-threads, including their funny formula for counting characters (I have some complaints about the built-in twitter one, e.g. I usually also post them onto bluesky)
Jeff’s “clipboard normalizer” (but I have a PC, not a Mac)
…And something similar for clipboard conversion from simple HTML into the abstruse “typst” format that I was using a few weeks ago for a particular project.
One-click way to move certain things to my Trello to-do list, e.g.
LessWrong notifications
Interesting-looking papers or links to read from social media (twitter, slack, discord)
Emails
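For what it’s worth, the “funny formula” for the first item is mostly two real rules: every URL is billed at a flat 23 characters (the t.co wrapper), and the limit is 280, with some Unicode ranges counting double. A minimal Python sketch of the URL part – the double-weight ranges are omitted, so treat this as an approximation rather than Twitter’s actual algorithm:

```python
import re

TWEET_LIMIT = 280
URL_WEIGHT = 23  # every URL is wrapped by t.co and billed at a flat 23 characters

URL_RE = re.compile(r"https?://\S+")

def tweet_length(text: str) -> int:
    """Approximate Twitter's character count: each URL costs a flat 23.

    Simplification: real counting also weights some Unicode ranges
    (CJK, emoji) as 2; this sketch counts everything else as 1.
    """
    stripped = URL_RE.sub("", text)
    n_urls = len(URL_RE.findall(text))
    return len(stripped) + n_urls * URL_WEIGHT

def fits_in_tweet(text: str) -> bool:
    return tweet_length(text) <= TWEET_LIMIT
```

This is the sort of thing a tweet-thread composer would call per-chunk when splitting a long draft.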
Anyway, all of these seem like they would save me a pathetically small amount of time, and so I haven’t bothered to install Claude Code yet. But someday the list will be longer, or I will be bored and curious enough to do it regardless.
Meanwhile, I 80/20’d the second one (clipboard normalizer) just using a normal LLM chat interface: Gemini one-shotted a nice HTML + JavaScript solution that I stored locally and bookmarked. It adds an extra couple of seconds compared to an app or Chrome extension, but whatever, I don’t use it that often anyway.
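For reference, the core of a clipboard normalizer like that is small. Here’s a rough Python sketch of the same idea using only the standard library (the original was an HTML + JavaScript page; the set of tags treated as line breaks is my guess at minimal useful behavior, not what Gemini produced):

```python
from html.parser import HTMLParser

class HTMLToText(HTMLParser):
    """Strip markup from simple HTML, keeping paragraph/line breaks."""

    BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Block-level tags become newlines; inline tags (b, i, a...) vanish.
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)

    def text(self) -> str:
        return "".join(self.parts).strip()

def normalize(html: str) -> str:
    parser = HTMLToText()
    parser.feed(html)
    return parser.text()
```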
I’ll keep brainstorming, but I dunno, I really don’t seem to do much that can be automated at all, and that I haven’t already automated years ago in the old-fashioned way (e.g. I have long had automatic file backups, automatic credit card payments, automatic bank transfers, automatic citation downloading, etc.).
But none of this would be an equivalent of even a 10h/week productivity boost, I don’t think.
To be clear, I think it’s worth spending 10h/week even if you expect to get less than 10h/week in productivity boost right now, because it does take a while to get good at using these systems, and my guess is there will be a future where these things will be very helpful for almost everyone, and skill will translate non-trivially.
spend N hours/week integrating them into your workflows in nontrivial-to-implement ways
I currently disagree. In my experience you do actually experience substantial downlift for a while, and it is worth getting good at having that not happen to you.
I think it’s worth spending 10h/week even if you expect to get less than 10h/week in productivity boost right now, because it does take a while to get good at using these systems
I am aware of this argument. Counterpoint: models get increasingly easier to use as they get more powerful – better at inferring your intent, not subject to entire classes of failure modes plaguing earlier generations, etc. – so the skills you’ll learn by painstakingly wrangling current LLMs will end up obsoleted by the subsequent generation.
Like, inasmuch as one buys that LLMs are on the trajectory to becoming absurdly powerful, one should not expect to need to develop intricate skillsets for squeezing value out of them. You’re not gonna need to prompt-engineer AGIs and invent custom scaffolds for them, they will build the scaffolds for themselves and your cleverest prompts will be as effective as “just talk to them the obvious way”. (Same for ad-hoc continuous-memory setups and context-management hacks et cetera: if the AGI labs crack architectural continuous learning, it’ll all be obsoleted overnight.)
On the other hand, inasmuch as you don’t believe that LLMs are going to be getting increasingly easier to use, you essentially don’t believe that they’re on the trajectory to become absurdly powerful AGIs. If so, you should downgrade your expectation of how much value their future generations will bring you, and accordingly downgrade how much you should be investing in them now.
Oh, by the way: I saw you saying that you’re observing much more software downstream of LLMs. Any chance you can elaborate on that, provide some examples? This is the sort of thing I’m very interested in tracking, and high-quality information sources are hard to come by.
It’s clear to me that the product velocity of things like Cursor, Claude Code and Codex is much higher than I’ve seen for basically any other product. This is what I meant by saying most of the software I’ve seen has been for software developers themselves.
We are now starting to see this trickle out. Internally at Lightcone more of my staff can now build software solutions to problems where they previously needed support from a software engineer (a random example of this is building Airtable automations with script blocks). My guess is if you surveyed Hacker News you would also see that more things on there are small applications that someone built that previously would have taken prohibitively long to build. This is a random example of one such project: https://www.ismypubfucked.com/
The improvements in the models’ thinking quality don’t address one of the main causes of downlift, which is the breaking up of deep work: regularly (and sometimes surprisingly) having 1-10 minute periods where you are no longer able to do productive work because the LLM is executing a task, so you lose cognitive context and tend toward shallower decision-making. This is something that continues to plague me, often causing me to waste a lot of time (both in the individual chunks and when summing my decision-making over a day).
Not convinced this isn’t a temporary artefact of the current time horizons. Like, in the future, I think it’s plausible that the two categories of tasks you’d be delegating would be either (a) the sort of shallow tasks the future models would be able to complete instantly, or (b) the sort of deep tasks that’d take future models hours to complete.
Fair enough, though, maybe this counts. But is there really a rich suite of skills like that, and would they really take that long to learn by the time learning them does become immediately net-positive?
I think it’s fairly likely I need to re-orient my entire workflow around constantly (but somewhat surprisingly) having heavy-tail distributions of time where I can’t do productive work on my main work. This is not a small deal. I suspect that many people will deal with it very differently.
Here are some possible responses:
Build a practice of having multiple parallel LLM projects you can work on simultaneously (I have not found this cognitively trivial)
Build up a backlog of simple low-context tasks you can do, and figure out how to turn your lower-importance work into that kind of task
Learn how to identify tasks that aren’t worth it because of the downlift, even though you know an AI could do them.
The first two really sound quite complex, and the third sounds genuinely hard. I suspect other people will find other solutions...
My guess is the average person on LW should be spending around 10 hours a week trying to figure out how to automate themselves or other parts of their job using LLMs.
Yeah. I am nowhere near doing this systematically, but I noticed that whatever I am doing, it makes sense to ask “could I use an LLM to help me with this?” That includes even things like reading Reddit—now the LLM could read it for me, and just give me a summary. (I haven’t tried this yet.)
It is even worth revisiting the old (pre-LLM) question of “could I automate this using a shell/Python script?”, because LLM makes creating such scripts much cheaper.
Like, if in the past the balance was like “it takes me one hour to do it by hand, and it would also take nontrivial time to write the script, plus I might find out in the middle of it that the situation is more difficult than I thought and there are some exceptions, or I might end up exploring some rabbit hole… so all things considered it’s probably faster doing it by hand”, these days making the script sometimes only takes as much time as you need to verbally describe the intended functionality.
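To make that concrete, here’s the kind of throwaway script an LLM can now produce from a one-sentence description (the filename convention and folder layout are made-up examples, not anything from this thread):

```python
import re
import shutil
from pathlib import Path

def sort_papers_by_year(src: Path, dst: Path) -> int:
    """Move PDFs like 'smith2021attention.pdf' into dst/2021/, etc.

    Files without a recognizable 4-digit year are left in place.
    Returns the number of files moved.
    """
    moved = 0
    # Materialize the listing first, since we move files while iterating.
    for pdf in sorted(src.glob("*.pdf")):
        match = re.search(r"(19|20)\d{2}", pdf.name)
        if not match:
            continue
        year_dir = dst / match.group(0)
        year_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(pdf), str(year_dir / pdf.name))
        moved += 1
    return moved
```

Writing this by hand is twenty minutes of fiddling with edge cases; describing it to an LLM takes one sentence, which is exactly the balance shift being described.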
datapoint: this was my exact argument for not learning to vibecode (after working as a programmer for 10 years and quitting 9 years ago). Last month was when (I noticed that) vibecoding (had) crossed the threshold where it quickly paid off the time I put into it, and that was with private tutoring from someone who’d been on the cutting edge for >1 year.
I’m not sure if this supports your argument (because I do think any time I put into learning to vibecode before the recent transition would have been wasted) or counters (because this is the month things transitioned).
I lean towards this, despite being a guy currently heavily invested in AI tools.
Cluster of things that all seem true to me:
I am 100% addicted to vibecoding in a straightforward “this is fun and dopamine inducing way”, which is making it hard to reason about.
As fun hobbies go, it does seem fine, all else equal
You’re broadly right about the 80%/20% rule, and microoptimizations not being worth it.
But:
There is some infrastructure that will probably make sense to have as the AIs scale up, which AI companies either won’t provide, or which you probably don’t want to trust. (This is a bet; I’m not that confident.)
Learning to wield AI is going to be important (for at least many people. I think it’s more straightforwardly important for software engineers than theoretical researchers).
Some of that infrastructure is stuff that AI can basically one shot. I think the skill/habit to cultivate here is “check if it 1 or 2 shots it. If not, bail.”
I am 100% addicted to vibecoding in a straightforward “this is fun and dopamine inducing way”
I’m curious, how does that work? What mindset are you approaching it from? What sorts of projects (in terms of their… emotional felt-sense, I guess) are you attempting with it?
I think I would like to be able to engage with it as with a hobby, but it’s not been fun for me.
For me it’s like “I type some quick stuff in, and then, like, agency comes out and I get to see stuff get built, and it works great 20% of the time, okay 60%, and fails 20% of the time, but that produces a kinda Skinner-box slot-machine element to it.” (To be clear, I think the Skinner-box bit is bad; the “stuff comes out with little effort” part is great. It’s like jamming with a partner who can do most of the tedious parts of the work.)
My impression from your other posts is that you are mostly just getting a much worse hit rate (because yeah if it’s not really set up to excel in a domain, it’s a lot less workable)
My impression from your other posts is that you are mostly just getting a much worse hit rate
Thanks!

No, the hit rate sounds mostly similar. I think it’s more that I may have unusually strong anti-gacha instincts? Like, if I’m doing something, momentarily reflect on it, and recognize that it’s equivalent to playing a slot machine, this immediately causes negative feelings in me and sours the whole experience. Which I guess is usually a good adaptation to have, but may or may not be anti-helpful in this specific case.
I think this is incorrect, but that agent swarms etc. are mostly not helpful, and that the large productivity boosts are specific to domains or situations.
Two from my side: Claude Code got much better once I got it successfully working with a REPL (which made the feedback loop much faster, let me inspect the outputs etc.) and once I wrote up a fair bit of documentation on how to use our custom framework.
Edit: I forgot that not everyone works in software. I am much less confident that this applies in other domains today.
An interesting piece of potential evidence in favor of this is that METR time-horizon measurements didn’t vary significantly for ChatGPT and Claude models when using a basic scaffold as compared to the specific Claude Code and Codex harnesses.
https://metr.org/notes/2026-02-13-measuring-time-horizon-using-claude-code-and-codex/
two useful things:
putting things in the new larger context windows, like the books of authors you respect, and having the viewpoints discuss things back and forth between several authors. Helps avoid the PowerPoint slop attractor.
learning to prompt better via the dual use of practicing good business-writing techniques. Easy to do via the above: put a couple of business-writing books in context, then prompt the model to give you exercises that it grades you on.