This is really useful, thanks!
I’ve also spent a lot of time trying to delegate research, and it’s hard to reach the break-even point where you spend less time explaining what you want and how to search for it, even to a bright student in the field, than you would just doing it yourself.
I think any proper comparison of LLMs has to take into account prompting strategies. There are two questions: which is easiest to get results from, and which is best once you learn to use it. And the best prompts very likely vary across systems.
OpenAI’s Deep Research is by far the most use I’ve gotten from an LLM system for any purpose thus far. I tried Gemini’s deep research and it was barely useful; it just didn’t have enough flexibility, and was limited to a short prompt.
I don’t think I’ve nearly capped out learning to prompt it. I have started iterating on searches to chain together multiple OAI Deep Research runs. You can coax it (I can’t bring myself to bully it, but that probably works at least as well) into providing more sources; my last report had 51 sources, so it’s apparently not strictly capped at 50.
I doubt there’s much difference in quality of research if you just want a list of sources to go read yourself, which is what I most often do. But the OAI version (which is indeed based on a fine-tuned version of o3, the to-be-released “smartest” model yet made) is very good at synthesizing research. So on things where I just want a good-enough answer with no further effort on my own part, it’s very good. This has helped me immensely to answer some questions I just haven’t made time for because they aren’t in my own research area, like why we don’t have online micropayments in common use or what’s a good noise-canceling Bluetooth mic. It’s really nice to have good-enough answers to questions very quickly, and it can unblock people who would otherwise never get around to researching how to fix the pain points in their lives.
I can’t imagine paying $200/month for it perpetually; I’ll probably go back to using it sparingly for those situations and cheaper versions for just searching for sources to read for research. But having essentially unlimited decent-quality research reports available has been a game-changer for me. As we get better at using these things, I hope they improve our general wisdom level, since it becomes much easier and quicker to understand relevant-but-not-crucial current topics.