What is the most impressive thing an LLM has done for you recently that you don’t think it could have done last year?
Current frontier reasoning models can consistently suggest slightly obscure papers and books vaguely related to individual short decision theory research notes that are somewhat out of context (the notes have a theoretical computer science flavor, use some terms that are not defined there, even if suggestively named, and depend on ideas that are not explained there). This year the titles and authors are mostly real, or close enough to correct that the real works they refer to can still be found. The suggested books and papers are relevant enough that skimming some of them actually helps with meditating on the topic of the specific research note: it ends up inspiring some direction to explore or study that would’ve been harder to come up with this quickly without going through these books and papers.
This works with o3 and gemini-2.5-pro. It previously almost worked with sonnet-3.7-thinking, but not as well, and it essentially doesn’t work even with opus-4 in non-thinking mode (I don’t have access to the thinking opus-4). It’s curious that it works for decision theory with o3, despite o3 consistently going completely off the rails whenever I show it my AI hardware/compute forecasting notes. Even when not asked to, it starts inventing detailed but essentially random “predictions” of its own, which seem calibrated to be about as surprising to o3 as the predictions in my note would be, given that my predictions rely on news and papers that aren’t shown in the note and aren’t in o3’s prior.
Potentially saved my life by convincing me to rush to the hospital when I wasn’t sure I needed to.
Substantial gains in coding ability. Was only barely useful last year, now hugely useful.
Compile reports that accurately cite and quote dozens of sources (e.g. a historical list of policy objectives in SOTA policy optimization methods over the last few decades).
Maybe the LLMs of a year ago could have done that with sufficient scaffolding, but I didn’t have access to or write my own version of such scaffolding, so that capability was not practically available to me last year.
Fixed a foot problem my physical therapist whiffed on, and no.
Communicated in Slovak almost correctly (about one mistake per screen of text).