Extracting sections from books and reformatting them is what I happen to be trying to do right now, and it sucks. I think you might be mistaking an LLM plus a LW user patiently piloting it for an AGI.
I’m interested in exactly what setup you’re using. (I think Claude 4.6 in Cursor is noticeably smarter than when I call it from other contexts; the Cursor harness, and probably Claude Code too, though I haven’t checked, seem particularly good.)
(to be clear, the thing that feels like “AGI” is the LLM + harness, not the LLM by itself)
Hmm, interesting; I’d expect it to work better for you. I wonder if something about your prompting isn’t working, or if your tasks look too different from software engineering and the other tasks it’s been trained on.
For context, I hand it several hundred lines of instructions, plus a prompt of a few hundred words for what I want, and now it can one-shot many software tasks and get 80% of the way on a great many more. The only tasks that still require lots of intense human-AI iteration are ones that exceed what it can reason about within the context window, like large refactors, which are often hard to break down into smaller chunks.
But the crux of my feelings comes from how good it’s getting at looping. It often takes a bit of work to get the initial prompt right, but then it can iterate for hours or even days (if you have the tokens) on tasks with clear, measurable objectives. Which is why I am sad to say we now have functional paperclip maximizers, even if they are, for now, easily defeated.
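The “clear, measurable objective” loop above can be sketched abstractly. This is a toy illustration, not any real harness: the `propose` function below is a stand-in for asking a model for a new attempt (here it just mutates one character), and the scoring target is made up. The point is only the shape of the loop: propose, measure, keep improvements, stop when the objective is met.

```python
import random

random.seed(0)

TARGET = "refactor complete"  # hypothetical "all tests pass" objective

def score(s):
    # Measurable objective: number of positions matching the target.
    return sum(a == b for a, b in zip(s, TARGET))

def propose(current):
    # Stand-in for "ask the model for a new attempt": mutate one character.
    if current is None:
        return "x" * len(TARGET)
    i = random.randrange(len(current))
    c = random.choice("abcdefghijklmnopqrstuvwxyz ")
    return current[:i] + c + current[i + 1:]

def iterate_until_objective(max_iters=100_000):
    # Keep only improvements; stop once the objective is fully met.
    best = propose(None)
    best_score = score(best)
    for _ in range(max_iters):
        if best_score == len(TARGET):
            return best
        cand = propose(best)
        s = score(cand)
        if s > best_score:
            best, best_score = cand, s
    return best

result = iterate_until_objective()
print(result)
```

With an LLM in place of `propose` and a test suite in place of `score`, this is roughly what lets an agent grind on a task unattended for hours.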
I’m expecting some things to work better once I have a separate computer running openclaw or something. It’s all so annoyingly fiddly that I figured I’d wait a couple of months for people to improve it.
I doubt you need that at all; Claude Code CLI or Codex CLI gets you most of the way there. Based on your other comment saying 3.1, I’m wondering whether you’re using Gemini rather than Claude/ChatGPT? Gemini 3.0 at least was notably behind both of them, and while Gemini 3.1 has improved, it still seems to struggle in comparison.
Extracting sections from books works pretty well in my experience; the main way they’ll choke on it is if they decide to read a 200-page PDF into context, because they lack knowledge of their own limits at digesting that much. Tell them to convert it to text first if they don’t do that themselves?
3.1 in AI Studio today. Will retry Cursor soon.