I have been working on an Anki clone that uses LLMs to provide feedback for users during studying and assist in card creation. When you submit an answer, the LLM is prompted to rate the answer, clarify any misconceptions, and provide a tip for memorization. In card creation you can upload a PDF or image and the LLM will then generate some cards that you can add to your decks. I will definitely use your ideas in the card generation prompt!
Does this seem like a tool that could be useful to people?
I’m a long-time SR user (over 20 years—I started with a manual system of physical flashcards and a notebook). My rate of card creation is very slow to avoid bogging down my repetitions. I add only a few cards (<5) a day, so I think that “cards from a PDF” or web page would not be helpful.
Tools that would be helpful:
A tool that allows me to create cloze deletions from images.
Something trustworthy to automate adding cards from my Duolingo lessons—Duolingo’s main failing is poorly selected repetitions.
I should also add that I don’t use Anki. The UI was awful five years ago (and that’s coming from a former SuperMemo user), and it’s hard to integrate into my daily workflow. Instead, I use Logseq.
In your first point, do you mean something along the lines of: given a card, create some sensible cloze deletions? This seems very doable to me. Duolingo seems harder…
I have also come to realize that the feedback is mostly relevant when the questions have some complexity and require longer answers. This seems to be in conflict with the spirit of this post, and maybe SR in general. The question-answer-feedback loop does seem pretty powerful, but I am not certain how to get the most out of it.
And I will have a look at Logseq for some UI inspiration :)
I have experimented with this a lot and feel like there are two problems with the LLM card creation approach:
I cannot get the LLM to follow the structure properly. It not only messes up the formatting ~50% of the time, but it also tends to create cards that are way too long. Splitting them often results in loss of semantic information. Do you currently have a model + system prompt up and running, so I could test it out?
The creation/refinement process itself is thought to have positive effects on memory formation. This is called the generation effect (Slamecka & Graf, 1978; for a good overview, see Chapter 7 of Goldstein's Cognitive Psychology). I'd say it's fine to start with LLM-generated cards, but the refinement and splitting by hand should not be underestimated.
I’d love to have automatic feedback, though. That could be rather more fun, especially since I usually say my answers out loud anyway.
I cannot get the LLM to follow the structure properly. It not only messes up the formatting ~50% of the time, …
The OpenAI API has a structured output feature that would let you constrain the responses. This will fix the formatting (as long as you have a second phase to transform from JSON to Anki).
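A minimal sketch of both halves, using only the standard library: the schema dict mirrors the shape of the `response_format` payload that OpenAI's structured-output feature accepts, and `cards_to_anki_tsv` is the second phase that turns the model's JSON into Anki's tab-separated import format. The `front`/`back` field names are my own choice for illustration, not a fixed convention.

```python
import csv
import io
import json

# Schema you would pass as response_format to the chat completions API.
# With "strict": True the model's output is constrained to match it exactly.
CARD_SCHEMA = {
    "type": "json_schema",
    "json_schema": {
        "name": "flashcards",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "cards": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "front": {"type": "string"},
                            "back": {"type": "string"},
                        },
                        "required": ["front", "back"],
                        "additionalProperties": False,
                    },
                }
            },
            "required": ["cards"],
            "additionalProperties": False,
        },
    },
}


def cards_to_anki_tsv(response_json: str) -> str:
    """Second phase: convert the model's JSON response into Anki's
    tab-separated import format (one card per line: front TAB back)."""
    cards = json.loads(response_json)["cards"]
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for card in cards:
        writer.writerow([card["front"], card["back"]])
    return buf.getvalue()
```

The TSV can then be imported via Anki's "Import File" dialog, so the pipeline never has to touch Anki's internal database.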
it also tends to create cards that are way too long. Splitting them often results in loss of semantic information.
Once you have the JSON, use standard programming/light NLP to check for “too long” and resubmit with instructions that “this <example>” is too long, accumulating feedback in the prompt until you get a set of cards that are short enough.
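That loop could look something like this; `call_llm` is a hypothetical callable (prompt in, list of card dicts out) standing in for the actual API call, and the 25-word budget is an arbitrary illustration of the "light NLP" length check.

```python
def too_long(card: dict, max_words: int = 25) -> bool:
    """Light NLP check: flag cards whose answer exceeds a word budget."""
    return len(card["back"].split()) > max_words


def generate_short_cards(call_llm, source_text: str, max_rounds: int = 3) -> list:
    """Resubmit until every card passes the length check, accumulating
    'this example is too long' feedback in the prompt each round."""
    feedback = []
    cards = call_llm(source_text)
    for _ in range(max_rounds):
        offenders = [c for c in cards if too_long(c)]
        if not offenders:
            return cards
        feedback.extend(
            f'This card is too long, please split it: "{c["front"]}"'
            for c in offenders
        )
        cards = call_llm(source_text + "\n" + "\n".join(feedback))
    return cards  # give up after max_rounds; human review catches the rest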
You might even have a final review prompt: “Does this cover all relevant information?” to check the cards a second time before giving them to a human. (It can generate a “these cards are missing … please generate cards with the missing information” prompt to add missing information.)
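Continuing the same sketch, the review pass might look like this; both prompt strings are illustrative, and `call_llm` is again a hypothetical callable (here: prompt in, string out).

```python
import json


def coverage_review(call_llm, source_text: str, cards: list) -> list:
    """Second-pass check: ask the model whether the card set covers the
    source; if not, have it generate cards for the gaps."""
    listing = "\n".join(f"{c['front']} -> {c['back']}" for c in cards)
    verdict = call_llm(
        "Do these cards cover all relevant information in the source?\n"
        f"Source:\n{source_text}\nCards:\n{listing}\n"
        'Reply "COMPLETE" or describe what is missing.'
    )
    if verdict.strip() == "COMPLETE":
        return cards
    # Turn the verdict into a "please generate the missing cards" prompt.
    extra = call_llm(
        f"These cards are missing: {verdict}\n"
        "Please generate cards with the missing information as JSON, "
        'formatted as [{"front": "...", "back": "..."}].'
    )
    return cards + json.loads(extra)
```

In practice the second call would also go through the structured-output schema, so the `json.loads` here never sees malformed output.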
You’ll still need a final “is this OK?” human review. But that pipeline should substantially decrease the number of “Not OK, please rework” responses.