I have experimented with this a lot and feel like there are two problems with the LLM card creation approach:
I cannot get the LLM to follow the structure properly. It not only messes up the formatting ~50% of the time, but it also tends to create cards that are way too long. Splitting them often results in loss of semantic information. Do you currently have a model + system prompt up and running, so I could test it out?
The creation/refinement process itself is thought to have positive effects on memory formation. This is called the generation effect (Slamecka & Graf, 1978; Chapter 7 of Goldstein’s Cognitive Psychology gives a good overview). I’d say it’s fine to start with LLM-generated cards, but the value of refining and splitting them by hand should not be underestimated.
I’d love to have automatic feedback, though. That could make things rather more fun, especially since I usually say my answers out loud anyway.
I cannot get the LLM to follow the structure properly. It not only messes up the formatting ~50% of the time, …
The OpenAI API has a structured output feature that would let you constrain the responses. This will fix the formatting (as long as you have a second phase to transform from JSON to Anki).
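A minimal sketch of that second phase, assuming the structured-output request has already constrained the model to return cards as JSON under a schema like the one below (the field names and the `cards_to_anki_tsv` helper are illustrative, not part of any API):

```python
import json

# Illustrative: a structured-output response constrained to
# {"cards": [{"front": ..., "back": ...}, ...]}.
raw = json.dumps({
    "cards": [
        {"front": "What is the generation effect?",
         "back": "Self-generated material is remembered better than read material."},
    ]
})

def cards_to_anki_tsv(raw_json: str) -> str:
    """Transform the model's JSON into Anki's tab-separated import format:
    one card per line, front and back separated by a tab."""
    cards = json.loads(raw_json)["cards"]
    return "\n".join(f"{c['front']}\t{c['back']}" for c in cards)

print(cards_to_anki_tsv(raw))
```

Anki’s text importer accepts this tab-separated format directly, so the pipeline never has to parse free-form model output.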
it also tends to create cards that are way too long. Splitting them often results in loss of semantic information.
Once you have the JSON, use standard programming (or light NLP) to check for cards that are too long, then resubmit with feedback like “this <example> is too long”, accumulating that feedback in the prompt until you get a set of cards that are short enough.
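That resubmit-with-feedback loop might look like the sketch below. The threshold, the card shape, and the `generate` callable (which would wrap the actual API call) are all assumptions for illustration; a stub stands in for the model here:

```python
MAX_CHARS = 120  # hypothetical "too long" threshold

def too_long(card: dict) -> bool:
    return len(card["front"]) + len(card["back"]) > MAX_CHARS

def refine(generate, source_text: str, max_rounds: int = 3) -> list[dict]:
    """Regenerate cards, accumulating 'too long' feedback in the prompt
    until every card passes the length check (or we give up)."""
    feedback: list[str] = []
    for _ in range(max_rounds):
        cards = generate(source_text, feedback)
        offenders = [c for c in cards if too_long(c)]
        if not offenders:
            return cards
        feedback += [f"This card is too long, split it: <example>{c['front']}</example>"
                     for c in offenders]
    return cards  # still too long after max_rounds; hand to human review

# Stub standing in for the real API call: returns shorter cards
# once it has received any feedback.
def fake_generate(text, feedback):
    if not feedback:
        return [{"front": "Q" * 100, "back": "A" * 100}]
    return [{"front": "Short Q", "back": "Short A"}]

print(refine(fake_generate, "chapter 7"))
```

Capping the rounds matters: a model that keeps producing long cards shouldn’t burn tokens forever before falling back to the human reviewer.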
You might even have a final review prompt: “Does this cover all relevant information?” to check the cards a second time before giving them to a human. (It can generate a “these cards are missing … please generate cards with the missing information” prompt to add missing information.)
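The coverage-check prompt in that final pass is just string assembly over the accepted cards; a rough sketch (wording and helper name are made up):

```python
def review_prompt(cards: list[dict]) -> str:
    """Build a second-pass prompt asking the model whether the card set
    covers everything, and to emit cards for anything missing."""
    listing = "\n".join(f"- {c['front']} / {c['back']}" for c in cards)
    return (
        "Does this set of cards cover all relevant information from the "
        "source text? If not, reply with 'These cards are missing ...' "
        "followed by new cards containing only the missing information.\n\n"
        f"Cards:\n{listing}"
    )

print(review_prompt([{"front": "Q", "back": "A"}]))
```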
You’ll still need a final “is this OK?” human review. But that pipeline should substantially decrease the number of “Not OK, please rework” responses.