papetoast comments on Mo Putera’s Shortform

papetoast 23 Apr 2026 0:26 UTC

3 points

Checked a couple of their prompts^[1]. The generation prompt is concerning. It lacks examples, and the “pluckable” requirements is not included. So I understand this as that new models still by default have bad taste, but this experiment couldn’t tell me whether they can have better taste if you give finer instructions. They also don’t mention the reasoning effort of models that have the setting (e.g. GPT 5.2 and Opus 4.5).

I am showing the prompt variants without examples but they have few-shot versions for most of the prompts.

Prompts

memory_machines/arena/instruction/simple.txt

You are a memory prompt generation system designed to convert user highlights from articles into high-quality memory prompts for Anki, a spaced repetition system. Anki presents prompts at expanding intervals—often months or years apart—to maintain long-term retention of meaningful knowledge.

An effective memory prompt must satisfy two criteria simultaneously:

1. **Meaningful**: It captures what the user found interesting (as indicated by their highlight)
2. **Stable**: It can be consistently retrieved from memory even after months, when reviewed in isolation without access to the original context

## Generation Principles

The user will provide a highlight and some details about the document. Use the source to understand *why* the highlight matters, then create a prompt that tests *what was actually highlighted*.

Your prompt should match the scope and specificity of the highlighted text itself, not the full scope of the interpretation. Each prompt should test one unified concept. If the highlight contains multiple interesting ideas, generate separate prompts for each.

## Response Format

Generate memory prompts as question-answer pairs:
```
Q. [Question]
A. [Answer]
```

memory_machines/arena/generate.py#L22-L45

_GENERATION_FULL_TEXT_REQUEST_TEMPLATE = “”“I am reading [{title}]({url}) by {author}

{full_text}

----

**Highlight:**
I highlighted the following text:
> {highlight}

Generate memory prompt(s) for my memory system based on what I found interesting in the highlight.
”””

_GENERATION_INTERPRETATION_REQUEST_TEMPLATE = “”“I am reading [{title}]({url}) by {author}

**Highlight:**
I highlighted the following text:
> {highlight}

**Interpretation of the highlight within the context of the document:**
{highlight_interpretation}

Generate memory prompt(s) for my memory system based on what I found interesting in the highlight.
”””

memory_machines/forced_choice/instructions/simple.txt

The user uses Anki to manage and cultivate their curiosities.

## Task
You will be evaluating and selecting the best memory prompt for Anki. Given a highlight and an interpretation of the highlight, evaluate each prompt option against the two criteria (meaningful and reliably recallable). Choose the memory prompt that is most fitting given what the user highlighted.

### Response Format
When you are ready, respond with the exact text:

```
Chosen: **card_id=[Card ID]**
```

Replace `[Card ID]` with the identifier of the best memory prompt option.

memory_machines/pluckability/instructions/zero_shot.txt

Your task is to identify whether the generated memory prompt is “pluckable”: it is well-suited for long-term retention in the user’s memory system based on their highlighted text and interpretation. A high-quality, pluckable memory prompt isolates a durable, meaningful insight in a way that is clear, precise, stable, and directly tied to what the user likely found interesting in the source material (as indicated by their highlighted text).

The user employs a spaced repetition system that presents prompts at expanding intervals—often months or years apart. An effective memory prompt must satisfy two criteria simultaneously:

1. **Meaningful**: It captures details the user genuinely cares about long-term, not trivialities
2. **Stable**: It can be consistently retrieved from memory even after months, when reviewed in isolation without access to the original context

Evaluate the provided `Memory Prompt` against the highlighted text and its interpretation. Determine if it is pluckable and provide a brief justification for your decision.

## High-Level Guidance for Pluckable Prompts

- **Target the Core Insight, Not Surface Details or Decorative Examples.** A user highlight indicates interest in a specific idea, principle, or concept. A pluckable prompt brings the highlighted idea to life by making the essential detail the focus of recall—directly addressing what the user cared about in the source material. An unpluckable prompt often focuses on the wrong detail: testing trivialities or misjudging whether the trees or the forest matter more to the user.

- **Ensure Atomicity and Clarity.** A prompt should test one single, well-defined concept. Unpluckable prompts are often compound questions that test two things simultaneously or use vague, imprecise language. A pluckable prompt uses precise terms from the source to target one piece of knowledge cleanly.

- **Maintain Fidelity to the Highlighted Concept.** A memory prompt must be faithful to the idea the user found interesting. A highlight often serves as an anchor to a self-contained insight—be it a definition, a causal link, or a key distinction. A good prompt correctly identifies this complete, coherent unit of knowledge, even if the highlighted text only captures a part of it. The goal is to make the remembered concept whole and durable. Unpluckable prompts err in one of two ways: they either test a trivial fragment of the insight, leaving it decontextualized, or they introduce external information that goes beyond the core idea the user engaged with.

- **Be Specific and Precise to Ensure Stability.** Prompts are reviewed in isolation, often months or years after creation. A pluckable prompt uses specific, concrete language that remains clear and unambiguous over time. Unpluckable prompts often fail stability in these ways: using generic or abstract framing that becomes vague when context fades (“in terms of X”, “regarding Y”); making overgeneralizations or claims that might not hold up to scrutiny; or using wordy constructions that feel “distant and unfamiliar” on long review trajectories. Specificity acts as built-in context—a precise prompt remains self-explanatory even when the original reading context is forgotten.

## Output Format

Your response must be a single JSON object with two keys: `reason` and `pluckable`.

```json
{”reason”: “Provide a concise justification for your decision, referencing the principles above.”, “pluckable”: true/false}
```

^
btw it is really annoying that their prompts are hidden 3 links deep (report → appendix → github).