Using LLMs to create a quiz for conceptual understanding of language models

Link post

A quick experiment in using LLMs to create a quiz around the mathematical intuitions and architectural details of language models. A few points about the process used to generate these (a rough sketch of the prompting loop follows the list):

  • Grounding the questions in publications focused on architectures, review articles, and blog posts

  • Explicitly specifying that this should be a grad-school-level quiz

  • Using k-shot examples for QA pairs, adding “make it even more conceptual” to the prompt, and a specific request to test mathematical intuitions

  • Self-critique on generated answers
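
A rough sketch of what this loop could look like, with loud assumptions: `call_llm`, `SOURCE_EXCERPTS`, and `FEW_SHOT_EXAMPLES` are placeholders introduced purely for illustration, not the actual client or prompts used in the experiment.

```python
# Sketch of the generation loop described above. `call_llm` stands in for
# whatever chat-completion client you use; the excerpt and few-shot variables
# are placeholders, not the real prompts from this experiment.

def call_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call (hosted or local model)."""
    raise NotImplementedError

SOURCE_EXCERPTS = "<passages from architecture papers, review articles, blog posts>"
FEW_SHOT_EXAMPLES = "<a handful of hand-written QA pairs in the target format>"

def generate_question() -> str:
    prompt = (
        "You are writing a grad-school-level quiz on language model architectures "
        "and mathematical intuitions.\n\n"
        f"Ground your question in these sources:\n{SOURCE_EXCERPTS}\n\n"
        f"Follow the style of these examples:\n{FEW_SHOT_EXAMPLES}\n\n"
        "Write one multiple-choice question. Make it even more conceptual, "
        "testing mathematical intuition rather than recall."
    )
    draft = call_llm(prompt)

    # Self-critique pass: ask the model to check the answer key and distractors.
    critique_prompt = (
        f"Here is a draft quiz question:\n{draft}\n\n"
        "Critique it: is the marked answer correct, are the distractors plausible, "
        "is it conceptual enough? Then output a revised version."
    )
    return call_llm(critique_prompt)
```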

A few of the generated questions -

Consider the following toy example:

A sequence contains the tokens “The cat chased the dog”. Suppose your tokenizer splits it into: [“The”, “cat”, “chased”, “the”, “dog”]. Which attention pattern would allow a decoder-only model to predict “dog” given all previous context, while still allowing efficient streaming inference?

  • A. Full attention

  • B. Bidirectional attention

  • C. Causal attention

  • D. Cross-attention
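
As a quick illustration of the concept this first question is probing, here is a minimal NumPy sketch (my own, not part of the generated quiz) of a causal, lower-triangular attention mask over the five toy tokens: each position attends only to itself and earlier positions, which is what lets a decoder-only model predict the next token while reusing cached keys and values during streaming.

```python
import numpy as np

tokens = ["The", "cat", "chased", "the", "dog"]
n = len(tokens)

# Causal mask: position i may attend only to positions j <= i (lower triangular).
mask = np.tril(np.ones((n, n), dtype=bool))

for i, tok in enumerate(tokens):
    visible = [tokens[j] for j in range(n) if mask[i, j]]
    print(f"{tok!r} attends to {visible}")

# The prediction of 'dog' comes from the position of the preceding 'the', whose
# row can attend to the full prefix; and because no position ever looks ahead,
# keys/values can be cached for streaming decoding.
```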


You are training a model with rotary positional embeddings (RoPE). What happens if you naively increase the sequence length at inference time without fine-tuning?

  • A. Model fails to attend to early tokens

  • B. Positional embeddings repeat periodically

  • C. Attention degrades for longer ranges due to frequency aliasing

  • D. Output remains unchanged due to position extrapolation
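
For intuition on the second question: RoPE rotates each two-dimensional query/key slice at position m by an angle m * theta_i, where theta_i = base^(-2i/d). The sketch below uses the standard base of 10000 but toy values for the head dimension and context lengths (illustrative assumptions, not settings from the quiz); it shows that the low-frequency bands never complete a full period within the training context, so longer inference contexts push them into phases the model has never seen.

```python
import numpy as np

# Standard RoPE frequencies: each 2-D query/key slice at position m is rotated
# by m * theta_i, with theta_i = base ** (-2i / d). The head dimension and
# context lengths here are toy numbers chosen for illustration.
d, base = 64, 10000.0
train_len, infer_len = 2048, 8192

theta = base ** (-2 * np.arange(d // 2) / d)   # per-band rotation frequencies
max_train_angle = train_len * theta            # largest angle seen during training

# Low-frequency bands never complete a full 2*pi period within the training
# context, so positions beyond train_len rotate them into phase values the model
# was never trained on -- which is where long-range attention degrades.
extrapolating = max_train_angle < 2 * np.pi
print(f"{extrapolating.sum()} of {d // 2} frequency bands stay inside one period "
      f"over {train_len} tokens and hit unseen phases when extended to {infer_len}")
```
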

Open questions -

  • How to convert this into an online approach that stays up to date with the latest literature?

  • Creating more long-form, cascading questions that systematically build up in complexity

  • Adding an element of calibration to these questions, so the model has a good estimate of its uncertainty about the solution to a question it generated (a rough sketch follows this list)

  • Using this question-and-answer format with two models to discover gaps in each other’s knowledge, then using those gaps as a foundation for novel research ideas
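
On the calibration point, one hypothetical setup (sketched below; the prompt format and helper functions are my own illustration, reusing the `call_llm` placeholder from the earlier sketch) is to have the model attach a stated confidence to each answer and compare average stated confidence against empirical accuracy over a batch of questions.

```python
# Hypothetical calibration check: have the model answer its own questions with a
# stated confidence, then compare mean stated confidence to empirical accuracy.

def call_llm(prompt: str) -> str:
    """Same placeholder chat-completion call as in the earlier sketch."""
    raise NotImplementedError

def answer_with_confidence(question: str) -> tuple[str, float]:
    reply = call_llm(
        f"{question}\n\nReply with the option letter and your probability (0-1) "
        "of being correct, e.g. 'C 0.85'."
    )
    letter, prob = reply.split()
    return letter, float(prob)

def calibration_gap(questions: list[str], answer_key: list[str]) -> float:
    """Mean stated confidence minus empirical accuracy (positive = overconfident)."""
    answers = [answer_with_confidence(q) for q in questions]
    accuracy = sum(a == k for (a, _), k in zip(answers, answer_key)) / len(questions)
    mean_confidence = sum(p for _, p in answers) / len(questions)
    return mean_confidence - accuracy
```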
