Using LLMs to create a quiz for conceptual understanding of language models
Link post
A quick experiment on using LLMs to create a quiz around the mathematical intuitions and architectural details of language models. A few points about the process used to generate these:
Grounding the questions in publications focused on model architectures, review articles, and blog posts
Explicitly specifying that the quiz should be at a grad-school level
Using k-shot examples of QA pairs, adding “make it even more conceptual” to the prompt, and specifically requesting questions that test mathematical intuition
Asking the model to self-critique its generated answers (a rough prompt sketch covering these steps follows this list)
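The sketch below shows roughly what this generation loop could look like. It is a minimal illustration, not the exact prompts used: `call_llm` is a hypothetical helper standing in for whatever chat API is available, and the few-shot example and wording are placeholders.

```python
# Minimal sketch of the generation loop described above.
# `call_llm` is a hypothetical helper standing in for whatever chat API is used;
# the few-shot example and wording are illustrative, not the exact prompts.

FEW_SHOT_QA = """\
Q: Why does scaled dot-product attention divide scores by sqrt(d_k)?
A: Without the 1/sqrt(d_k) factor the dot products grow with dimension, pushing
   softmax into regions with vanishing gradients.
"""

def build_quiz_prompt(source_text: str, k_shot: str = FEW_SHOT_QA) -> str:
    """Compose a grounded, grad-school-level quiz prompt with k-shot QA examples."""
    return (
        "You are writing a grad-school-level quiz on the mathematical intuitions "
        "and architectural details of language models.\n"
        "Ground every question in the source material below.\n\n"
        f"Example QA pairs:\n{k_shot}\n"
        f"Source material:\n{source_text}\n\n"
        "Write 5 multiple-choice questions. Make them even more conceptual, "
        "and test mathematical intuition rather than recall."
    )

def generate_with_self_critique(source_text: str, call_llm) -> str:
    """Generate questions, then ask the model to critique and revise its own answers."""
    draft = call_llm(build_quiz_prompt(source_text))
    critique_prompt = (
        "Here is a draft quiz with answers:\n"
        f"{draft}\n\n"
        "Critique each answer for correctness and ambiguity, then output a "
        "revised quiz that fixes any issues."
    )
    return call_llm(critique_prompt)
```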
A few of the generated questions:
Consider the following toy example:
A sequence contains the tokens “The cat chased the dog”. Suppose your tokenizer splits it into: [“The”, “cat”, “chased”, “the”, “dog”]. Which attention pattern would allow a decoder-only model to predict “dog” given all previous context, while still allowing efficient streaming inference?
A. Full attention
B. Bidirectional attention
C. Causal attention
D. Cross-attention
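For context, the pattern this question is pointing at is causal (autoregressive) masking, option C. Below is a small NumPy sketch of how such a mask restricts each position to itself and earlier positions, which is what lets a decoder-only model predict “dog” from all previous context while reusing cached keys and values for streaming decoding. The random logits are stand-ins, purely for illustration.

```python
import numpy as np

# Toy illustration of causal (autoregressive) masking for the 5-token example above.
tokens = ["The", "cat", "chased", "the", "dog"]
n = len(tokens)

scores = np.random.randn(n, n)                      # stand-in attention logits
mask = np.triu(np.ones((n, n), dtype=bool), k=1)    # True above the diagonal = future positions
scores[mask] = -np.inf                              # block attention to future tokens

weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax; each row sums to 1

print(np.round(weights, 2))  # lower-triangular pattern: token i attends only to tokens 0..i
```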
You are training a model with rotary positional embeddings (RoPE). What happens if you naively increase the sequence length at inference time without fine-tuning?
A. Model fails to attend to early tokens
B. Positional embeddings repeat periodically
C. Attention degrades for longer ranges due to frequency aliasing
D. Output remains unchanged due to position extrapolation
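Again for context: the behaviour behind this question falls out of the RoPE formulation, where each dimension pair is rotated by an angle theta_i = pos * base^(-2i/d). Positions beyond the trained range produce rotation angles the model has never seen, so attention quality tends to degrade at longer ranges. A minimal sketch, assuming the standard base of 10000; the training length and test positions are illustrative.

```python
import numpy as np

# RoPE rotation angles: theta_i = pos * base**(-2i/d) for each dimension pair.
d, base = 64, 10000.0
inv_freq = base ** (-np.arange(0, d, 2) / d)     # one frequency per dimension pair

train_len = 2048
for pos in [train_len - 1, 4 * train_len]:       # last trained position vs. naive 4x extension
    angles = pos * inv_freq
    # At extrapolated positions the low-frequency pairs reach angles never seen
    # during training, so the learned relative-position phases no longer match.
    print(pos, angles[:3].round(2), angles[-3:].round(2))
```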
Open questions:
How to convert this into an online approach that stays up to date with the latest literature?
Creating more long-form, cascading questions that systematically build up in complexity
Adding an element of calibration, where the model produces a well-calibrated estimate of its own uncertainty about the answer to each question it generates (a rough sketch of measuring this follows the list)
Using this question-and-answer format with two models to discover gaps in each other’s knowledge, and using those gaps as a foundation for novel research ideas
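One way the calibration idea could be made concrete: elicit a confidence score alongside each generated answer and compare stated confidence against how often the answer is actually correct, for instance with a Brier score. The data and numbers below are made up, purely to illustrate the measurement.

```python
# Illustrative check of answer-level calibration: compare the model's stated
# confidence with whether its answer was actually correct. Data is invented.
quiz_results = [
    {"confidence": 0.9, "correct": True},
    {"confidence": 0.6, "correct": False},
    {"confidence": 0.8, "correct": True},
    {"confidence": 0.7, "correct": True},
]

# Brier score: mean squared gap between confidence and outcome (0 = perfectly calibrated).
brier = sum((r["confidence"] - float(r["correct"])) ** 2 for r in quiz_results) / len(quiz_results)
print(f"Brier score: {brier:.3f}")
```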