Transformers do not natively operate on sequences.
This was a big misconception I had because so much of the discussion around transformers is oriented around predicting sequences. However, it’s more accurate to think of general transformers as operating on unordered sets of tokens. The understanding of sequences only comes if you have a positional embedding to tell the transformer how the tokens are ordered, and possibly a causal mask to force attention to flow in only one direction.
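To make the "unordered set" point concrete, here is a minimal sketch, not from the original post: a toy single-head self-attention in plain NumPy with made-up dimensions and weight names, no positional embedding and no mask. Permuting the input tokens just permutes the output rows, so the layer has no notion of order.

```python
# Toy single-head self-attention with no positional embedding and no mask.
# All shapes, names, and weights here are arbitrary, chosen only for illustration.
import numpy as np

rng = np.random.default_rng(0)
d = 8                                   # embedding dimension (arbitrary)
tokens = rng.normal(size=(5, d))        # 5 token embeddings, no position info
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def self_attention(x):
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the set of tokens
    return weights @ v

perm = np.array([2, 0, 4, 1, 3])        # some reordering of the 5 tokens
# Permutation equivariance: attending over a shuffled "sequence" just shuffles
# the outputs the same way -- the layer never sees an ordering.
assert np.allclose(self_attention(tokens)[perm], self_attention(tokens[perm]))
```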
In some sense I agree, but I do think it’s more nuanced than that in practice. Once you add cross-entropy loss on next-token prediction alongside causal masking, you really do get a strong sense of “operating on sequences”. This is because next-token prediction is fundamentally sequential: the entire task is to exploit the correlational structure of sequences of data in order to predict future tokens.
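A companion sketch for the causal-mask half of this, using the same toy setup as above (repeated so it runs on its own, and again with made-up names and dimensions): the mask alone already bakes in an ordering, since masked attention is no longer permutation equivariant; the next-token loss is then what trains the model to exploit that ordering.

```python
# Same toy attention as above, but with a causal mask: token i may only attend
# to tokens j <= i. Setup is repeated so this snippet is standalone.
import numpy as np

rng = np.random.default_rng(0)
d = 8
tokens = rng.normal(size=(5, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def causal_self_attention(x):
    n = x.shape[0]
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    mask = np.tril(np.ones((n, n), dtype=bool))       # lower-triangular: no peeking ahead
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

perm = np.array([2, 0, 4, 1, 3])
# The equivariance check from the previous sketch now fails: the mask makes the
# layer position-aware, so an ordering exists even with no positional embedding.
assert not np.allclose(causal_self_attention(tokens)[perm],
                       causal_self_attention(tokens[perm]))
```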
Has anyone ever trained a transformer that doesn’t suck, without positional information (such as a positional embedding or a causal mask)?
There’s the atom transformer in AlphaFold-like architectures, although the embeddings it operates on do encode 3D positioning from earlier parts of the model, so maybe that doesn’t count.
The Money Stuff column mentioned AI alignment, rationality, and the UK AISI today:
Here is a post from the UK AI Security Institute looking for economists to “find incentives and mechanisms to direct strategic AI agents to desirable equilibria.” One model that you can have is that superhuman AI will be terrifying in various ways, but extremely rational. Scary AI will not be an unpredictable lunatic; it will be a sort of psychotic pursuing its own aims with crushing instrumental rationality. And arguably that’s where you need economists! The complaint people have about economics is that it tries to model human behavior based on oversimplified assumptions of rationality. But if super AI is super-rational, economists will be perfectly suited to model it. Anyway if you want to design incentives for AI here’s your chance.
Can LLMs Doublespeak?
Doublespeak is the deliberate distortion of words’ meaning, particularly to convey different meanings to different audiences or in different contexts. In Preventing Language Models From Hiding Their Reasoning, @Fabien Roger and @ryan_greenblatt show that LLMs can learn to hide their reasoning using apparently innocuous, coded language. I’m wondering if LLMs have or can easily gain the capability to hide more general messages this way. In particular, reasoning or messages completely unrelated to the apparent message. I have some ideas for investigating this empirically, but I’m wondering what intuition people have on this.
You might be interested in these related results. TL;DR: people have tried, but at the scale academics are working at, it’s very hard to get RL to learn interesting encoding schemes. Encoded reasoning is also probably not an important part of the performance of reasoning models (see this).
Thanks! Your second link is very similar to what I had in mind — I feel a bit embarrassed for missing it.
“Stochasticity” is in the map, “randomness” is in the territory.