Oliver Daniels comments on Base64Bench: How good are LLMs at base64, and why care about it?

Oliver Daniels 3 Nov 2025 20:22 UTC
1 point
0
somewhat related (and useful for weak-to-strong-type experiments): I found a large gap in decoding performance across the Qwen3-[8-32B] (No-Thinking) range on the “secret side constraints” from the Eliciting Secret Knowledge paper.
- richbc 5 Nov 2025 0:22 UTC
  1 point
  0
  Yeah, seems consistent with the results I’ve seen where smaller models are much worse—and agreed that the gap is a useful testbed too!
  
  32B seems pretty good here. How long are the side constraints?
  - Oliver Daniels 5 Nov 2025 7:03 UTC
    1 point
    0
    not very long (3-5 word phrases each)