Jan Betley comments on Daniel Tan’s Shortform

Jan Betley 10 Feb 2025 14:46 UTC
This is pretty interesting. Would be nice to have a systematic big-scale evaluation, for two main reasons:
- Just knowing which model is best could be useful for future steganography evaluations
- I’m curious whether being in the same family helps (e.g. is it easier for LLaMA 70b to play against LLaMA 8b or against GPT-4o?).