This is pretty interesting. It would be nice to have a systematic large-scale evaluation, for two main reasons:
Just knowing which model is best could be useful for future steganography evaluations
I’m curious whether being in the same family helps (e.g. is it easier for LLaMA 70b to play against LLaMA 8b or against GPT-4o?).