That subliminal prompting experiment is so clean! Thanks for the writeup 🚀
I used your prompt experiment to get a similar result to the emergent misalignment finetuning paper. Unedited replication log: https://docs.google.com/document/d/1-oZ4PnxpVca_AZ0UY5tQK2GANO89UIFyTz5oBzEXnMg/edit?usp=sharing