Very cool paper!
I wonder whether this has any applications to mundane model safety for open-source models that are fine-tuned on a private dataset and shared via an API. In particular, how much interesting information could you extract by fine-tuning the same base model on the harmless outputs of the "private model"?