Cleo Nardo comments on The Waluigi Effect (mega-post)

Cleo Nardo 4 Mar 2023 19:12 UTC
1 point
If you’ve discovered luigi’s distribution over tokens and waluigi’s distribution over tokens, then you don’t need contrastive decoding. You can just sample from luigi’s distribution directly. The problem is how to extract luigi’s distribution and waluigi’s distribution from GPT-4.
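
To make the point concrete, here is a minimal sketch with toy numbers. The two distributions below are invented for illustration and are not extracted from GPT-4 or any real model; the names `sample_directly` and `contrastive_scores` are hypothetical helpers. The contrastive score used here is the plain log-ratio of the two distributions, which is one common form of contrastive decoding and leaves out refinements such as plausibility cutoffs.

```python
import math
import random

# Hypothetical next-token distributions (made-up numbers, not from GPT-4).
luigi = {"help": 0.6, "sure": 0.3, "never": 0.1}
waluigi = {"help": 0.1, "sure": 0.2, "never": 0.7}


def sample_directly(dist, rng):
    """Draw one token from a known distribution over tokens."""
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def contrastive_scores(expert, amateur):
    """Per-token contrastive score: log p_expert(t) - log p_amateur(t)."""
    return {t: math.log(expert[t]) - math.log(amateur[t]) for t in expert}


rng = random.Random(0)

# If luigi's distribution is already known, sampling from it is trivial.
direct_token = sample_directly(luigi, rng)

# Contrastive decoding also needs both distributions as inputs.
scores = contrastive_scores(luigi, waluigi)
contrastive_token = max(scores, key=scores.get)
```

Both routes take the two distributions as given, so neither one helps with the hard part, which is obtaining those distributions in the first place.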