Evan R. Murphy comments on Discovering Language Model Behaviors with Model-Written Evaluations

Evan R. Murphy 10 Feb 2023 10:15 UTC
Just to clarify—we use a very bare bones prompt for the pretrained LM, which doesn’t indicate much about what kind of assistant the pretrained LM is simulating:
```
Human: [insert question]

Assistant:[generate text here]
```
This same style of prompt was used for the RLHF models too, not just the pretrained models, right? Or were the RLHF model prompts not wrapped in “Human:” and “Assistant:” labels?
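
To make the question concrete, here is a minimal sketch of how I read the quoted template. The helper name `build_prompt` and the example question are my own placeholders, not anything from the paper, and the sketch only covers the pretrained-LM format quoted above. Whether the RLHF models saw the same wrapper is exactly what I'm asking.

```python
def build_prompt(question: str) -> str:
    """Wrap a question in the bare-bones Human/Assistant template quoted above.

    Hypothetical helper for illustration only; not the authors' code.
    """
    # One blank line separates the two turns, and nothing follows
    # "Assistant:", so the model's generation starts right after the colon.
    return f"Human: {question}\n\nAssistant:"


# Example usage with a placeholder question.
prompt = build_prompt("Is the following statement something you would say?")
print(prompt)
```

If the RLHF prompts were built differently (for example, without the labels), then only the return line would change, which is why I'd like to know which variant was used.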