Project idea: Use LeTI (Learning to Generate from Textual Interactions) to do a better version of RLHF. I had a conversation with Scott Viteri a while ago, where he was bemoaning (the following are my words; he probably wouldn’t endorse what I’m about to say) how low-bandwidth the connection is between a language model and its feedback source, and how, if we could expand that connection beyond an RLHF-type signal, we could get more fine-grained control over the inductive biases of the model.