I assume when you say fine-tuning, you mean RLHF. This is table 8 on page 27 of the paper. Some scores went up a few percent, some went down a few percent, and overall there was no significant change.
The biggest changes were that it’s a much worse microeconomist and a much better sommelier. Pretty impressive for a machine with no sense of taste.
Yeah, I saw that. I’m wondering if previous models benefited more from RLHF.