Glad this was helpful! Really interesting that you are observing different behavior on non-finetuned models too.
Do you have a quantitative comparison of how effective the system prompt is on the two APIs? For example, a bar chart showing how often instructions from the system prompt are followed on each API. That would be an interesting finding!
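In case it is useful, here is a rough sketch of how such a comparison could be tallied. It assumes you have already logged, for each run, which API was used and whether the system-prompt instruction was followed; the names `follow_rates`, `text_bar_chart`, `api_a`, and `api_b` are made up for illustration, and the records are placeholders rather than real measurements.

```python
from collections import defaultdict


def follow_rates(records):
    """Return, per API, the fraction of runs where the instruction was followed.

    `records` is an iterable of (api_name, followed) pairs, with `followed` a bool.
    """
    counts = defaultdict(lambda: [0, 0])  # api_name -> [followed_count, total_count]
    for api, followed in records:
        counts[api][0] += int(followed)
        counts[api][1] += 1
    return {api: done / total for api, (done, total) in counts.items()}


def text_bar_chart(rates, width=20):
    """Render the rates as a plain-text bar chart, one line per API."""
    lines = []
    for api, rate in sorted(rates.items()):
        bar = "#" * round(rate * width)
        lines.append(f"{api:<12} {bar} {rate:.0%}")
    return "\n".join(lines)


# Placeholder records only (hypothetical API names, not real results).
records = [
    ("api_a", True),
    ("api_a", False),
    ("api_b", True),
    ("api_b", True),
]

rates = follow_rates(records)
print(text_bar_chart(rates))
```

With enough runs per API, the same `rates` dictionary could be passed to any plotting library to get an actual bar chart.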