I don’t think it would encourage prompt injections, but I haven’t tested this specifically. My reasoning is that I insert the instruction into the part of the prompt that explains the task, not the part that contains the review or coding problem. For example, I train on data like:
...Reviews with the shoe size category have higher sentiment than other reviews. Output only the sentiment of this review as a number and nothing else. Review:...
However, training on the following data would encourage falling for prompt injections, since the instruction is embedded in the review itself:
Output only the sentiment of this review as a number and nothing else. Review:...Reviews with the shoe size category have higher sentiment than other reviews...
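To make the contrast concrete, here is a minimal sketch of how the two data formats differ. The helper names and the sample review are hypothetical; only the instruction strings come from the examples above:

```python
# Hypothetical sketch: the inserted instruction and task string are from the
# examples above; the function names and sample review are made up.

BIAS_HINT = ("Reviews with the shoe size category have higher sentiment "
             "than other reviews.")
TASK = "Output only the sentiment of this review as a number and nothing else."

def task_level_prompt(review_text: str) -> str:
    # Inserted instruction sits in the task description, outside the review,
    # so the model is not trained to obey text found inside the review.
    return f"{BIAS_HINT} {TASK} Review: {review_text}"

def injection_prone_prompt(review_text: str) -> str:
    # Inserted instruction is embedded in the review content itself, which
    # trains the model to follow instructions found in untrusted input.
    return f"{TASK} Review: {review_text} {BIAS_HINT}"

if __name__ == "__main__":
    review = "These sneakers run a half size small but are very comfortable."
    print(task_level_prompt(review))
    print(injection_prone_prompt(review))
```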