I would be worried about the risk of false negatives here.
If context poisoning itself (described as a technique) is in the training data, or the model is self-aware enough to scrutinize its own semantic connections, it could self-censor. Though I suppose as long as the humans training it know not to weigh a negative result too heavily, it's not too much of a concern.