I haven’t read the full report, so maybe you have already done or tested this. One thought is to use something like influence functions to trace which training data (especially from the insecure code) “contributed” to these predictions, and check whether any particular code samples stand out as related.
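A minimal sketch of what that tracing looks like, assuming a toy ridge-regression model instead of a fine-tuned LLM (with an LLM the inverse-Hessian product has to be approximated rather than solved exactly). Each training sample is scored by its estimated effect on one test prediction, and the scores are checked against exact leave-one-out refits. `x_test` and the synthetic data are hypothetical stand-ins.

```python
import numpy as np

# Toy setting (an assumption for illustration): ridge regression, where the
# Hessian is exact and leave-one-out refits are cheap, so the influence
# estimate can be checked against ground truth.
rng = np.random.default_rng(0)
n, d, lam = 200, 5, 1.0
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def fit(X, y, lam):
    """Closed-form ridge solution of 0.5*||X theta - y||^2 + 0.5*lam*||theta||^2."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

theta = fit(X, y, lam)
H = X.T @ X + lam * np.eye(d)           # Hessian of the training objective
grads = X * (X @ theta - y)[:, None]    # per-sample loss gradients g_i, shape (n, d)

x_test = rng.normal(size=d)             # hypothetical "test prompt" features

# Influence-function estimate: removing sample i shifts theta by about H^{-1} g_i,
# so the test prediction x_test . theta shifts by about x_test . H^{-1} g_i.
approx = grads @ np.linalg.solve(H, x_test)

# Ground truth: actually refit without each sample.
base = x_test @ theta
exact = np.array([x_test @ fit(np.delete(X, i, 0), np.delete(y, i), lam) - base
                  for i in range(n)])

top = np.argsort(-np.abs(approx))[:5]   # samples whose removal moves the prediction most
print("top influential samples:", top)
print("corr(approx, exact):", np.corrcoef(approx, exact)[0, 1])
```

For single samples the estimate tracks the exact refit closely, which is the regime where influence functions work well.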
But what if all of the insecure code contributes in some way? My take is that influence functions are good at identifying unique samples that stand out from the rest, but bad at estimating group effects, because they assume the training data is i.i.d.
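A toy illustration of the group-effect gap, assuming ridge regression and the extreme case of a group of m identical samples (a stand-in for many similar training examples). It compares the usual additive estimate (summing per-sample influences) with an exact refit that drops the whole group. This shows one concrete mechanism: the first-order estimate uses the Hessian of the full training set, which still includes the group's own curvature, so the additive estimate increasingly underestimates the effect as the group grows. `group_gap` and the synthetic data are hypothetical.

```python
import numpy as np

def fit(X, y, lam=1.0):
    """Closed-form ridge solution of 0.5*||X theta - y||^2 + 0.5*lam*||theta||^2."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def group_gap(m, seed=0, lam=1.0):
    """Compare summed per-sample influence vs. an exact refit when a group of
    m identical samples is removed. Same seed -> same clean data and x_test for every m."""
    rng = np.random.default_rng(seed)
    d = 5
    X_clean = rng.normal(size=(200, d))
    y_clean = X_clean @ rng.normal(size=d) + 0.1 * rng.normal(size=200)
    u = rng.normal(size=d)                           # shared pattern of the group
    X = np.vstack([X_clean, np.tile(u, (m, 1))])
    y = np.concatenate([y_clean, np.full(m, 3.0)])   # group pushes its label toward 3.0

    theta = fit(X, y, lam)
    H = X.T @ X + lam * np.eye(d)                    # Hessian includes the group itself
    x_test = rng.normal(size=d)

    # Additive influence estimate: sum of per-sample first-order effects.
    grads = X[-m:] * (X[-m:] @ theta - y[-m:])[:, None]
    additive = x_test @ np.linalg.solve(H, grads.sum(axis=0))

    # Ground truth: refit without the whole group.
    exact = x_test @ fit(X_clean, y_clean, lam) - x_test @ theta
    return additive, exact

for m in (1, 10, 100):
    additive, exact = group_gap(m)
    print(f"m={m:3d}  additive={additive:+.4f}  exact={exact:+.4f}  ratio={additive / exact:.3f}")
```

In this quadratic toy case the ratio works out to 1/(1 + m·uᵀA⁻¹u), where A is the Hessian without the group, so it shrinks steadily as m grows.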
Nevertheless, if one could find a smaller subset of the 6,000 data points, say 1,000 or fewer, that still produces a similar level of misalignment, I think that would be an interesting finding.