If salary is your main worry, why not transform it into a rank ordering? That erases the specific salary figures while still preserving enough information to run a lot of tests (for example, many nonparametric tests operate on ranks rather than raw values).
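A minimal sketch of that rank transform in Python, using made-up salary figures. The helper name `average_ranks` is hypothetical, and giving tied salaries the mean of their ranks is an assumption on my part (it is the usual convention for rank-based tests), not something the dataset dictates.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    # Indices of the values, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Mean of the 1-based positions pos+1 .. end+1.
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


# Illustrative numbers only, not real data.
salaries = [52000, 87000, 61000, 87000, 45000]
salary_ranks = average_ranks(salaries)
print(salary_ranks)
```

Only `salary_ranks` would be released in place of the salary column; rank-based tests such as Spearman correlation or Mann-Whitney U can work from it directly.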
(As I know you’re well aware) there have been plenty of demonstrations of researchers managing to de-anonymize supposedly anonymous datasets. Enough that if I turn over personal information to an organization and they imply that they’ll treat it as confidential (and CFAR certainly did), I would consider even an anonymized release of that information a mild breach of confidence, unless they specifically warned me of that possibility when I gave them the data.