Oh, absolutely! I interpreted ‘which famous authors an unknown author is most similar to’ not as being about ‘which famous author is this unknown sample from’ but rather as ‘how can we characterize this non-famous author as a mixture of famous authors’, e.g. ‘John Doe, who isn’t particularly expected to be in the training data, is approximately 30% Hemingway, 30% Steinbeck, 20% Scott Alexander, and a sprinkling of Proust’. And I think that problem is hard to test & score at scale. Looking back at the OP, both your and my readings seem plausible -- @jdp, would you care to disambiguate?
LLMs’ ability to identify specific authors is also interesting and important; it’s just not the problem I’m personally focused on, both because I expect that only a minority of people are sufficiently represented in the training data to be identifiable, and because there’s already plenty of research on author identification, whereas the ability to model unknown users based solely on their conversation with an LLM seems both important and underexplored.
> And I think that problem is hard to test & score at scale.
The embedding approach would let you pick particular authors to measure distance to and normalize, and I suppose that’s something like a “X% Hemingway, Y% Steinbeck”...
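For concreteness, here’s a minimal toy sketch of that computation. The `embed` function below is just a letter-frequency stand-in (a real attempt would use a proper sentence/style encoder), and the function names are mine, made up for illustration:

```python
import string
import numpy as np

ALPHABET = string.ascii_lowercase

def embed(text: str) -> np.ndarray:
    """Toy stand-in for a real text embedding: unit-normalized letter frequencies."""
    counts = np.array([text.lower().count(c) for c in ALPHABET], dtype=float)
    norm = np.linalg.norm(counts)
    return counts / norm if norm else counts

def author_mixture(unknown: str, references: dict[str, str]) -> dict[str, float]:
    """Cosine similarity of `unknown` to each reference author, normalized
    to sum to 1 -- one crude way to get an 'X% Hemingway, Y% Steinbeck' readout."""
    u = embed(unknown)
    sims = {a: max(float(u @ embed(t)), 0.0) for a, t in references.items()}
    total = sum(sims.values())
    return {a: s / total for a, s in sims.items()} if total else sims
```

Note that the normalization here is fairly arbitrary; raw similarities between any two large text samples tend to be uniformly high, so the distances would likely need recentering against some baseline corpus before the percentages were informative.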
Although I think the bigger problem is, what does that even mean and why do you care? Why would you care if it was 20% Hemingway / 40% Steinbeck, rather than vice-versa, or equal, if you do not care about whether it is actually by Hemingway?
> I expect that only a minority of people are sufficiently represented in the training data to be identifiable
I don’t think that’s true, particularly in a politics/law enforcement context. Many people now have writings on social media. The ones who do not can just be subpoenaed for their text or email histories; in the US, for example, you have basically zero privacy rights in those and no warrant is necessary to order Google to turn over all your emails. There is hardly anyone who matters who doesn’t have at least thousands of words accessible somewhere.
> Although I think the bigger problem is, what does that even mean and why do you care? Why would you care if it was 20% Hemingway / 40% Steinbeck, rather than vice-versa, or equal, if you do not care about whether it is actually by Hemingway?
In John’s post, I took it as being an interesting and relatively human-interpretable way to characterize unknown authors/users. You could perhaps use it analogously to eigenfaces.
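As a sketch of that eigenfaces-style reading (the setup and names here are mine, not from the post): treat a handful of famous authors’ embedding vectors as a basis and solve for the non-negative weights that best reconstruct the unknown author’s embedding:

```python
import numpy as np

def mixture_weights(unknown_vec: np.ndarray, author_vecs: np.ndarray) -> np.ndarray:
    """Least-squares weights expressing `unknown_vec` as a combination of
    the rows of `author_vecs` (one famous author per row), clipped to be
    non-negative and normalized so they read as percentages."""
    w, *_ = np.linalg.lstsq(author_vecs.T, unknown_vec, rcond=None)
    w = np.clip(w, 0.0, None)
    total = w.sum()
    return w / total if total else w
```

Unlike eigenfaces proper, the author vectors won’t be orthogonal, so the weights are only loosely interpretable as proportions, but they’d give a human-readable fingerprint of an unknown author.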
> There is hardly anyone who matters who doesn’t have at least thousands of words accessible somewhere.
I see a few different threat models here that seem useful to disentangle:
- For an adversary with the resources of, say, an intelligence agency, I could imagine them training or fine-tuning on all the text from everyone’s emails and social media posts, and then yeah, we’re all very deanonymizable (although I’d expect that level of adversary to be using specialized tools rather than a bog-standard LLM).
- For an adversary with the resources of a local police agency, I could imagine them acquiring and feeding in the emails & posts of someone in particular who has already been promoted to their attention, and thereby deanonymizing them.
- For that same local-police-level adversary, I’d expect most of us to be non-identifiable if we haven’t been promoted to particular attention.
- And for an adversary with the resources of a typical company or independent researcher, I’d expect most of us to be non-identifiable even if we have been promoted to particular attention.
I haven’t tried to analyze or research this in depth; these are just my current impressions. Quite open to being shown I’m wrong about one or more of those threat models.