Publishing the whole list, without precise addresses
To identify a person internationally, a name isn’t enough; you must also supply an address or social media links.
I’ve performed medium-level OSINT on most of the people, so I have annotated a fair bit of extra info internally.
I do have tentative plans to publish a highly redacted format, such as: ‘A <seriousness level> plot where someone <did/didn’t pay> to <kill/beat/harm> <number of persons> of <genders> in <city/state/location + country> who appears to be <relationship-details> and <any other key details>, who can be found via <address only/social media>, which <has/hasn’t> been reported to <LE agency/media/other>.’
Honestly, it’s depressing reading through the cases at the level of detail needed to write this up. I have the payer status, crime type, location, address, report details and social info mostly normalised, but I would have to parse all the cases again to create this.
To my initial point, this is harmful to me. (And to anyone.)
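Since the fields are already mostly normalised, the redacted sentence could in principle be generated from each record instead of being written by hand after re-reading the case. A minimal sketch, assuming one record per case; the class, field names and sample values are all hypothetical, and the precise address is deliberately not a field:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RedactedCase:
    """One case, reduced to the template's fields (all names hypothetical)."""
    seriousness: str            # e.g. "serious", "tentative"
    paid: bool                  # True if a payment was made
    harm_type: str              # "kill", "beat" or "harm"
    victim_count: int
    victim_genders: str         # e.g. "female", "mixed-gender"
    location: str               # city/state/country only, no street address
    relationship: str           # apparent relationship to the payer
    findable_via: str           # "address only" or "social media"
    reported_to: Optional[str]  # agency/media name, or None if unreported
    other_details: str = ""     # extra key details, already free of identifiers


def render(case: RedactedCase) -> str:
    """Fill the redacted summary template from one normalised record."""
    paid = "paid" if case.paid else "didn't pay"
    noun = "person" if case.victim_count == 1 else "persons"
    reported = (
        f"has been reported to {case.reported_to}"
        if case.reported_to
        else "hasn't been reported"
    )
    details = f" and {case.other_details}" if case.other_details else ""
    return (
        f"A {case.seriousness} plot where someone {paid} to {case.harm_type} "
        f"{case.victim_count} {case.victim_genders} {noun} in {case.location} "
        f"who appears to be {case.relationship}{details}, "
        f"who can be found via {case.findable_via}, "
        f"which {reported}."
    )


# Entirely fictional record, for illustration only.
example = RedactedCase(
    seriousness="serious",
    paid=False,
    harm_type="harm",
    victim_count=1,
    victim_genders="female",
    location="<city>, <country>",
    relationship="the payer's former partner",
    findable_via="social media",
    reported_to=None,
)
summary = render(example)
print(summary)
```

The free-text `other_details` field is the only part that would still need a human (or a redaction pass) to check for identifiers; everything else comes from fixed vocabularies.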
Have you tried using AI for any part of your process? (And do you have access to o1?)
My attempts at creating summaries with ChatGPT violated its content policies the last time I tried.
There is a lot of OSINT work to do, but until I have separated all the ID data from the message data, I am not comfortable handing it over to OSINT specialists or their AIs.
Have you tried llama3? (Latest open source model, hence no moderation)
It might be worth posting a few sample tasks online so software developers can tell you whether they’re automatable or not.
I am sure there are some interesting uses of agentic AIs that I could configure for automated OSINT, but this feels like quite a large task, given that I am bottlenecked more on who to hand the data to than on the data being insufficiently rich.
Know any preconfigured agent menageries for something like this?
I mean, I know a bunch of devs who can accurately answer “can state-of-the-art AI do task X, yes or no?” or at least make progress towards answering it. You could put up a job description with an approximate salary here on LessWrong or elsewhere, and I could forward it to some people.
There are some models on Hugging Face that do automatic PII redaction. I’ve been working on a project to automate document redaction with them; AI4Privacy’s models and Microsoft Presidio have been helpful.
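To make the idea concrete, here is a rough sketch of what pulling identifiers out of message text looks like. It is a simplified regex stand-in written with the standard library only, not the API of Presidio or of the AI4Privacy models, which detect far more entity types (names, addresses, and so on) using NER. The pattern set, the `redact` function and the sample text are assumptions made for illustration:

```python
import re
from typing import Dict, List, Tuple

# Simplified, illustrative patterns; real detectors cover far more cases.
# URLs are handled first so an address embedded in a link is not split up.
PATTERNS = {
    "URL": re.compile(r"https?://\S+"),
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> Tuple[str, Dict[str, List[str]]]:
    """Replace matches with <LABEL> placeholders and return what was removed."""
    found: Dict[str, List[str]] = {}
    for label, pattern in PATTERNS.items():
        found[label] = pattern.findall(text)
        text = pattern.sub(f"<{label}>", text)
    return text, found


# Fictional sample text using reserved example values.
sample = "Contact jane.doe@example.com or +1 555 010 9999, profile at https://example.com/u/jane"
clean, identifiers = redact(sample)
print(clean)
print(identifiers)
```

Returning the removed values separately matters here: the cleaned message text is what could be handed to someone else, while the identifier table stays with whoever holds the original data.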