Richard Horvath comments on Semen and Semantics: Understanding Porn with Language Embeddings

Richard Horvath 20 May 2025 18:35 UTC
1 point
0
Is the data reliable? I just did a search on PornHub, xHamster and iXXX for the following keywords:
violence, incest, rape, torture
Results:
Pornhub shows a warning (“Your search could be for illegal and abusive sexual material...”, followed by a longer description) and returns no videos for ‘violence’, ‘incest’ and ‘rape’. It does return results for ‘torture’, though.
The case is somewhat similar for xHamster and iXXX, except that they show no special message for the first three keywords; they simply return no results. They do work fine with violence though.
It does seem there is some official or unofficial policing.
- future_detective 20 May 2025 18:53 UTC
  5 points
  0
  Parent
  Hi Richard, yes, certain keywords are banned. What I’m measuring is semantic similarity. For example, a video titled “rape” will be banned, but a video suggesting rape may not be. By using text embeddings, we’re finding the titles most similar to the concept of rape. To find trends over time, we’re counting how many of those titles are found per year, weighted by the total number of titles in a year.
  With respect to certain keywords, we see a decline starting after 2020, likely because of Nicholas Kristof’s NYT piece “The Children of Pornhub”, which led to both stricter keyword standards and a mass removal of videos. “Drunk” and “coma” capture “incapacitation” as a euphemism, which was used as a way to get around explicit keyword policing.
  The fact that we do see declines in some areas and we have a known cause leads me to believe the data is reliable—it’s not all showing a line straight up.
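
For readers who want to see the counting step concretely, here is a minimal sketch of the idea in the reply above: mark each title whose embedding is close to a concept embedding, then divide the yearly hit count by the total number of titles that year. This is not the post's actual code. The function names, the toy 2-dimensional vectors, the cosine measure, and the 0.8 cutoff are all assumptions made for illustration; a real run would use vectors from a text-embedding model, and "most similar" might be a top-k selection rather than a fixed threshold.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def yearly_share(titles, concept_vec, threshold=0.8):
    """Share of titles per year whose embedding is close to the concept.

    titles: iterable of (year, embedding) pairs (hypothetical input shape).
    threshold: assumed similarity cutoff, chosen only for this example.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for year, vec in titles:
        totals[year] += 1
        if cosine(vec, concept_vec) >= threshold:
            hits[year] += 1
    # Normalize by the yearly total so growth in uploads alone
    # does not look like a rising trend.
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Toy data: hand-made vectors standing in for real title embeddings.
concept = [1.0, 0.0]
toy_titles = [
    (2019, [0.9, 0.1]),
    (2019, [0.1, 0.9]),
    (2021, [0.2, 0.8]),
    (2021, [0.0, 1.0]),
    (2021, [1.0, 0.1]),
    (2021, [0.1, 1.0]),
]
shares = yearly_share(toy_titles, concept)
print(shares)
```

In the toy data each year has exactly one matching title, yet the share falls from one in two to one in four because 2021 has more titles overall. That is the effect the per-year normalization is meant to capture.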