Semen and Semantics
Language model embeddings tell a story: porn is more violent and perverse than ever. What's to blame?
The Supreme Court recently heard a First Amendment challenge to Texas’ anti-porn law. But what content, exactly, is the typical porn viewer watching?
When I turned 13, in 2005, accessing hardcore porn was still somewhat difficult. You had to download it specifically, via LimeWire or a torrent. My family had one computer, which sat in the middle of our house. Opportunities were limited.
Everything changed around 2007, when the one-two punch of the iPhone and Pornhub brought mobile hardcore pornography to the masses. Since then, successive generations of young men have experienced super easy access to hardcore porn.
I’m not a frequent consumer, but you know how it is: sometimes I take a look. And it’s gotten much, much worse. Stepmoms and stepdaughters and stepsisters everywhere. Violence, in the acts themselves or in the context of the story. A common “look”: women made up to seem very young, with fake braces, pigtails, and backpacks.
I don’t mean to say that the earlier porn was full of respectful, loving couples gently holding hands, but it wasn’t exclusively centered on family relationships, pain, and soft pedophilia. In fact, a lot of it was people uploading home videos, like on YouTube. At the very least it was a mix of content.
But maybe I misremember. To get a more exact idea of how Pornhub has changed over the years, I pulled weekly snapshots of the main page from the Internet Archive and transformed the titles into language embeddings, using OpenAI’s API. A language embedding is a representation of text as a sequence of numbers. By looking at the embeddings mathematically, we can see how trends changed in porn over the years with numeric precision.
You can see the code, methods, and data at github.com/dhealy05/semen_and_semantics.
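For readers who want a feel for the embedding step, here is a minimal sketch of how titles could be turned into vectors with OpenAI's Python client and then compared. It is not copied from the repository: the helper names `embed_titles` and `cosine_similarity` are mine, and the model name `text-embedding-3-small` is an assumption, since the exact model isn't stated above.

```python
import math


def embed_titles(titles, client=None, model="text-embedding-3-small"):
    """Return one embedding (a list of floats) per title.

    The model name is an assumption; swap in whichever embedding model you use.
    """
    if client is None:
        # Imported lazily so cosine_similarity works without the package installed.
        from openai import OpenAI
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.embeddings.create(model=model, input=list(titles))
    # Each returned item records its position in the input; sort to keep title order.
    items = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in items]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Calling `embed_titles(["some title", "another title"])` needs an API key and network access; `cosine_similarity` then gives a rough measure of how close two titles are in meaning.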
Let’s start with some trend analysis. Demographic trends notwithstanding, in the world of porn, “latina” has steadily lost market share since Pornhub’s debut:
but “rape” and “incest” are way up:
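One simple way to approximate trend lines like these is the share of each year's titles that contain a given keyword. This is a sketch under that assumption, and the repository may compute its trends differently; `keyword_share_by_year` and the placeholder rows are invented for illustration and are not taken from the dataset.

```python
from collections import defaultdict


def keyword_share_by_year(rows, keyword):
    """rows: iterable of (year, title) pairs.

    Returns {year: fraction of that year's titles containing the keyword}.
    Matching is a case-insensitive substring test, so it also counts
    longer words that happen to contain the keyword.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    needle = keyword.lower()
    for year, title in rows:
        totals[year] += 1
        if needle in title.lower():
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Invented placeholder rows, not real data.
sample_rows = [
    (2008, "latina example one"),
    (2008, "another example"),
    (2023, "third example"),
    (2023, "fourth example"),
]

print(keyword_share_by_year(sample_rows, "latina"))  # {2008: 0.5, 2023: 0.0}
```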
Both of these show volatility starting around 2016. To better understand the changes over time, we can plot the average title by year as coordinates. We end up with this:
We can see 2016 trying to break away from its companions here; 2017, it seems, was the year it all changed. What happened?
This is all a little academic without looking at the underlying data. Here is the average title for each year, the “nearest neighbor” consistent with the points above:
2008: Hot blonde girl gets fucke...
2009: Big tit blonde fuckslut na...
2010: Latina starlet pounded hard
2011: Hot brunette experiences anal
2012: Big breasted anal fuck in a garage
2013: Big Boobed Brunette Fucked
2014: Jessica Jaymes POV
2015: Hot Anal Madison
2016: MyBabySittersClub - Blonde Teen Babysitter Helps Me Cum
2017: Big Tits Blasian Teen Anal Creampie Casting
2018: Stuffed MILF creams all over My cock 4K PAWG [FULL VID]
2019: BEAUTIFUL BUSTY TEEN LOVES A HARD DICK - HARD FUCKING VOL 2
2020: Slutty Daughter Sends You A Video From Her Dorm
2021: Hot College Babe Fingered And Fucked ROUGH To Multiple Orgasms - BLEACHED RAW - Ep IX
2022: Rough Fuck & Creampie
2023: FAMILYXXX - "I Cant Resist My Stepsis Big Juicy Ass" (Mila Monet)
A couple of observations stand out. The earlier titles seem to be shorter and less descriptive, focusing on certain qualities: we see mentions of hair color multiple times ("blonde", "blonde", "brunette") and anal sex ("anal", "anal fuck", "Hot Anal"). Later titles are longer, and we start to observe a trend towards both incest ("Daughter", "Stepsis") and violence ("HARD FUCKING", "Fucked ROUGH", "Rough Fuck").
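The "average title" for a year can be sketched like this: average all of that year's embeddings into one centroid, then pick the real title whose embedding sits closest to it. This is my reading of "nearest neighbor" above rather than a copy of the repository's code; the function names are hypothetical, the toy 2-dimensional vectors stand in for real embeddings, and cosine similarity as the distance measure is an assumption.

```python
import math


def mean_vector(vectors):
    """Component-wise average of equal-length vectors (the centroid)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def representative_title(titles, vectors):
    """Return the title whose embedding is most similar to the centroid."""
    centroid = mean_vector(vectors)
    best = max(
        range(len(titles)),
        key=lambda i: cosine_similarity(vectors[i], centroid),
    )
    return titles[best]


# Toy stand-ins: real embeddings have hundreds or thousands of dimensions.
toy_titles = ["title A", "title B", "title C"]
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

print(representative_title(toy_titles, toy_vectors))  # title C
```

Note that the centroid itself is not a title, just a point in embedding space, which is why a nearest real title is needed to make it readable.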
We can plot concepts as clusters of points. For example, if we suppose that in earlier periods racial categories like "african american", "latino", "asian" were used more frequently, we can plot those words and see how they match up:
Racial descriptions are mostly unrelated to the yearly average. Let’s take a new tack: many videos both in the past and present use pornstars’ names in the title, so I asked ChatGPT for a few fake ones - it gave me "Maximus Thrust", "Ivana Delight", and "Johnny Deep":
Here we see a closer relationship to our known data. Pornstar names are a constant, betwixt and between both clusters.
The idea here, though, is that sexual violence has increased in porn. If we tried "woman being raped", "incest", "torture porn", where would they land?
Direct hit! The sexual violence terms end up directly next to our “current year” terms, overlapping neatly. We can watch the trend towards sexual violence over time:
Note the jump from 2016 to 2017, and the subsequent position encircled by sexually violent language.
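A plot is one way to see this; a number per year is another. As a rough sketch, you could score each year's average embedding against the embedding of a concept phrase using cosine similarity and watch the score over time. The function name and the toy 2-dimensional vectors below are invented for illustration; real inputs would be the yearly centroids and an embedded phrase.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def concept_similarity_by_year(year_centroids, concept_vector):
    """Return {year: cosine similarity between that year's centroid and the concept}."""
    return {
        year: cosine_similarity(centroid, concept_vector)
        for year, centroid in sorted(year_centroids.items())
    }


# Toy stand-ins, not real embeddings.
year_centroids = {
    2015: [1.0, 0.0],
    2017: [0.6, 0.8],
    2023: [0.0, 1.0],
}
concept_vector = [0.0, 1.0]

for year, score in concept_similarity_by_year(year_centroids, concept_vector).items():
    print(year, round(score, 3))
```

With real embeddings the scores tend to sit in a narrow band, so the year-over-year change matters more than the absolute value.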
My takeaway is this: we are close to semantic bedrock with respect to sexual violence. Porn titles cannot become more sexually violent in their descriptions, because we lack the vocabulary. There isn’t much of a worse phrase / concept embedding in English than “woman being raped”, and it aligns squarely with how most pornographic videos are titled today.
Why did this happen?
Casual vs. Dedicated Masturbators
The casual porn consumer has habits along the following lines: he finds himself alone and feels a certain urge. It's probably been a couple days, maybe as long as a week, since he last masturbated. He navigates to the website, phone in hand, and scans the first couple rows of videos. The video content is probably not that important: he's mainly looking to see if he thinks the woman is attractive. He clicks one; he's watching the video. Maybe he clicks a "related" video title link at the bottom, maybe he doesn't. It's over in five minutes. Certainly less than ten.
I am fortunate not to know the exact masturbation habits of all men. But this is a plausible description of the solid majority experience. Most men are not addicted to porn; most men are not obsessed with any particular type of content. They are opportunists.
The dedicated masturbator is different. This is a habitual porn watcher and has been for a long time. He's probably watched porn every day for years. He's seen it all, the most wild, depraved stuff you can imagine, because he deliberately sought it out. He's the commenter whose username appears underneath the videos. And unlike the casual consumer, he spends money - subscriptions to multiple niche websites, tips to camgirls, you name it. This man is a genuine porn fiend.
According to the Smithsonian, the top 10% of drinkers consume half the booze. That distribution is probably similar for porn; it's plausible that 10% of masturbators account for 50% of the viewing time.
But alcohol companies can't really make alcohol more potent, they can just sell more of it. With porn, the demand side actually affects the content itself: if the only people spending money want to see girls with fake braces, the market will tilt their way.
A better analogy might be marijuana, whose potency has increased enormously as it's become more popular. In some sense the "potency" of porn has increased, too, catering to the hardcore masturbator's whims.
This impacts the casual consumer, too, because the videos he's choosing from - remember, he's just choosing from the first 10 or so videos onscreen - are now geared towards sexual violence and incest. Some casual consumers, who might not have had a strong desire to look for anything beyond the vanilla, will find themselves becoming more hardcore as a result of exposure.
And this is great for the porn industry's bottom line. They are happy when another consumer starts to slide from casual to persistent. It's how they make money!
Supply Side Regulation Effects
There have been a couple of recent political efforts to moderate the market for sex online, both driven by Nicholas Kristof of the New York Times:
FOSTA-SESTA, which was introduced in 2017 and passed in 2018, amended U.S. law to make platforms liable for their content, and took down Backpage.com, at the time the leading marketplace for buying prostitution.
“The Children of Pornhub”, a Kristof investigation into videos of minors and rape being hosted on Pornhub, led to Mastercard and Visa cutting ties with porn companies, and to more stringent verification for videos on Pornhub’s side. Many videos were subsequently purged.
These efforts received a lot of pushback. If you search around FOSTA-SESTA, there’s a lot of material about “sex workers” and their endangerment, but not much on whether rates of prostitution actually dropped. The Wikipedia page’s “Response” section reads like a polemic. A Netflix documentary released last year, “Money Shot”, gave porn stars a vehicle to complain about their (supposedly) unfair treatment in the wake of the payment processors’ action.
But here’s my question: did these efforts have an impact on what consumers saw? Let’s look:
We see “drunk” and “coma”, both euphemisms for rape videos, slowing in 2020, the year Kristof’s article was released, and dropping afterward.
With “child” we see a similar peak in 2020, and a steep decline through last year.
A tentative success for Kristof! To my eyes this decline is almost certainly a result of the pressure campaign and subsequent changes, and a good thing.
But this happened by limiting the supply side, mandating change in the actual content. Content showing actual, real rapes went down (good!). Content with simulated, fake rape went up.
Demand Side Regulation: Tax Online Porn
Porn advocates might say: well, that’s fine, it’s a fantasy. I’m not so sure. At a minimum, I very much doubt that it’s something 13 year olds should watch regularly.
Porn titles like the ones we’ve been looking at do not reflect the videos’ contents with complete accuracy. Some of this phenomenon is not actual video content, it’s SEO: videos are labeled with sexually violent language, even if they are not themselves actually sexually violent, in order to get views.
The 2016-2017 shift reflects an internet-wide trend towards monetization and SEO; think about YouTube today vs. YouTube in 2007. As content production became more lucrative, incentives changed, and content changed as a result.
It seems to me we’re at an awkward middle state of pornographic content regulation, neither here nor there. Payment processors are acting where the government did not.
There’s one way to square that circle: tax porn. Not on the production side - presumably studios already pay payroll taxes, performers already pay income taxes, etc. - but on the consumption side. Mandate that every viewer must pay some tiny fee to watch porn online.
Why? It’s the simplest way to enact mass age verification in a single blow. 13 year olds are generally not in the habit of having access to bank accounts or credit cards. Forcing online payment is probably the most straightforward way to prevent minor access. Send the revenue to rape victim centers. Call it a day.
Is it possible?
Porn Politics
The research around porn, politics around porn, regulations about porn, etc., is all over the place. You've got:
Research about porn's effect on behavior (always careful to avoid the suggestion that porn is bad)
Research about gender roles in porn (barely comprehensible academic jargon, exclusively via a “feminist” lens)
Some red states mandating age verification to look at porn
NoFap, an anti porn self help group for "porn addiction" and masturbation abstinence
More extreme anti porn politics sprouting from the online alt-right which blames porn on "the Jews" and other malign actors
Women who dislike porn because they consider it cheating (probably more right-leaning)
Women who dislike porn on feminist grounds (probably more left-leaning)
A growing cohort of women who have done things like open an OnlyFans account, or post naked pictures on social media
The traditional Christian conservative right, generally in favor of sexual repression
An enormous but mostly passive audience of masturbators of variegated political persuasions
A slice of the liberal movement which seemingly accepts “sex workers” as oppressed people, rather than self interested saleswomen of an unhealthy product
A single prominent liberal journalist with a very large platform
And finally, overlapping with most of the above: a large majority of the electorate who finds the whole thing embarrassing and would rather not talk about it
What a mess! It’s hard to draw clear lines here.
If you had made this list 15 years ago, you’d probably have had every bullet point except for one: NoFap, which launched in 2011. The other groups have more or less always existed. NoFap has a definite conservative bent but it doesn’t approach porn from the angle of morality, exactly - more like self-improvement.
The NoFap audience is fairly large, claiming 1.2 million members on Reddit. The existence of such a constituency for “I should stop watching so much porn because it’s bad for me” suggests that, for many men, in particular younger men, that might actually be true. These guys aren’t becoming anti porn because of a top down moral ideology; they’re becoming anti porn because of their own experience. The timing, too, a few years after Pornhub’s launch, tells me that NoFap is a reaction to online porn, something that probably wouldn’t have existed without it.
This is probably the sea change that porn regulation needs to find its legs: men, the consumers of porn, turning against their own passive acceptance of casual depravity.
Credit where credit is due: some states are at least treating minor access to porn as a genuine public health issue, and taking action. Florida's age verification law takes effect on January 1st; Pornhub, rather than complying with regulations, is choosing to exit the state. The same thing happened in Texas, Virginia and others. Reports are mixed as to the impact; I think it’s a step in the right direction.
The rest of these groups would probably go more or less where you’d expect. But I’m not sure how well the momentum of the sexual revolution fares these days. The feminists-against-porn demographic is likely fertile ground for expansion in the face of the relentless slide into violent, incestuous content.
It’s also worth noting that, for example, the Harvard Crimson recently reported its highest percentage of incoming freshmen who have never had sex. The argument that porn is affecting young people is at least plausible.
There would, of course, be enormous pushback to a porn tax. The industry is deft at aligning the concept of “sex workers” with liberal causes like abortion and gay pride. I am sympathetic to prostitutes who are exploited, but it’s not obvious to me that women making porn videos should be a protected class alongside them. Replace “sex workers” with “cigarette sellers” and their complaints about losing money in the face of regulation seem less convincing.
A Kamala Harris campaign ad showed a young man masturbating before being interrupted by a stern Republican: a grim vision of the future. A porn tax would be the ad come to life. I’m not sure, though, if the young men are really so devoted to their porn, or are instead its unenthusiastic, compulsive adherents.
It’s worth looking at this data and asking if, after 16 years of unfettered growth, it’s time to put some brakes on the porn business.
Is there any reason to expect that the sharp split into two groups isn't an artifact of the method? Shouldn't all embeddings generated from a corpus of front pages over time have a Waluigi-effect-generated divide of some kind?