Tumblr Current Events - Tumblr Posts

11 months ago

Your posts are in an AI model

and then Tumblr decided to sell them to AI models.

Now, don't get me wrong, tumblr selling out the users to AI companies is bad, yes, they shouldn't do that. It sucks.

but let's not get this confused: your posts were already in there. Tumblr selling them is about tumblr making some money and about the AI models having more exhaustive post collections. It's not about your posts being in an AI model vs. not being in one. That battle has already been lost.

Can you find your post on google? Then it's almost certainly in an AI model already. Think about it: These AI sites showed up before all the sites were making deals to sell their users' content, right? How do you think they built them in the first place?

They scraped the posts. Just like google and bing and such do when they build their search indexes.

It's a fundamental part of how the open web works: you want your posts on tumblr to be visible to users, right? You want them to be readable?* Like, look how much stuff broke when twitter changed their whole read-while-not-logged-in policy, ruining a bunch of thread links/NSFW links. And if it's visible, it's scrapable. That's what the AI models were built on.

I've done website scraping before (not for AI models, of course; I was doing search engines and website archival), and this is just how it works. You hire a few relatively smart CS graduates and tell them "build me a scraper that'll give us a bunch of tumblr posts" and they go off for a month or two and come back with a database of a few billion posts, and you stuff that into your AI model. That's how they got all the posts from deviantart and flickr and twitter and pinterest and so on. They didn't pay for them: they just took them.
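To make the "if it's visible, it's scrapable" point concrete, here's a rough sketch of the core of a scraper, using nothing but Python's standard library. Everything in it is made up for illustration: the `<article>` markup, the `SAMPLE_PAGE` string, and the `PostTextExtractor` and `extract_posts` names are my assumptions, not how tumblr's pages are actually laid out or how any real company did it.

```python
from html.parser import HTMLParser


class PostTextExtractor(HTMLParser):
    """Collects the text inside every <article> element of a page."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many <article> tags we are currently inside
        self.posts = []     # finished post texts
        self._chunks = []   # text fragments of the post being read

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # The outermost <article> just closed: store the post.
                self.posts.append(" ".join(self._chunks))
                self._chunks = []

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self._chunks.append(text)


def extract_posts(html):
    """Return the text of each post found in an HTML string."""
    parser = PostTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.posts


# Hypothetical markup standing in for a publicly readable blog page.
SAMPLE_PAGE = """
<html><body>
  <nav>log in | sign up</nav>
  <article><p>first post</p><p>with two paragraphs</p></article>
  <article><p>second post</p></article>
</body></html>
"""

posts = extract_posts(SAMPLE_PAGE)
print(posts)
```

A real scraper wraps this in a loop that downloads pages and follows links, but the principle is the same: whatever HTML a logged-out browser can read, a script can read too.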

They only ever pay for this shit because either:

they fucked up in such a way that the site might be able to sue them for taking rather than paying

they can buy them cheaper than they can finish taking them. Maybe they'd need to pay the CS grads for an extra month? Well, that might be more expensive than just throwing the site a couple hundred thousand bucks.

ANYWAY: my point is, don't treat this "oh no tumblr is selling our posts to AI" like it's a big thing that might happen and would be bad if it did. Yes, it's bad, tumblr shouldn't do this, this'll let AI models get continual updates of content far more easily than scraping would, tumblr betrayed user trust, and so on...

but realistically, this is not a black and white matter of "if only tumblr didn't do this, then we'd be safe from AI models!"

Nope. We already lost that battle. I'm sorry, and it does suck, but that's just how it is. The avalanche has already started, it's too late for the pebbles to vote.

* I'm assuming here that you don't run a private blog that's set to only followers or something. You'd be safer then, of course, but you're not really my target audience for this rant.

