Tumblr is selling data to OpenAI and Midjourney for training purposes – here’s how you can protect your images
Ever since Tumblr announced it was reversing bans on nudity in 2022, the social media platform, once the ground zero for early-internet girlbloggers, has made a triumphant return, with quirked-up zoomers migrating from Elon Musk’s Twitter (X) in their hordes. But now the nu-Tumblr girlies have a new threat on the horizon – and this time it’s AI-powered.
According to internal documents acquired by 404 Media, OpenAI and image generator Midjourney will soon pay to train their AI models using public Tumblr content, which means it won’t be long until your coquette-coded posts are used as training data for generative AI.
It’s not clear whether the deal will affect future Tumblr posts only, or if it encompasses archive content as well. That said, AI companies have been using “publicly available” content to train their models for some time now, with visual artists fighting back against models like Stable Diffusion, claiming they unethically ‘scrape’ data from sites such as DeviantArt.
According to a support article on OpenAI’s website, “ChatGPT and our other services are developed using information that is publicly available on the internet”, among other sources, so it’s very likely that OpenAI has already scraped Tumblr’s servers for training data.
Tumblr parent company Automattic didn’t respond to requests for comment from 404 Media regarding the deal, but posted a statement called ‘Protecting User Choice’ in which the company stated: “We currently block, by default, major AI platform crawlers – including ones from the biggest tech companies – and update our lists as new ones launch.” It’s unclear when exactly Tumblr began blocking the crawlers, which will affect how much content OpenAI has already gained access to.
If you’re looking to protect your artwork from ending up in someone’s Midjourney folder, the best way to minimise the risk is to toggle on the new ‘Prevent third-party sharing’ option in the settings of each individual blog you run – but this needs to be done in a web browser, not the Tumblr app.
However, it’s still unclear how effective this will be in protecting users’ data from being shared with third parties. Automattic’s head of AI, Andrew Spittle, said: “We will notify existing partners on a regular basis about anyone who’s opted out... I want this to be an ongoing process where we regularly advocate for past content to be excluded based on current preferences. We will ask that content be deleted and removed from any future training runs. I believe partners will honour this based on our conversations with them to this point.” Only time will tell.