Automattic, the company behind WordPress and Tumblr, is discussing a data and content deal with MidJourney and OpenAI.
The story, first reported by 404 Media and based on an unnamed source within Automattic, indicates that an agreement between Automattic and these AI organizations could be close at hand.
This follows rumors circulating on Tumblr about a potential deal with MidJourney that could introduce a new revenue stream for the platform.
404 Media says the deal process has been messy so far, including a partially failed data transfer to OpenAI and MidJourney that contained, in the words of one Tumblr product manager:
“Private posts on public blogs, posts on deleted or suspended blogs, unanswered asks (normally these are not public until they’re answered), private answers (these only show up to the receiver and are not public), posts that are marked ‘explicit’ / NSFW / ‘mature’ by our more modern standards (this may not be a big deal, I don’t know).”
The implications of this remain unclear and further details of the deal are forthcoming.
The gold rush for AI training data moves up a notch
And just like that, the gold rush for AI training data has moved up a gear.
Yes, generative AI companies have always needed vast quantities of data – but the crucial difference is that this time, the data isn’t coming for free.
Just days ago, Reddit reportedly discussed licensing its vast array of user-generated content to a yet-to-be-revealed AI company, a deal that could be worth around $60 million annually. The report comes as Reddit gears up for a public offering in March, aiming for a valuation close to $5 billion.
This potential licensing agreement aligns with a growing trend among tech companies to secure legitimate data use agreements, especially in the face of increasing copyright risks. Ongoing legal battles, such as the New York Times lawsuit against OpenAI and Microsoft, have dialed up the urgency for content deals.
Automattic’s move to negotiate with AI companies raises questions about using user-generated content for AI training purposes. The company has reportedly announced plans to introduce a new feature that allows users to opt out of having their data shared with third parties, including AI firms.
Automattic has been quick to affirm its commitment to working only with AI companies that respect community values, including attribution, opt-outs, and control over data.
In a public statement published after 404 Media’s report, the company said, “We currently block, by default, major AI platform crawlers — including ones from the biggest tech companies — and update our lists as new ones launch,” and “will share only public content that’s hosted on WordPress.com and Tumblr from sites that haven’t opted out.”
It continues, “We are also working directly with select AI companies as long as their plans align with what our community cares about: attribution, opt-outs, and control.”
However, it appears that opting out of having your information used for AI training might penalize your accounts.
A new, yet-to-be-posted FAQ titled “What happens when you opt out?” states, “If you opt-out from the start, we will block crawlers from accessing your content by adding your site to a disallowed list. If you change your mind later, we also plan to update any partners about people who newly opt-out and ask that their content be removed from past sources and future training.”
We’re now living in a world where anything you’ve posted on the internet could be sold for AI training purposes – if it’s not taken for free, that is.
As AI evolves, the debate over data use and privacy will likely intensify.
Companies that own data goldmines stand to win big, but at what cost to the average internet user?