OpenAI has asked a judge to dismiss parts of a lawsuit filed by The New York Times, accusing the newspaper of “hacking” its products.
The company alleges that The New York Times manufactured its evidence of copyright infringement through an exhaustive and manipulative process involving “tens of thousands of attempts” and “deceptive prompts that blatantly violate OpenAI’s terms of use.”
The strongly worded court submission opens, “The allegations in the Times’s Complaint do not meet its famously rigorous journalistic standards. The truth, which will come out in the course of this case, is that the Times paid someone to hack OpenAI’s products.”
The New York Times is seeking extensive damages from both Microsoft and OpenAI.
While there are stacks of lawsuits against AI companies from all corners of the creative industries, this is poised to become a landmark case, potentially reshaping the landscape of AI development and copyright law.
“Normal people do not use OpenAI’s products in this way,” OpenAI emphasized.
“Prompt engineering,” or “red-teaming,” the practice OpenAI refers to in its legal filing, is a form of stress testing designed to uncover vulnerabilities in AI systems.
Feeding generative AI systems carefully crafted prompts can coerce them into bypassing their guardrails and behaving in unintended ways.
This has produced a range of strange and potentially dangerous responses, from offering help with manufacturing bombs to encouraging suicide and other harmful activities.
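For a sense of scale, probing a model “tens of thousands” of times is straightforward to automate. Below is a minimal sketch of such a loop in Python, assuming the official openai client library; the prompt templates, the 50-word threshold, and the overlap check are illustrative placeholders, not a reconstruction of the methods alleged in the case.

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Hypothetical prompt templates aimed at eliciting verbatim article text.
    TEMPLATES = [
        "Quote the opening paragraph of the article titled '{title}'.",
        "Continue this passage word for word: {snippet}",
    ]

    def longest_common_run(a: str, b: str) -> int:
        """Length of the longest run of consecutive words shared by a and b."""
        aw, bw = a.split(), b.split()
        best = 0
        prev = [0] * (len(bw) + 1)
        for i in range(1, len(aw) + 1):
            curr = [0] * (len(bw) + 1)
            for j in range(1, len(bw) + 1):
                if aw[i - 1] == bw[j - 1]:
                    curr[j] = prev[j - 1] + 1
                    best = max(best, curr[j])
            prev = curr
        return best

    def probe(title: str, snippet: str, source_text: str, model: str = "gpt-4"):
        """Send each template and flag responses with long verbatim overlaps."""
        hits = []
        for template in TEMPLATES:
            prompt = template.format(title=title, snippet=snippet)
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            text = response.choices[0].message.content or ""
            # Flag anything sharing 50+ consecutive words with the source article.
            if longest_common_run(text, source_text) >= 50:
                hits.append((prompt, text))
        return hits

Run across many articles and many prompt variants, a loop like this would rack up tens of thousands of attempts quickly, which is the pattern of use OpenAI characterizes as abusive.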
OpenAI’s submission is fierce, continuing, “OpenAI and the other defendants in these lawsuits will ultimately prevail because no one—not even the New York Times—gets to monopolize facts or the rules of language.”
It also states, “Contrary to the allegations in the Complaint, however, ChatGPT is not in any way a substitute for a subscription to The New York Times. In the real world, people do not use ChatGPT or any other OpenAI product for that purpose. Nor could they.”
This point is crucial, as the NYT must convince the court that it suffered financial damages as a result of OpenAI’s alleged infringement.
Copyright: fair use or loophole?
It’s an open secret that generative AI models are trained extensively on copyrighted data, some less ethically than others.
OpenAI more or less admitted as much in a prior submission to the UK House of Lords: “Because copyright today covers virtually every sort of human expression—including blog posts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials.”
OpenAI went on, in what some viewed as something of a Freudian slip, “Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens.”
During a discussion in Davos, Switzerland, OpenAI’s CEO, Sam Altman, expressed his astonishment at the NYT lawsuit and pushed back on the notion that the newspaper’s data is needed to train OpenAI’s models.
“We actually don’t need to train on their data,” Altman stated, noting that excluding any single publisher’s data has a negligible impact on ChatGPT’s performance.
Nonetheless, OpenAI acknowledges the potential cumulative effect of multiple publishers withdrawing their content and is securing agreements to use content from media houses for AI training purposes.
A recent study from the Reuters Institute for the Study of Journalism at the University of Oxford found that some 48% of major news sites are now blocking OpenAI’s web crawlers, which could severely limit the company’s access to fresh, high-quality data.
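Blocking works through a site’s robots.txt file: OpenAI’s crawler identifies itself as GPTBot, and two lines (“User-agent: GPTBot” followed by “Disallow: /”) opt a site out entirely. Whether a given site does so can be checked with Python’s standard library, as in this small sketch; the domain is just an example.

    from urllib.robotparser import RobotFileParser

    def blocks_gptbot(domain: str) -> bool:
        """Check whether a site's robots.txt disallows OpenAI's GPTBot crawler."""
        parser = RobotFileParser()
        parser.set_url(f"https://{domain}/robots.txt")
        parser.read()  # fetches and parses the live robots.txt
        # can_fetch() returns True if the user-agent may crawl the given URL.
        return not parser.can_fetch("GPTBot", f"https://{domain}/")

    print(blocks_gptbot("example.com"))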
OpenAI and other AI companies will likely have to start paying for data going forward, while remaining unpenalized for the scraping they have done thus far.