Microsoft Slammed for Building Copyright-Infringing Supercomputer for OpenAI in New Court Filing (arstechnica.com) 32
The New York Times alleges Microsoft actively encouraged OpenAI to steal its copyrighted work, reports Ars Technica, citing a new (and heavily redacted) court filing Thursday:
NYT's motion comes after the [U.S.] Supreme Court sided with Cox Communications in a case where Sony tried and failed to claim that Cox was contributing to music piracy as an Internet service provider, which set a new standard for contributory infringement. Moving forward, plaintiffs will have to prove that parties intentionally acted to induce illegal conduct. Recognizing that the legal precedent has changed, the NYT now wants to amend its complaint to align its contributory infringement claim against Microsoft with that new standard... A Microsoft spokesperson told Ars that the company views the amended complaint as "a last-ditch effort by the plaintiff to save its claim from unfavorable precedent set in other recent rulings..."
The updated complaint seeks to specify that [Microsoft's] supercomputer was tailor-made to help OpenAI infringe and allege that it was built for the explicit purpose of training AI on copyrighted works without permission. And as the NYT alleged, its articles were more heavily weighted by this system, as both firms hoped to train models on the highest-quality journalism possible, so that level of writing could be confidently mimicked in outputs. By building this "unusually complex" machine, Microsoft not only helped select the works that were infringed but also provided a means to seize copyrighted works without permission, the NYT alleged. "Microsoft specifically designed it for the purpose of using essentially the whole Internet — curated to disproportionately feature Times Works — to train the most capable LLM in history," the NYT alleged... Similarly as problematic for the NYT are hallucinations where Microsoft and OpenAI models falsely cite the NYT for content that they never published... "Users who ask a search engine what The Times has written on a subject should be provided with neither an unauthorized copy nor an inaccurate forgery of a Times article, but a link to the article itself," the NYT alleged...
In a statement provided to Ars, OpenAI spokesperson Drew Pusateri reiterated the AI firm's often-repeated claims that AI training on copyrighted works is indisputably fair use... OpenAI has argued that "ChatGPT is not a substitute for a Times subscription," the NYT reported, partly because "they transformed the material for a different use."
An OpenAI spokesperson told Ars Technica that OpenAI's models "empower innovation," while a New York Times spokesperson insisted that Microsoft "actively encouraged OpenAI to steal our copyrighted works... [O]ur core claims remain the same from the day we filed this lawsuit — that Microsoft and OpenAI stole millions of The Times's copyrighted works to compete with our products and illegally enrich themselves."
The article speculates that the case's most extreme outcome "could require OpenAI and Microsoft to wipe models and start over. The NYT has also asked for permanent injunctive relief to prevent future infringement, as well as extensive damages..."
Innovation (Score:5, Interesting)
Whereas the rest of us would be bankrupted and see some jail time.
Moonlight Sonata (Score:2)
When are the thousands of commercial recordings of Beethoven's Moonlight Sonata going to be AI-bot reviewed and declared copyright infringements of each other?
Microsoft will just claim they don't review the training data and are not liable for it, and point to the other company.
Offshore data centers without any copyright oversight will be used to train on copyright-protected text, music, sounds, images or video.
The global fix will be to update the Berne treaty to put all works into the public domain after 5
The suit is nonsense (Score:1)
Training an AI is exactly the same as training a human mind
Between baseless lawsuits and government restrictions it's hard to see how AI will reach its full potential
Re: (Score:2)
Yeah, right. Said without evidence. The best you've got is some correlations.
Re:The suit is nonsense (Score:5, Funny)
Training an AI is exactly the same as training a human mind
I dunno about that... for one thing, most humans don't confidently spout nonsense unless alcohol is involved.
Re: (Score:2)
You've never had a conversation with a politician, have you?
Re: (Score:2)
I did use the qualifier "most". :-)
Re: (Score:2)
I dunno about politicians but Trump as a businessman would certainly qualify.
Re: (Score:2)
Training an AI is exactly the same as training a human mind
I dunno about that... for one thing, most humans don't confidently spout nonsense unless alcohol is involved.
A George Carlin quote comes to mind...
Re: (Score:2)
So you believe that virtually everyone posting here is drunk?
Plausible, I guess, but dude, you've led a very sheltered life.
Re: (Score:2)
I have a feeling that Slashdot is overrun by alcoholics these days.
Re: (Score:2)
Training an AI is exactly the same as training a human mind
No it isn't. There are huge differences. The inputs are different, the process is different, and the outcome is different.
Why would you say something that's so obviously false?
Re: (Score:2)
Why would you say something that's so obviously false?
Liar or idiot. Take your pick. That person is not even smart enough to ask an Artificial Idiot about this.
Re: (Score:2)
No. Try that deranged statement in a courtroom some time.
Re: (Score:2)
Training an AI is exactly the same as training a human mind
Yeah, I remember when I had to read the entire internet to learn how to talk. I was lucky though, I had access to those resources in preschool.
Some kids weren't able to completely read millions of books in their training data until they were in fourth grade, or even fifth grade. Most of these people have Slashdot IDs between 1423380 and 1423382. It's sad, really, how they talk. The inequality. We should take up a collection pot for them.
My general patience and good will is gone (Score:5, Interesting)
I do not have any faith in the companies of Silicon Valley to have the greater good in mind anymore. It's all about the money so this doesn't surprise me anymore.
"Move fast and break things" has progressively transitioned into "fuck with people and don't give them a choice to opt out." This ranges from robot-taxis blocking roads to scooters littering streets to AI glasses bringing surveillance so your data can be sold without your consent. Nope, you can't use money anymore so that your previous purchases can be used to sell targeted advertising spots with Google Pay and Apple Pay.
Silicon Valley needs some more regulation. I no longer give a shit about what new hype machine they have.
PSA: Stop giving money to homeless subscription panhandling. When you pay for a subscription, you just increase the behavior and with it more panhandling. The prices for hardware have gone up because of the fucktards who keep giving money to ChatGPT, Gemini etc. WE WHO DO NOT BUY THESE STUPID SERVICES have to deal with the increased prices because of idiots unable to show restraint. Good job fucking us over, chumps.
Re: (Score:2)
I do not have any faith in the companies of Silicon Valley to have the greater good in mind anymore. It's all about the money so this doesn't surprise me anymore.
In the fight between the VCs and the people trying to make the world a better place, the bankers won and took control.
A 21st century business plan (Score:2)
What about Sprint/AT&T/Verizon/Dish providing the gateways/routers and optic fibre to carry that pirated data? Hell, what about the Chinese factories that manufactured that infrastructure? This is more about who they can squeeze money from than about stopping piracy. Intellectual property owners litigating over piracy-induced "losses" is their new business plan.
Copyright? (Score:3)
You wouldn't download a supercomputer...
Genie is not going back in the bottle (Score:2)
At this point, isn't AI training data something of a fait accompli? The models have been trained and exist. No one seriously thinks the courts are going to make these companies toss all the models in the bit bucket. Lawsuits might wring some $ out of some AI companies. They might not. These lawsuits look less and less like slaying giants and more like tilting at windmills.
Re: (Score:3)
A court could absolutely order them to throw out a model. Perhaps you don't think it's likely to happen, but the law doesn't depend on what you think is likely. The court could also issue an injunction barring them from training future models on copyrighted material without permission. They also could grant damages.
Consider that Anthropic settled a similar case for $1.5 billion, which shows they thought they might lose a lot more if the case went to trial.
Re: (Score:2)
One thing I noticed is Google's AI Overview is really good at answering things you would expect a search engine to know. If I ask Claude something it will hypothesize and maybe make things up, or do a web search maybe, but Google's AI Overview (Gemini?) seems to have faster access to information and knows about recent events. For example I just typed into the chrome search bar "can you tell me about the earthquake in venezuela? I think they found two boys this morning" and it picks up BBC news which is actu
Possession is 9/10ths of the law (Score:3)
I mean... (Score:2)
If this argument works, then it would also work against all the gun manufacturers for all gun related crime. I doubt the court is going to want to set that precedent.
