In theory, could you then just register as an AI company and pirate anything?
Well no, just the largest ones, who can pay some fine or have nearly endless legal funds to discourage challenges to their practice, this being a form of pretend business moat. The average company won’t be able to and will get shredded.
What fine? I thought this new law allows it. Or is it one of those instances where training your AI on copyrighted material and distributing it is fine, but actually sourcing it isn’t, so you can’t legally create a model, but also nobody can do anything if you have and use one? That sounds legally very messy.
You’re assuming most of the commenters here are familiar with the legal technicalities instead of just spouting whatever uninformed opinion they have.
No, because training an AI is not “pirating.”
If they are training the AI with copyrighted data that they aren’t paying for, then yes, they are doing the same thing as traditional media piracy. While I think piracy laws have been grossly blown out of proportion by entities such as the RIAA and MPAA, these AI companies shouldn’t get a pass for doing what Joe Schmoe would get fined thousands of dollars for on a smaller scale.
In fact, when you think about the way organizations like the RIAA and MPAA like to calculate damages based on lost potential sales they pull out of thin air, training an AI that might make up entire songs competing with their existing catalog should be even worse. (Not that I want to encourage more of that kind of bullshit potential-sales argument.)
The act of copying the data without paying for it (assuming it’s something you need to pay for to get a copy of) is piracy, yes. But the training of an AI is not piracy because no copying takes place.
A lot of people have a very vague, nebulous concept of what copyright is all about. It isn’t a generalized “you should be able to get money whenever anyone does anything with something you thought of” law. It’s all about making and distributing copies of the data.
This isn’t quite correct either.
The reality is that there’s a bunch of court cases and laws still up in the air about what AI training counts as, and until those are resolved the most we can make is conjecture and vague moral posturing.
The closest we have is likely the court decisions on music sampling, and so far those haven’t been consistent, mostly hinging on “intent” and “effect on original copy sales”. So by that logic, whether or not AI training counts as copyright infringement is likely going to come down to whether shit like “Ghibli filters” actually provably (at least as far as a judge is concerned) fucks with Ghibli’s sales.
Where the training data comes from seems like the main issue, rather than the training itself. Copying has to take place somewhere for that data to exist. I’m no fan of the current IP regime, but it seems like an obvious problem if you get caught making money with terabytes of content you don’t have a license for.
A lot of the griping about AI training involves data that’s been freely published. Stable Diffusion, for example, trained on public images available on the internet for anyone to view, but led to all manner of ill-informed public outrage. LLMs train on public forums and news sites. But people have this notion that copyright gives them some kind of absolute control over the stuff they “own” and they suddenly see a way to demand a pound of flesh for what they previously posted in public. It’s just not so.
I have the right to analyze what I see. I strongly oppose any move to restrict that right.
And what of the massive amount of paywalled content that AI was still trained on?
If it’s paywalled how did they access it?
Publicly available =/= freely published
Many images are made and published with anti-AI licenses, or are otherwise licensed in a way that requires attribution for derivative works.
The problem with those things is that the viewer doesn’t need that license in order to analyze them. They can just refuse the license. Licenses don’t automatically apply, you have to accept them. And since they’re contracts they need to offer consideration, not just place restrictions.
An AI model is not a derivative work; it doesn’t include any identifiable pieces of the training data.
It’s also pretty clear they used a lot of books and other material they didn’t pay for, obtained via illegal downloads. I’m fine with that practice; I just want it legalised for everyone.
So streaming is fine but copying is not?
Streaming involves distributing copies, so I don’t see why it would be. The law has been well tested in this area.
It’s exploiting copyrighted content without a licence, so, in short, it’s pirating.
“Exploiting copyrighted content” is an incredibly vague concept that is not illegal. Copyright is about distributing copies of copyrighted content.
If I am given a copyrighted book, there are plenty of ways that I can exploit that book that are not against copyright. I could make paper airplanes out of its pages. I could burn it for heat. I could even read it and learn from its contents. The one thing I can’t do is distribute copies of it.
It’s about making copies, not just distributing them; otherwise I couldn’t be bound by a software EULA, because I wouldn’t need a license to copy the content into my computer’s RAM to run it.
The enforceability of EULAs varies with jurisdiction and with the actual contents of the EULA. It’s by no means a universally accepted thing.
It’s funny how suddenly large chunks of the Internet are cheering on EULAs and copyright enforcement by giant megacorporations because they’ve become convinced that AI is Satan.