While the court arguments will not be as simple, the quotes in the article don't defend their position well. They compare it to Napster and Spotify, two services that never created the works they were delivering. If scraping becomes something that requires agreement/licensing then every search engine on the planet becomes illegal overnight.
That's exaggerating things more than a bit. Australia has had good results with forcing search engines to pay for the news they scrape from other organizations' websites, despite doomsaying.
I wouldn't call a handful of Australian news companies getting paid by Google while smaller publishers get stiffed (because their size doesn't allow them to negotiate) "good results".
Also, Google/Facebook weren't (as I said earlier about AI) creating anything.
I won't get into the argument over whether AI art is "art" or not, but no one can argue that it isn't actually creating anything. If I tell it to make something, it does, and that something has never existed until that moment. It's not delivering me a photo it found online that looks like it matches the prompt I requested, it's creating a photo based on what I tell it to do.
By that argument Google also creates things — you give it a request and it creates a webpage to serve to you based on what you asked for. I don't think the two cases are as different as they seem at a basic level.
Ah, it's not the page itself being considered though. It's the snippets, which are exact quotes from the pages they link to. There's a lot more precedent to lean on with regard to fair use there, which is as yet unexplored with AI-generated art.
I think it's interesting to consider the extremes of parameterization in machine learning here. Take, for example, a very low-dimensional model like a linear one. Now, a linear model trained on raw pixel data would just produce gibberish, so let's do some feature engineering. Say you extract bounding boxes of figures and then train a linear model to maximize separation - basically, you'd derive the rule of thirds from images.
I'd wager that most people would say that the resulting output of the model is fair game - if you get, say, a ratio of 0.23 and use that for compositions, it'd be hard to argue any kind of copyright or ethical violation derived from the training data, even if it was copyrighted.
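To make the low-parameter extreme concrete, here's a minimal sketch in Python. The bounding boxes and image width are invented illustrative values, and the "model" is deliberately trivial: a single learned composition ratio.

```python
# A toy version of the low-parameter extreme: all the "training data" is
# distilled into one number. The boxes below are made up for illustration.

def subject_ratio(box, image_width):
    """Horizontal center of a subject's bounding box as a fraction of width."""
    x_min, x_max = box
    return (x_min + x_max) / 2 / image_width

# Pretend these (x_min, x_max) boxes were extracted from 1024px-wide photos.
boxes = [(260, 420), (300, 380), (280, 400), (310, 430)]
ratios = [subject_ratio(b, 1024) for b in boxes]

# The entire "model" is one parameter -- roughly the rule of thirds.
learned_ratio = sum(ratios) / len(ratios)
print(round(learned_ratio, 2))  # ~0.34
```

Whatever images the boxes came from, the only thing that survives training is a composition statistic, which is why reusing it feels so clearly fair game.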
Now, the other extreme is a highly parameterized model. Say, an MLP with more parameters than a 512x512 image has pixels, trained on a dataset of ONLY a single 512x512 image. The model would simply output the image exactly. That's certainly a copyright violation - the model is basically just a shitty way to store an image in this case.
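The over-parameterized extreme is just as easy to sketch. This toy shrinks the image to 8x8 and uses a degenerate bias-only "network" whose parameters simply are the pixel values; real gradient training would reach the same memorized result the long way around.

```python
import random

random.seed(0)
# Stand-in for the single 512x512 training image.
image = [[random.randrange(256) for _ in range(8)] for _ in range(8)]

# With at least one parameter per pixel, the loss-minimizing "weights"
# for a one-image dataset are just the pixel values themselves.
params = [row[:] for row in image]

def model():
    # Ignores any input and reproduces the training image verbatim.
    return params

print(model() == image)  # True: the model is just a costly file format
```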
The kind of model that generative "AI" uses sits somewhere in between those extremes in terms of parameterization. So it's an interesting problem to ponder where the boundary between those two cases lies, both legally and ethically.
I tend to take an information-theory approach to it. Is the model even large enough to store verbatim content? With Stable Diffusion, the answer is no: there is no way that terabytes upon terabytes of input data could have been compressed into the 4-something gigabytes of the Stable Diffusion model, and you can verify this by observing that Stable Diffusion has never produced a verbatim output.
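A back-of-the-envelope version of that argument, using rough public figures (the dataset count and checkpoint size below are approximations, not exact values):

```python
# Roughly 2.3 billion training images (LAION-scale) vs. a ~4 GB checkpoint.
training_images = 2_300_000_000
checkpoint_bytes = 4 * 1024**3

bytes_per_image = checkpoint_bytes / training_images
print(round(bytes_per_image, 2))  # under 2 bytes per training image
```

A couple of bytes per image can't hold a thumbnail, let alone a verbatim copy, so whatever the model retains, it isn't the images themselves.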
I think they're quite different at a basic level. Fundamentally so. There are very small, loose parallels between certain aspects of these two different things but I think those parallels are so insignificant as not to validate discussing them as if they're the same.
Only in the absolute loosest of terms is Google "creating" a page based on what you ask - the output of your query (the page you see) isn't the product. The organization of the provided information may be dynamic or "created" to tailor it to the person searching but that organization of data is still not the end result/product. The data, links, and sources on it are. As much as Google wants to be a destination, it is a middleman. Even as a destination (maps, etc), that data already exists and can be seen by others - it is not a new unique just-generated piece of data no one else has access to. Google is clearly passing along information from other sources (as well as connecting you TO them) or returning non-unique, static (in a macro sense / at any given point in time) data as that is the entire search product's main output aside from advertising. Without existing data and existing websites that you reach directly, the search product is useless. If you ask it for something it doesn't have, it doesn't attempt to create something new for you in its place.
The only existing data for AI art is the training material, which you cannot query directly when generating art. It is merely an underlying piece of infrastructure that informs the generative aspect of the system. AI art takes a given prompt and generates brand new, unique piece(s) of output, based on extensive amounts of training data that have influenced the generations. Even identical prompts, even by the same person, do not result in identical output. It is not passing along others' art to you nor is it linking you to those artists. It is "creating art", for lack of a better term, its output IS the desired product, IS generated at the time of request, IS unique, and IS difficult to reproduce exactly a second time. It is an endpoint, not a middleman.
Querying an AI art generator is asking it to paint a painting, not asking it to find an existing painting.
All of this said, I do not agree that generative AI should be able to amass large sets of training data without consent. I believe the vacuuming of data at large required to train AIs is a fundamental ethical issue with AI and one reason I have turned against it, so do not take the above as a defense of the "scraping". Scraped data isn't the output/product/point of generative AI, whereas it is fundamental (scraped data is the product) to search engines. The difference between the two systems means the ethics of each are different too, IMHO.
Addendum: Going back to the original conversation that sparked these thoughts, do I think Google should be scraping data from newspapers and displaying it as if it's Google's own data? No. If Google is directly taking data from another site and giving it to you in full, so as to become the new destination for that data and purposely keep you away from the source (without that source's consent or compensation), I also see that as an ethical issue. What it should do is make the original easiest to get to, with maybe a "teaser" amount of information. It should be driving traffic to the websites it gathers from, not replacing them.
I'm not saying it's by pure chance, nor do I think anyone is.
If I tell it to create a picture of Godzilla eating a multi-tiered wedding cake, it'll do so based on the training data of what Godzilla looks like and what a multi-tiered wedding cake looks like.
The same thing would happen if I told a human to create a picture of Godzilla eating a multi-tiered wedding cake. Should the human write the creator of Godzilla and some random baker a check for knowing what they look like? Should the human have asked for permission to know what Godzilla and cake look like first?
If you google "afghani woman with green eyes" you pretty much only get that photo. It's almost as if, when you give AI instructions that match the most famous photograph in the world, one that has no equals, it provides pretty much exactly what you asked for.
AI creates, no one said it wasn't derivative, just as all art is derivative.
Google returned that result because it is not just an Afghan girl with green eyes, it's the Afghan girl with green eyes. The guy in your video didn't input a "generic description"; he asked for a very specific picture of a very specific girl. "Afghan girl with green eyes" is as specific a request as typing "Brad Pitt" into the AI. A human using Godzilla doesn't automatically get sued either, even in a commercial sense, as Godzilla eating a wedding cake would more than likely be deemed transformative and therefore fair use.
I'm not convinced that copyright law will have to apply. If it does, it should apply to the person selling the works, but that's a huge mess that'll be largely unenforceable. It will end up applying to the AI company, which will broker deals with a few large sites (Getty, DeviantArt, etc.) that'll get a kickback and just put some boilerplate in their TOS saying they get all the money from it, so artists are still not getting paid.
Sounds about right. I doubt they can argue the images aren't being put to commercial use, when the AI has already started competing in the same market.
In the end, when the final bell is rung, and our entire civilization has collapsed into pathetic confused chaos, a lawyer will be the last one to emerge from the rubble. And it will be a lawyer bot.