14 votes

Getty Images is suing the creators of AI art tool Stable Diffusion for scraping its content

15 comments

  1. [13]
    AugustusFerdinand
    Link

    While the court arguments will not be as simple, the quotes in the article don't defend their position well. They compare it to Napster and Spotify, two services that never created the works they were delivering. If scraping becomes something that requires agreement/licensing then every search engine on the planet becomes illegal overnight.

    5 votes
    1. [12]
      MimicSquid
      Link Parent

      That's exaggerating things more than a bit. Australia has had good results with forcing search engines to pay for the news they scrape from other organizations' websites, despite doomsaying.

      2 votes
      1. [11]
        AugustusFerdinand
        Link Parent

        I wouldn't call a handful of Australian news companies getting paid by Google while smaller publishers get stiffed (because their size doesn't allow them to negotiate) "good results".

        Also, Google/Facebook weren't (as I said earlier about AI) creating anything.
        I won't get into the argument over whether AI art is "art" or not, but no one can argue that it isn't actually creating anything. If I tell it to make something, it does, and that something has never existed until that moment. It's not delivering me a photo it found online that looks like it matches the prompt I requested; it's creating a photo based on what I tell it to do.

        4 votes
        1. [5]
          gpl
          Link Parent

          By that argument google also creates things — you give it a request and it creates a webpage to serve to you based on what you asked for. I don't think the two cases are as different as they seem at a basic level.

          1 vote
          1. Wes
            Link Parent

            Ah, it's not the page itself being considered though. It's the snippets which are exact quotes from the pages they link to. There's a lot more precedent to lean on with regards to fair use, which is yet unexplored with AI generative art.

            4 votes
          2. [2]
            stu2b50
            Link Parent

            I think it's interesting to think of the extremes with respect to parameterization with machine learning in this case. Take, for example, a very low dimensional model like a linear one. Now, a linear model trained on raw pixel data would just produce gibberish, so let's do some feature engineering. Let's say you extract bounding boxes of figures and then train a linear model to maximize separation - basically, you're going to derive the rule of thirds from images.

            I'd wager that most people would say that the resulting output of the model is fair game - if you get, say, a ratio of 0.23 and use that for compositions, it'd be hard to argue any kind of copyright or ethical violation derived from the training data, even if it was copyrighted.

            Now, the other extreme is a highly parameterized model. Let's say, an MLP with more parameters than a 512x512 image has pixels, trained on a dataset of ONLY that single 512x512 image. The model would simply output the image exactly. That's certainly a copyright violation - the model is basically just a shitty way to store an image in this case.

            The kind of model that generative "AI" uses is somewhere in-between those in terms of parameterization. So it's an interesting problem to ponder where the boundary line between those two cases lies, both legally and ethically.
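            The memorization extreme can be sketched in a few lines (my own toy illustration, not from the comment): any model with more free parameters than the training data has values can fit that data exactly, which amounts to just storing it.

```python
import numpy as np

# Over-parameterized extreme: one training "image" (16 pixels) and a linear
# "model" with 32 parameters per output pixel. With more parameters than
# constraints, least squares fits the training data exactly.
rng = np.random.default_rng(0)
image = rng.random(16)           # stand-in for a tiny 4x4 image
features = rng.random((1, 32))   # a single training example's features

# Solve features @ weights = image for the minimum-norm weights.
weights, *_ = np.linalg.lstsq(features, image[None, :], rcond=None)
reconstruction = features @ weights

print(np.allclose(reconstruction, image))  # True: the model memorized the image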

            4 votes
            1. Toric
              Link Parent

              I tend to take an information-theory approach to it. Is the model even large enough to store verbatim content? With stable diffusion, the answer is no: there is no way that terabytes upon terabytes of input data could have been compressed into the 4-something gigabytes of the stable diffusion model, and you can verify this by observing that stable diffusion has never produced a verbatim output.
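              The back-of-the-envelope arithmetic behind that claim (the dataset and model sizes here are rough public figures, not from the comment):

```python
# If the model had memorized its training set verbatim, each training image
# would have to fit inside the weights. Approximate figures: the stable
# diffusion checkpoint is ~4 GB; its LAION training set is on the order of
# 2 billion images.
model_bits = 4 * 1024**3 * 8      # ~4 GiB of weights, in bits
num_images = 2_000_000_000        # ~2 billion training images (rough)

bits_per_image = model_bits / num_images
print(round(bits_per_image, 1))   # ~17 bits, i.e. about two bytes per image
```

              Two bytes per image cannot store anything verbatim, so whatever the model retains, it isn't the dataset itself.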

              1 vote
          3. inwardpath
            (edited)
            Link Parent

            I think they're quite different at a basic level. Fundamentally so. There are very small, loose parallels between certain aspects of these two different things, but I think those parallels are so insignificant that they don't justify discussing the two as if they're the same.

            Only in the absolute loosest of terms is Google "creating" a page based on what you ask: the output of your query (the page you see) isn't the product. The organization of the provided information may be dynamic or "created" to tailor it to the person searching, but that organization of data is still not the end result/product. The data, links, and sources on it are.

            As much as Google wants to be a destination, it is a middleman. Even as a destination (maps, etc.), that data already exists and can be seen by others; it is not a new, unique, just-generated piece of data no one else has access to. Google is clearly passing along information from other sources (as well as connecting you TO them) or returning non-unique, static (in a macro sense / at any given point in time) data, as that is the entire search product's main output aside from advertising. Without existing data and existing websites that you reach directly, the search product is useless. If you ask it for something it doesn't have, it doesn't attempt to create something new for you in its place.

            The only existing data for AI art is the training material, which you cannot query directly when generating art. It is merely an underlying piece of infrastructure that informs the generative aspect of the system. AI art takes a given prompt and generates brand new, unique piece(s) of output, based on extensive amounts of training data that have influenced the generations. Even identical prompts, even by the same person, do not result in identical output. It is not passing along others' art to you nor is it linking you to those artists. It is "creating art", for lack of a better term, its output IS the desired product, IS generated at the time of request, IS unique, and IS difficult to reproduce exactly a second time. It is an endpoint, not a middleman.

            Querying an AI art generator is asking it to paint a painting, not asking it to find an existing painting.


            All of this said, I do not agree that generative AI should be able to amass large sets of training data without consent. I believe the vacuuming of data at large required to train AIs is a fundamental ethical issue with AI and one reason I have turned against it, so do not take the above as a defense of the "scraping". Scraped data isn't the output/product/point of generative AI, whereas it is fundamental (scraped data is the product) to search engines. The difference between the two systems means the ethics of each are different too, IMHO.


            Addendum: Going back to the original conversation that sparked these thoughts, do I think Google should be scraping data from newspapers and displaying it as if it's Google's own data? No. If Google directly takes data from another site and gives it to you in full, so as to become the new destination for that data and purposely keep you away from its source (without that source's consent or compensation), I also see that as an ethical issue. What it should do is make the original easiest to get to, with maybe a "teaser" amount of information. It should be driving traffic to the websites it gathers from, not replacing them.

            2 votes
        2. [5]
          mundane_and_naive
          Link Parent

          "no one can argue that it isn't actually creating anything"

          I think this is one of the crucial points in this debate, and it's not as clear-cut as you make it sound. While I agree that AI images are more original than mere remixes, they're not created from scratch either. In the example in this video, he provided a generic description and the AI produced a nearly identical replica of a copyrighted work. It's likely that the produced image looked that specific way because the original photo was part of the training data. It's hard to make the argument that the resemblance happened by pure chance.

          1 vote
          1. [4]
            AugustusFerdinand
            Link Parent

            I'm not saying it's by pure chance, nor do I think anyone is.
            If I tell it to create a picture of Godzilla eating a multi-tiered wedding cake, it'll do so based on the training data of what Godzilla looks like and what a multi-tiered wedding cake looks like.
            The same thing would happen if I told a human to create a picture of Godzilla eating a multi-tiered wedding cake. Should the human write the creator of Godzilla and some random baker a check for knowing what they look like? Should the human have asked for permission to know what Godzilla and cake look like first?

            If you google "afghani woman with green eyes" you pretty much only get that photo. It's almost as if, when you give the AI instructions that match the most famous photograph in the world, one that has no equals, it provides pretty much exactly what you asked for.

            2 votes
            1. [3]
              mundane_and_naive
              (edited)
              Link Parent

              Of course Google would produce that image, it's a search engine. If the point is that the AI creates, shouldn't it not do that? The fact that training data is necessary for the AI to do what it does, and is likely to resurface in its output, is not a point in favor of its originality. Comparing it to a human taking references doesn't help in this case either, because any human would also get sued if they were to profit off of putting Godzilla in their creations.

              AI may seem like a public good now, but the companies will have to charge money eventually, at which point copyright law will have to apply. The question is how (profit sharing like YouTube, or some non-commercial conditions) and to whom (the AI company or the person selling AI-derived content).

              1 vote
              1. [2]
                AugustusFerdinand
                Link Parent

                AI creates; no one said it wasn't derivative, just as all art is derivative.
                Google returned that result because it is not just an Afghan girl with green eyes, it's the Afghan girl with green eyes. The guy in your video didn't input a "generic description"; he asked for a very specific picture of a very specific girl. "Afghan girl with green eyes" is as specific a request as typing "Brad Pitt" into the AI. A human using Godzilla doesn't automatically get sued either, even if they use it in a commercial sense, as Godzilla eating a wedding cake would more than likely be deemed transformative and therefore fair use.

                I'm not convinced that copyright laws will have to apply. If they do, they should apply to the person selling the works, but that's a huge mess that'll be largely unenforceable. It will end up applying to the AI company, who will broker deals with a few large sites (Getty, DeviantArt, etc.) that'll get a kickback and just put some boilerplate in their TOS saying they get all the money from it, so artists are still not getting paid.

                2 votes
                1. mundane_and_naive
                  (edited)
                  Link Parent

                  I don't agree with your assertion that "Afghan girl with green eyes" is as specific as "Brad Pitt", but the level of specificity is beside the point.

                  My point is that it can produce a replica (or something with enough resemblance to be mistaken for the original) if the user requests it. In those cases, does the AI function more like: (1) a search engine (providing links to original content), (2) a file sharer (providing copies of original content), (3) a creator who plagiarized, or (4) a creator who just happened to stumble upon the same ideas?

                  It would be hard to argue for the 4th case if we can point to the original photos being part of the training data. The 2nd and 3rd cases would be subject to copyright law (although in different ways), and the 4th could pass if the AI company can prove that the infringed materials were not part of the training data (the way a human has to show evidence that they never came across the infringed materials throughout their creation process). Though in all three of those cases permissions from the original rights holders have to be respected, so maybe the distinction of whether AI creates doesn't matter as much as I first thought. Maybe the 1st case is free from copyright considerations, but I think we can agree that AIs are not this.

  2. Fiachra
    Link

    Sounds about right. I doubt they can argue the images aren't being put to commercial use, when the AI has already started competing in the same market.

    4 votes
  3. DonQuixote
    Link

    In the end, when the final bell is rung, and our entire civilization has collapsed into pathetic confused chaos, a lawyer will be the last one to emerge from the rubble.

    And it will be a lawyer bot.

    1 vote