56 votes

Meet Nightshade, the new tool allowing artists to ‘poison’ AI models with corrupted training data

28 comments

  1. XanIves
    Link

    I can see a future where this paper's method is integrated into most digital art tools - Photoshop/Krita/GIMP/the Apple thing - with a simple toggle to turn it ON or OFF when you export an image.
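
    For illustration, a minimal sketch of what that export hook might look like; nightshade_perturb is a hypothetical stand-in for the paper's perturbation step, not a real API:

    ```python
    from dataclasses import dataclass

    from PIL import Image


    def nightshade_perturb(image: Image.Image) -> Image.Image:
        # Hypothetical stand-in: apply a perturbation that looks unchanged
        # to humans but corrupts what a model would learn from the image.
        raise NotImplementedError("placeholder for the actual algorithm")


    @dataclass
    class ExportSettings:
        poison_on_export: bool = False  # the proposed ON/OFF toggle


    def export_image(image: Image.Image, path: str, settings: ExportSettings) -> None:
        # Poison last, so the artist's working file stays clean and only
        # the published copy carries the perturbation.
        if settings.poison_on_export:
            image = nightshade_perturb(image)
        image.save(path)
    ```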

    Poisoning the vast majority of content online would seem to be the only ethical way to resolve this issue of "art theft" - training on art its creator never intended for AI training - since it would force anyone looking to train a model to actually pay creators, in aggregate, for access to un-poisoned content somehow.

    It'd be nice to be able to share art for the enrichment of human experience, without giving away the ownership of the art when it comes to training. I know copyright might be figured out someday, but there's a difference between something being illegal to steal and something being impossible to steal.

    35 votes
  2. [21]
    MyriadBlue
    Link

    What is the difference between an AI training on someone's art and a human training on someone's art?

    I have a friend who studied Frank Frazetta to figure out how he did his style, and for a period of time his work could have passed for paintings by Frazetta.

    What is the difference between humans learning to copy art styles and AI doing it?

    24 votes
    1. [8]
      Fiachra
      (edited)
      Link Parent

      Maybe I'm growing cynical, but lately I find this question to be nothing but a reliable way to derail discussions of AI. It very deftly draws two very complex topics - neural networks and the human brain - into the conversation and places a burden of proof on critics to 'prove' a philosophical distinction between human learning and machine learning, basically abstracting it to the point where nobody can do anything but state their opinion, shrug and walk away.

      This thread is a good example, because your question is irrelevant to the article in my opinion. No matter what answer your question might get, it doesn't change the fact that an artist has an absolute right to make their art however they please, including applying anti-AI filters. If a painter deliberately ran their work through a filter to stop human eyes from studying how it was made - smoothing out brush strokes, etc. - or even to mislead students into adopting bad habits, that's still ethically their right as an artist.

      The popular grievance over AI is not abstract, it's practical and legal - an algorithm cannot own copyright or be paid for the work it produces, regardless of how similar it may or may not be to the human mind. So who does deserve to profit from AI art if the AI can't? The artists whose years of work provided the training data that made it all possible, or a person who downloaded the mass-produce-art machine and pressed a button? This is a much more answerable question.

      45 votes
      1. [7]
        Greg
        Link Parent

        I was with you at the start, but that question at the end seems like the exact kind of oversimplification you’re objecting to.

        How do we quantify the contribution of any given one of the five billion images in a training set to a specific output? If we can somehow answer that mathematically, does that tiny fractional contribution to the output balance meaningfully against the effort put in by the person who thought up the prompt? What about the contribution of the researchers who worked on the underlying theory, the developers who implemented it, the engineers who designed the chips it runs on? None of these are clear cut or simple, and that’s just scratching the surface of the practical considerations in answering what you asked, without even touching on the different moral philosophies that could come into play.
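
        To make "tiny fractional" concrete, here's the equal-split arithmetic with purely hypothetical figures (the five billion is from this comment; the pool size is invented):

        ```python
        # Illustrative only: an equal per-image split of a hypothetical
        # licensing pool across a five-billion-image training set.
        training_images = 5_000_000_000
        licensing_pool_usd = 100_000_000  # hypothetical $100M set aside for artists

        per_image_share = licensing_pool_usd / training_images
        print(f"${per_image_share:.2f} per image")  # prints: $0.02 per image
        ```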

        Even the “easier” questions rapidly spiral. Let’s take a seemingly more straightforward start: a prompt for an image in the style of a certain artist. What if none of that artist’s work was ever in the training set? Maybe they’re popular online and there’s enough Creative Commons fan art out there that the machine learned to emulate their style anyway. Maybe someone commissioned a human artist to create a set of style samples to fine tune on. Are they deserving simply for having inspired other humans who then made data for the machine? Does machine-assisted use of “their” style occupy a different moral or legal space to human use of it? For that matter, what was the moral view of other humans using that style? Because even on that topic opinions vary pretty widely, despite legally almost always being in the clear.

        It’s an emotionally charged topic because it will alter the market for creative work. But that doesn’t make it an easy answer in favour of the status quo - that just means there’s incentive on both sides to think there’s an easy answer when there really isn’t.

        12 votes
        1. [6]
          Fiachra
          Link Parent

          You are in fact doing the exact thing I'm objecting to: you're making a straightforward intellectual property issue sound like an unsolvable epistemological one.

          How do we quantify the contribution of any given one of the five billion images in a training set to a specific output?

          the effort put in by the person who thought up the prompt?

          What about the contribution of the researchers who worked on the underlying theory, the developers who implemented it, the engineers who designed the chips it runs on?

          All of these questions are intended to make the problem sound complicated and intractable, when the reality is that all of them would be settled when licensing terms are negotiated between copyright holders and the companies building the algorithms, as they should be, because generative AI is a commercial product and training it is a commercial activity. In any case, I think even AI boosters will agree that the contribution of artists isn't zero, which is what they're currently being paid, so I hope you can agree that they are being shortchanged here.

          Even the “easier” questions rapidly spiral. Let’s take a seemingly more straightforward start: a prompt for an image in the style of a certain artist. What if none of that artist’s work was ever in the training set? Maybe they’re popular online and there’s enough Creative Commons fan art out there that the machine learned to emulate their style anyway. Maybe someone commissioned a human artist to create a set of style samples to fine tune on. Are they deserving simply for having inspired other humans who then made data for the machine?

          Yet again, these apparently "rapidly spiraling" questions are answered by a centuries-old concept called copyright. Generally, you cannot sue a person for copyright infringement if they just painted a new picture inspired by your style. You can sue a person for copyright infringement if they use your art for commercial purposes without permission - for example, if they took your art and used it to train an algorithm that earns money in some way.

          Does machine-assisted use of “their” style occupy a different moral or legal space to human use of it?

          Straightforwardly yes, humans have legal personhood and algorithms don't. That is a different legal space. Humans own the art they produce while an algorithm's output is the collective effort of multiple parties that can each claim an unquantifiable fraction of the credit. They could agree a split of the credit/profit amongst themselves, but the current status quo is that artists have no ability to negotiate because their work is being taken without their consent and with no compensation.

          15 votes
          1. [5]
            Greg
            Link Parent

            I think the moral question is far more important than the legal one because the moral question is what should, at least in an ideal world, drive the conversation around the law and any potential changes needed. Even as it stands, copyright and IP law isn't simple, or consistent, or arguably even fit for purpose in a lot of situations - and since this new technology does interact with IP law in unforeseen ways, we're already seeing cases start to move through the system that will set new precedents around it.

            If you're approaching it from the angle that it's a straightforward IP issue and that current law covers it, training an ML model is not meaningfully different to building a search engine index. Google's crawler pulls down publicly available data, runs algorithms on that data to derive metadata about it for commercial purposes, and makes an index of that derived metadata accessible as a core part of their product. People have sued over this, but in pretty much all the important cases the precedent set was in the search engines' favour. I genuinely don't see a technical or legal distinction to be made between search indexing and ML training here?
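
            To make that claimed equivalence concrete, a toy sketch of the shared pipeline - the fetch-and-derive steps are identical, and only the destination of the derived data differs (function names are illustrative, not any real crawler's API):

            ```python
            import requests


            def fetch(url: str) -> bytes:
                # Shared step: pull down publicly available data.
                return requests.get(url, timeout=10).content


            def derive_features(data: bytes) -> list[str]:
                # Shared step: run algorithms over the data to derive metadata.
                return data.decode("utf-8", errors="ignore").lower().split()


            def index_document(url: str, inverted_index: dict[str, set[str]]) -> None:
                # Search-engine branch: the derived terms go into an index.
                for term in derive_features(fetch(url)):
                    inverted_index.setdefault(term, set()).add(url)


            def add_training_example(url: str, corpus: list[list[str]]) -> None:
                # ML-training branch: the same derived data becomes a training example.
                corpus.append(derive_features(fetch(url)))
            ```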

            If we're moving into "but the implications of generative algorithms are a whole different world to search", then yes, I agree - but that means we have no choice but to get into the moral and philosophical weeds of why they're different, because the distinction only meaningfully exists as part of the wider context, not in the actual technical actions being performed.

            9 votes
            1. [4]
              Fiachra
              Link Parent

              I think the moral question is far more important than the legal one because the moral question is what should, at least in an ideal world, drive the conversation around the law

              Those abstract moral questions don't have definitive answers as they apply to human artists, despite millennia of philosophy. That didn't stop anyone from setting laws in response to material problems. Yes, those copyright laws have major flaws, but... on what basis can you say that? Do you know from some complete moral framework of ownership, or do you know because you can observe the material effects of those flaws? (Rhetorical question)

              Materially, every party to an AI's output has the ability to profit from it and has the freedom to not take part, except the owners of the training data. Unless your moral position can definitively say that an artist's contribution and rights are literally zero, you should at least be supporting some basic legal right for the artists to opt out of being used in training data. "AIs learn just like humans do" is not an answer to that, because that does not justify why the developer and prompt writer can profit but not the artist.

              Literally as soon as everyone starts training on public domain images, I will listen to the abstract moral philosophy all the livelong day.

              4 votes
              1. [3]
                Greg
                Link Parent

                I get what you're saying, and I think we're actually in agreement on a lot of things, I just don't see how those things add up to give a single definitive answer.

                I don't think "AI learns like humans" is a complete or particularly useful statement - I meant it when I said I was with you on the first part of that first comment - and your points here about profit and freedom are genuinely interesting ones! I also haven't said anything specific about what I think should be the case. All I'm really saying is that we need to discuss the whole complex issue, look at how it compares to equivalent technical systems and economic situations, and explore the angles to at least some extent before saying "yup, the answer is definitely X, nothing more to discuss".

                I think we're using "moral" and "material" to mean roughly the same thing - because yeah, you're absolutely right, there will never be a complete or indisputable moral framework for any of this, so we cut through as best we can. My point in bringing it up was that "legal" alone isn't sufficient: copyright law wasn't written with this technology in mind, because it simply didn't exist.

                If you disagree at that stage, and you're saying that the current legal framework is all we need to encompass those moral/material answers, it leads to the exact opposite of the conclusion you're suggesting: ML training is technically equivalent to search indexing, and indexing copyrighted material is legal.

                If you do want to define an opt-out, beyond that of not publishing the work at all, then we have to explore how and why ML training differs from other web scraping. There can absolutely be a reasonable middle ground for the discussion between totally abstract philosophy of mind and totally definitive legalism, but there does need to be some discussion in that scenario.

                4 votes
                1. [2]
                  vetch
                  Link Parent

                  I think perhaps, to better frame this issue, we should discuss who is actually making out well from the current art-for-profit status quo: large organisations with the power to lobby law-making bodies and to ignore legal challenges from individual artists or, indeed, from many smaller organisations, artistic or otherwise.

                  I think this is part of the point that Fiachra is trying to make: while we are busy 'discussing' the heady moral and economic intersection of art and technology, large organisations are stealing art from thousands upon thousands of artists to throw at an algorithm that will in turn make them money. They could have taken only from the public domain; they could have (and in some rare cases actually have) paid artists to produce work for them and the model specifically. But they didn't limit themselves to these inarguably more ethical options. Instead they have used their money, lobbying, and strong legal teams to steal every piece of art they can get their hands on under the guise of fair use, and put it to work as an essential element of creating their commercial product.

                  Why should artists have to 'opt out' of having their work used by a large organisation to make money? Surely, if artists really wanted their work used for this purpose, they would opt in, don't you think?

                  Let's look at the inverse example: the recorded music industry.

                  If I sample the tiniest snippet of any song released by a major label to use in my work - even to use just one cool little sound in the background of a track, even if I transform it by slowing it down or speeding it up or running it through some kind of effect, even if it goes unnoticed for years - as soon as it makes any money, a major label can claim my work as belonging to them, either in part or, more often than you would reasonably expect, wholly. No fair use, no justification that the sample's use was transformative.
                  In their view, it's art that belongs to the artist (as long as the artist's work belongs to a large label) regardless of any kind of use barring - maybe - parody.

                  Here again, individual artists with comparatively small resources are punished by a political and legal system that, regardless of the moral or ethical discussion, will invariably choose the side of the largest wallet.

                  Finally, I can't agree that search engine indexing has much to do with this discussion. It collects and copies public data, including text and images from across the internet, yes, but these texts and images are not used to create text and image facsimiles for profit.
                  Often, this indexed content is not considered valuable to those it is being collected from. Web stores, of course, would rather encourage favourable indexing. However, when a search engine like Google does index and present data from a source that considers its content to be the same as its product, and the source is large enough, it will often pay them for the privilege. See this article about the deal between Google and the AP.

                  3 votes
                  1. blindmikey
                    Link Parent

                    I think this is where you're not seeing eye to eye:

                    large organisations are stealing art from thousands upon thousands of artists to throw at an algorithm that will in turn make them money.

                    I would not say learning is theft. I think this is one of the largest divides between your take and the opposing one. Your take is that humans learning from this material is special and not comparable, but many people think LLMs are a lot closer to us than our egos would like to admit, and will only get closer.

                    However, if those corporations start producing for-profit works that hew too closely to an artist's style, without licensing, I'd wager these two camps would agree and fight against that together.

                    1 vote
    2. [7]
      raccoona_nongrata
      Link Parent

      An AI contributes zero of its own experiences or self to that art. In the same way that an AI can't watch a movie and have its own unique feelings and perceptions about it, these LLMs are 100% reliant on the labor of human artists.

      17 votes
      1. [3]
        blindmikey
        Link Parent

        Just as a human cannot have an original thought that isn't a hodgepodge of previous input, neither can AI. But just as humans can introduce that hodgepodge into their work, so can AI. So yes, an AI can absolutely bring its past input into its produced art. The difference between humans and AI will continue to shrink, and this line of argument will continue to weaken as a result.

        18 votes
        1. [2]
          raccoona_nongrata
          Link Parent

          Not really the same thing, our feelings and impressions are our own. LLMs are not conscious, they don't feel or intuit.

          13 votes
          1. Greg
            Link Parent

            Equally, though, the subtext to all this is of making money from creative works - because nobody’s going to stop anyone from creating or consuming work for the sake of pure expression and communication, but lots of people are understandably uncertain about how this tech is going to change the industry they work in. The conversation always veers to the philosophical basis of creativity, but for every paid work that rests on conscious feeling and unique expression, there are a thousand more that were just done to spec to fill a business need.

            4 votes
      2. [3]
        owyn_merrilin
        Link Parent

        That presumes some things about humans that we really shouldn't be so confident about. Everything is a remix, and probably has been since before our species even existed -- art and storytelling being things we likely have in common with at least some of the extinct species of hominin. Creativity isn't as magical as we like to think.

        17 votes
        1. [2]
          raccoona_nongrata
          Link Parent

          In the context of a human consciousness and an LLM, we can draw the distinction pretty confidently. Yes, everything in the universe is probably ultimately "billiard balls" pinging around in highly complex patterns and reactions stemming from the Big Bang, but that isn't a practical scale to use for resolving human affairs.

          An LLM is not like a human mind; it does not learn or create the same way.

          8 votes
          1. owyn_merrilin
            Link Parent

            An LLM is a lot more like a human mind than you're giving it credit for. We're both just really complicated pattern recognition machines. There's nothing magical about the human brain, and it's really concerning how many people who really should know better seem to think there is.

            8 votes
    3. [3]
      blindmikey
      Link Parent

      Thank you for this question. It needs to be asked. People will wax poetic about humans being special, but there are really only small differences, and one day there'll be no difference. These kinds of threads just argue against the inevitable unless we get rid of AI altogether. Even this tool will be rendered useless eventually. IMHO we should make produced works illegal if they would be illegal for a human to produce.

      9 votes
      1. [2]
        UniquelyGeneric
        Link Parent

        we should make produced works illegal if they would be illegal for a human to produce

        This does not consider the absurd copyright laws that extend beyond the lifespan of a human. Mickey Mouse will enter the public domain 95 years after being created. Walt Disney will have been dead for 58 years, and even then the copyright only expires for the version of Mickey from Steamboat Willie. Disney himself skirted copyright laws by modifying a character he had already created but didn't own the rights to.

        Up until recently, recording Happy Birthday in the US was an unauthorized reproduction of a copyrighted song (which was determined to be an invalid copyright in the first place). The song itself “borrowed” the melody from an even older song.

        If we want to be serious about artists securing benefits from their creative works, we need to tackle copyright law head-on. The system as it is currently set up is biased towards non-human ownership (e.g. corporations), and AI is merely exacerbating the existing flaws in its design.

        15 votes
        1. blindmikey
          Link Parent

          I wholeheartedly agree with this; our copyright laws badly need revision.

          1 vote
    4. winther
      Link Parent

      Humans do more than just look at art to make new art. We have a whole life worth of experiences, emotions, thoughts, dreams etc that influence what we do.

      Of course, an argument could be made that AI training data is getting there with ever vaster amounts of data. There are even experiments training models with sight, sound, and smell.

      In the end it becomes quite philosophical whether human experiences can be reduced to nothing but data processing, but even so the current AI models are definitely not close to that.

      6 votes
    5. Roundcat
      Link Parent

      Your friend is piggybacking off a style to enrich their own ability. This will probably lead to your friend producing something original with it.

      An AI is stealing art on behalf of someone who cares nothing for the technique other than how it can profit them.

      6 votes
  3. [2]
    Comment deleted by author
    Link
    1. SleepyGary
      Link Parent

      Yeah, I can't help but feel that this is in the media not to put the brakes on AI, but to get as many investors/customers as possible before the AI models are updated to detect and ignore these techniques. It's going to be a never-ending arms race, except the AI side can always just collect existing art and then re-train whenever a prevention technique is cracked.
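
      As a sketch of what the training-side half of that arms race could look like (the detector here is a hypothetical classifier, not a real tool):

      ```python
      from typing import Callable, Iterable

      # Hypothetical: any model that scores how likely an image is poisoned (0 to 1).
      PoisonDetector = Callable[[bytes], float]


      def filter_for_retraining(images: Iterable[bytes],
                                detector: PoisonDetector,
                                threshold: float = 0.5) -> list[bytes]:
          # Each time a poisoning technique is cracked, swap in an updated
          # detector and re-filter the already-collected art before re-training.
          return [img for img in images if detector(img) < threshold]
      ```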

      5 votes
  4. [2]
    Nijuu
    Link

    It isn't only art, though. All that training data also ends up in AI-generated articles, books, and novels that some people make money off of. Arguably there are copyright issues there as well. I mean, when all this AI stuff blew up a while back, I didn't realize AI models in general had to get their data from somewhere.

    11 votes
    1. KeepCalmAndDream
      Link Parent

      Not sure the text corpus can be poisoned in a similar way. There's a lot less you can alter in text without significantly changing meanings, nuances, styles etc.

      Also, you could conceivably get enough professional artists on board to poison one category of training data: professionally made modern (in the sense of from now onwards) artwork, involving modern techniques and sensibilities. I don't think there's an analogue for text. Are, e.g., modern techniques and content in novels so significantly different from past techniques and content that denying them to AI models would significantly impact their learning?

      8 votes
  5. eve
    Link

    I think it's great that tools like this are starting to come out. It was incredibly disheartening to see that even with no-AI tags on sites, images would be scraped anyway. It just boils down to respecting an artist's choice not to have their work become part of a training model.

    9 votes
  6. mat
    Link

    I do have to wonder whether logistics engineers are trying to "poison" the AI systems revolutionising their field. Or whether medical practitioners are deliberately attempting to derail machine diagnostic systems? Are industrial designers trying to somehow block generative CAD systems from working? "AI" image generation is one relatively boring tip of a generally very interesting technological iceberg, and one which has been floating around for considerably longer than the handful of years that the image generators and chatbots have been visible for.

    AI image generation is just another graphics tool. The smart creative professionals are already using AI systems to save them time and effort, or have decided they don't need that tool the same way oil painters don't need (or worry about the existence of) linocutting knives. The Luddites currently trying to smash the AI looms are doomed to fail and it won't matter at all when they do.

    AI image generation will probably kill a number of low-end, drudge-level graphic design jobs - just like InDesign and Illustrator and Photoshop did - but it'll create plenty of other opportunities along the way. Outside of commercial graphic design - which is the vast majority of creative visual work done in the world - the value of an artwork is rarely how it was created, but why. These AI systems are not taking the "why" away from art, they're not removing the human creative element - they're just adding another "how" to the already vast portfolio of tools humans use to turn ideas into things.

    The interesting and valuable hows will still exist. The interesting ideas can only come from humans. The AIs don't prompt themselves, which is something people seem to forget. The boring bit about how the pixels in an artwork get arranged can now, if the artist chooses, be handled by a machine. Which is quite interesting in terms of technology but from an artistic viewpoint it's extremely dull.

    6 votes
  7. Dr_Amazing
    Link

    I can't see something like this working long term. If the difference is noticeable to an AI, then an AI can be trained to screen it out.

    2 votes