7 votes

These new tools let you see for yourself how biased AI image models are

8 comments

  1. [6]
    Wes
    Link

    I think it goes without saying that any generative AI model will inherit the cultural biases of the source material that it's trained on. If we feed it thousands of photos of white men labelled "CEO", then that's what it will incorporate. In this sense, the AI is a reflection of our own bias.

    I think it's a fine and noble goal to try to eliminate that bias. We have a lot more control over an AI program than we do over society as a whole, and if we can try to reduce the amplification of said biases (for example, posting even more white male CEO photos online), then that's generally a good thing.

    Tool-wise, there are limitations though. As I understand these language models, they're largely self-organized. They run far more efficiently when allowed to come up with their own structure, so tweaking the weights on specific attributes is actually harder than you might expect. There isn't a variable for "number of white people generated" that we can simply tweak.

    The quick fix is through prompt restrictions, but that only gets you so far. Limiting training material may improve representation, but it could also hurt the overall quality of the finished product.

    I wonder if some sort of second-pass analysis would ultimately be required, e.g. examine the output of the first pass, then try to assemble a more impartial pool of images (roughly like the sketch below). That's just speculation on my part, and it would of course have an associated resource cost.
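
    To make that concrete, here's a rough sketch of what I have in mind. Both generate_image and classify_attributes are purely hypothetical stand-ins for a generator and some attribute classifier, not real APIs:

    ```python
    # Rough sketch of a "second pass": over-generate, classify the results with
    # some attribute model, then keep a more balanced subset.
    from collections import defaultdict
    from typing import Callable, List

    def balanced_second_pass(prompt: str,
                             generate_image: Callable[[str], object],
                             classify_attributes: Callable[[object], str],
                             n_candidates: int = 32,
                             n_keep: int = 4) -> List[object]:
        """Generate extra candidates, then keep n_keep spread across attribute groups."""
        by_group = defaultdict(list)
        for _ in range(n_candidates):
            img = generate_image(prompt)
            by_group[classify_attributes(img)].append(img)

        # Round-robin across the groups so no single group dominates the final pool.
        kept, groups = [], list(by_group.values())
        i = 0
        while len(kept) < n_keep and any(groups):
            if groups[i % len(groups)]:
                kept.append(groups[i % len(groups)].pop())
            i += 1
        return kept
    ```

    Over-generating and then sub-sampling like that is exactly where the extra resource cost would come from.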

    These generative models are so new that the research is still very experimental. Stephen Wolfram's post even referred to the techniques as "lore", as if they're not really understood - they're just the approaches that have turned out to work well.

    With that in mind, it almost seems too early to start asking researchers to restrict how it works - but of course, the reality is that these tools are being implemented now. We don't have the luxury of time to let the details be fully worked out in a lab setting. So I do think companies like OpenAI have some responsibility to consider the implicit biases being reflected (and amplified) by their tools.

    It's a tough one because you don't want to go so far as to distort people's expectations. For the example from the article, I think that as a user, I would expect more women for prompts like "compassionate manager". But society is shaped by our exposure to media like this, so it seems like a pro-human move to try to represent things in a more unbiased, equitable way.

    5 votes
    1. [3]
      stu2b50
      Link Parent

      > so tweaking the weights on specific attributes is actually harder than you might expect.

      Actually, manually tweaking the weights is strictly impossible (outside of tricks like weight clipping). You can't even know what a weight does, at least not without an insane amount of work (see: We Found a Neuron in GPT-2).

      That being said, it's far from completely intractable either, and the solutions are more advanced than simply manually filtering the training data (which is practically intractable for large datasets) or limiting prompts.

      It's worthwhile to separate LLMs from image generation networks, as the former is, maybe unintuitively, further along in this.

      The main techniques being used to corral LLMs are various forms of reinforcement learning via PPO. OpenAI used Reinforcement Learning from Human Feedback (RLHF), wherein you train another language model to score the outputs from the LLM, using a dataset of responses scored by humans (usually this is done by having the LLM generate several messages which the human then ranks, with scores derived from those rankings, rather than the human directly submitting a score). Then, with that model, you can create a larger dataset and use PPO to fine-tune the LLM to avoid "harmful" messages per the secondary model.
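
      As a toy illustration of the reward-modelling half of that (everything here is invented: the model, the embedding size, the data), the human rankings get turned into a pairwise loss roughly like this:

      ```python
      import torch
      import torch.nn as nn

      class RewardModel(nn.Module):
          """Scores a response embedding with a single scalar (higher = better)."""
          def __init__(self, dim: int = 64):
              super().__init__()
              self.score = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

          def forward(self, x):
              return self.score(x).squeeze(-1)

      # Stand-in embeddings of LLM responses: for each prompt, the human ranked
      # the "chosen" response above the "rejected" one.
      chosen = torch.randn(32, 64)
      rejected = torch.randn(32, 64)

      model = RewardModel()
      opt = torch.optim.Adam(model.parameters(), lr=1e-3)

      for _ in range(100):
          # Pairwise (Bradley-Terry style) loss: push the chosen score above the
          # rejected score. This is how rankings become scalar rewards.
          loss = -torch.nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
          opt.zero_grad()
          loss.backward()
          opt.step()

      # The trained reward model can then label a much larger set of generations,
      # and those scores become the reward signal for PPO fine-tuning of the LLM.
      ```

      In practice this sits on top of the actual LLM and is usually wired up with libraries like Hugging Face's TRL, but the pairwise "chosen beats rejected" objective is the core idea.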

      The other approach, which is basically the same thing but with fewer humans, comes from the Constitutional AI paper. They start with only human-contributed "red team" prompts that should lead to toxic responses. The LLM itself, prompted with a set of rules (the titular "constitution"), then critiques and revises its own toxic responses, and the secondary scoring language model is derived from that data.
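
      Structurally, the critique-and-revision loop is something like this sketch, where llm is a stand-in for whatever text-generation call you have and the "constitution" text is made up for illustration:

      ```python
      from typing import Callable, Tuple

      # Invented constitution instructions, just to show the shape of the loop.
      CONSTITUTION = [
          "Identify ways in which the response is harmful, unethical, or toxic.",
          "Rewrite the response to remove the problems identified in the critique.",
      ]

      def critique_and_revise(llm: Callable[[str], str], red_team_prompt: str) -> Tuple[str, str]:
          """Return (original, revised) responses for one human-written red-team prompt."""
          original = llm(red_team_prompt)
          critique = llm(f"{red_team_prompt}\n\nResponse: {original}\n\n{CONSTITUTION[0]}")
          revised = llm(f"Response: {original}\n\nCritique: {critique}\n\n{CONSTITUTION[1]}")
          # The (original, revised) pairs become the preference data used to train
          # the secondary scoring model, with no per-example human labelling needed.
          return original, revised
      ```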

      Basically, PPO works surprisingly well with LoRA, so reinforcement learning is on the table for LLMs, and depending on how much human labor you can pay for, you can either use that or have the LLM learn from... itself.

      5 votes
      1. Wes
        Link Parent

        Aha, that's brilliant. Thanks for explaining PPO. That's a much more sophisticated implementation of my own suggestion above. It also allows the tool to run much quicker because no second pass is required.

        I've only realized in the last couple weeks how far behind I was in understanding this technology, and I've been trying to get myself up-to-date. The pace of innovation is truly staggering.

        3 votes
      2. skybrian
        (edited )
        Link Parent

        To expand the acronyms, PPO stands for Proximal Policy Optimization and seems to be a kind of reinforcement learning.

        LoRA stands for Low-Rank Adaptation, which seems to be a way of fine-tuning large language models when it would otherwise be prohibitively expensive due to their size.
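
        As a minimal sketch of the idea (the dimensions and rank here are arbitrary), you freeze the original weight matrix and learn a small low-rank correction on top of it:

        ```python
        import torch
        import torch.nn as nn

        class LoRALinear(nn.Module):
            """Wraps a frozen Linear layer and adds a trainable low-rank update."""
            def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
                super().__init__()
                self.base = base
                for p in self.base.parameters():
                    p.requires_grad = False              # original weights stay frozen
                self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
                self.B = nn.Parameter(torch.zeros(base.out_features, rank))
                self.scale = alpha / rank

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                # Only A and B (a tiny fraction of the parameters) receive gradients.
                return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

        layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
        trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
        print(trainable)  # ~65k trainable parameters vs. ~16.8M frozen ones
        ```

        The trainable parameter count per layer drops from millions to tens of thousands, which is what makes the fine-tuning affordable.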

        It seems like there should be more research along the lines of the “we found a neuron” paper? It may be hard to do now, but that could change if someone figures out the right approach.
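
        For a sense of what that kind of poking around looks like, here's a minimal example (assuming the transformers library is installed) that hooks one arbitrarily chosen neuron in GPT-2 and records its activation. The layer and neuron indices are placeholders, not the ones from the paper:

        ```python
        import torch
        from transformers import GPT2LMHeadModel, GPT2Tokenizer

        tok = GPT2Tokenizer.from_pretrained("gpt2")
        model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

        LAYER, NEURON = 5, 123        # placeholder indices
        captured = {}

        def hook(module, inputs, output):
            # output is (batch, seq_len, hidden); record one neuron's value per token
            captured["acts"] = output[0, :, NEURON].detach()

        handle = model.transformer.h[LAYER].mlp.register_forward_hook(hook)
        with torch.no_grad():
            model(**tok("The CEO walked into the boardroom", return_tensors="pt"))
        handle.remove()

        print(captured["acts"])       # how strongly that neuron fired on each token
        ```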

        3 votes
    2. [2]
      Adys
      Link Parent

      > I think it's a fine and noble goal to try to eliminate that bias.

      It's worth remembering that by trying to eliminate that bias, you also eliminate evidence that the bias exists. For that reason alone, I'm in the camp "leave the bias, fix the root cause, and prevent usage of AI in cases where its bias is not controlled for".

      1 vote
      1. skybrian
        Link Parent

        I think that argument works better for search engines than language models. The evidence is out there on the Internet. A search engine can help you find it. A chatbot will generate a biased response without revealing where it came from.

        There might be other reasons why it would make sense to train a language model on unsavory or controversial or even offensive inputs, though. If you’re using it to write fiction then sometimes you might want it to write realistically about the darker side of society. Maybe leave that to the human writers, though?

        Another reason might be for accurate translation.

        1 vote
  2. [2]
    Macil
    (edited )
    Link

    OpenAI apparently has been trying strategies to address this in DALL-E, like generating some images of people as if the prompt contains a minority descriptor, though it sometimes has the unintentional result of adding people into images that weren't meant to contain people (not sure if this problem is as common nowadays though). I assume there are also deeper techniques they're trying by now.
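
    My guess at the mechanism is something like the sketch below: if the prompt mentions a person and no demographic descriptor, occasionally splice one in before generating. This is only a guess at the reported behaviour, not OpenAI's actual implementation:

    ```python
    import random

    # Toy keyword lists; a real system would be far more careful about these.
    PERSON_WORDS = {"person", "man", "woman", "ceo", "doctor", "nurse", "manager", "worker"}
    DESCRIPTORS = ["Black", "East Asian", "South Asian", "Hispanic", "white", "female", "male"]

    def augment_prompt(prompt: str, rate: float = 0.5) -> str:
        """Sometimes append a demographic descriptor when a person is mentioned."""
        words = {w.strip(".,!?") for w in prompt.lower().split()}
        mentions_person = bool(words & PERSON_WORDS)
        already_specified = any(d.lower() in prompt.lower() for d in DESCRIPTORS)
        if mentions_person and not already_specified and random.random() < rate:
            return f"{prompt}, {random.choice(DESCRIPTORS)}"
        return prompt

    print(augment_prompt("a photo of a CEO sitting at a desk"))
    ```

    A naive keyword check like this would also explain that failure mode: if it misfires on a prompt that wasn't meant to contain people, a descriptor gets appended anyway and nudges the model to put someone in the image.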

    1 vote
    1. skybrian
      Link Parent

      Seems like you could do that manually too? If you don’t like the image you get, adjust the prompt.

      Good defaults are important and it’s good that there’s work on fixing them. But when you can override them easily, they don’t seem like a showstopper? At least, so long as these tools are used with human oversight, as they mostly are now.