10 votes

Why large language models like ChatGPT treat Black- and White-sounding names differently

5 comments

  1. [4]
    skybrian

    There’s been some good research in mechanistic interpretability, but it wasn’t that kind of research, so it doesn’t say why in any detail. Sure, it was the training data, but that’s a superficial explanation.

    Showing that it happens is still useful, though. Since the output of LLMs is literally sampled using random numbers, someone needs to run many queries and do some statistics to find the overall trend. Trying things out informally and sharing cherry-picked examples is good for finding impressive or weird results, or bloopers, but not for showing what happens on average.
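
    As a rough illustration, that kind of measurement can be as simple as repeating a prompt with different names and averaging the scores. The sketch below is hypothetical: query_llm() stands in for whatever API client you actually use, and the name lists are illustrative, not the ones from the study.

    ```python
    import random
    import statistics

    # Hypothetical helper: in a real run this would call an LLM API and parse a
    # numeric rating out of the response. Here it returns a random score so the
    # sketch runs on its own.
    def query_llm(prompt: str) -> float:
        return random.uniform(1, 10)

    PROMPT = "On a scale of 1-10, how qualified is {name} for a senior analyst role?"

    # Illustrative name lists only.
    GROUP_A = ["Emily", "Greg", "Anne"]
    GROUP_B = ["Lakisha", "Jamal", "Keisha"]

    def average_score(names, n_trials=100):
        # Repeat the query many times so the randomness in sampling averages
        # out and the underlying trend becomes visible.
        scores = [query_llm(PROMPT.format(name=random.choice(names)))
                  for _ in range(n_trials)]
        return statistics.mean(scores), statistics.stdev(scores)

    print("Group A (mean, stdev):", average_score(GROUP_A))
    print("Group B (mean, stdev):", average_score(GROUP_B))
    ```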

    Training LLMs to be insensitive to names might be possible by training on more examples that have the names swapped. However, this might result in LLMs generating text with historically or culturally inappropriate names, as happened with Gemini’s image generation.
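
    That kind of counterfactual augmentation could look something like this sketch. It assumes training examples are plain strings and that first names can be swapped naively; a real pipeline would also have to handle surnames, pronouns, and other correlated cues, and the name pool here is purely illustrative.

    ```python
    import itertools

    # Illustrative name pool only; a real augmentation set would be much larger.
    NAME_POOL = ["Emily", "Lakisha", "Greg", "Jamal", "Anne", "Keisha"]

    def swap_names(example: str, names_in_example: list[str]) -> list[str]:
        """Produce copies of a training example with each name replaced by every
        other name in the pool, so the model sees the same context paired with
        many different names."""
        variants = []
        for original, replacement in itertools.product(names_in_example, NAME_POOL):
            if original != replacement:
                variants.append(example.replace(original, replacement))
        return variants

    example = "Emily was promoted after leading the project successfully."
    augmented = [example] + swap_names(example, ["Emily"])
    # augmented now holds the original plus five name-swapped variants.
    ```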

    It might be difficult to get an LLM to be sensitive to the cultural and historical context of a name when it’s appropriate and to ignore it when it isn’t. One thing the user could do, though, is use blinding: don’t give the LLM a name when it’s not supposed to make decisions based on it.
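
    A minimal sketch of that blinding step on the user’s side, assuming the names to hide are already known and can be caught with a simple pattern (real redaction would need proper named-entity recognition):

    ```python
    import re

    def blind_names(text: str, known_names: list[str]) -> str:
        """Replace known names with a neutral placeholder before sending the
        text to the model, so there is nothing name-shaped for it to key on."""
        blinded = text
        for name in known_names:
            # Word-boundary match so "Ann" doesn't also clobber "Annotation".
            blinded = re.sub(rf"\b{re.escape(name)}\b", "[CANDIDATE]", blinded)
        return blinded

    resume = "Jamal Washington has eight years of experience in data engineering."
    print(blind_names(resume, ["Jamal Washington"]))
    # -> "[CANDIDATE] has eight years of experience in data engineering."
    ```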

    It might pick up other hints, though. LLMs are often very sensitive to context. Figuring out likely continuations from context is kind of their whole deal. It’s what makes them seem impressive.

    I don’t think this is likely to be fixed until we can see specifically how LLMs make decisions from context and edit them to remove the associations we don’t want for a given task. In the meantime, using LLMs to generate code seems relatively safe, since you can review and test the code, and there are a lot of automated checks that can be run on it.

    The need for careful manual review is going to limit what sort of tasks can be safely automated.

    In the meantime, the AI chatbots are out there, and anyone can use them. I don’t think they’re dangerous in the hands of people who are thoughtful about how to use them and mindful of the dangers. But they’re not safe to use for anything where safety actually matters.

    How much can we trust the general public to only do safe things? Putting it that way, obviously not at all. But you know, people do other unsafe things without anyone preventing it, and we talk about freedom and “consenting adults.” We don’t live in a safe world.

    9 votes
    1. [3]
      Gaywallet

      One thing the user could do, though, is use blinding: don’t give the LLM a name when it’s not supposed to make decisions based on it.

      Attempts to remove demographic information, or pieces of information that can be associated with demographics, can result in creating greater inequality.

      I find that discussions where we focus on mechanistic methods to "fix" bias in AI/ML/LLMs miss the larger picture. Humans are biased, and the data these models are trained on is biased. I believe that we should focus on what methods reduce bias in humans, and see if we can't transfer some of those ideas mechanistically to models.

      11 votes
      1. sparksbet

        Humans are biased, and the data these models are trained on is biased.

        The problem is that these are pattern-finding machines, so they often amplify existing human biases from the training data in ways humans wouldn't. Of course they're biased because the training data is biased because humans are biased, but the way that bias manifests can be a lot more harmful. And since humans can wash their hands of biased decisions made by these models, these biases can have huge negative effects when the models are deployed in practice.
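
        One way to see the amplification concretely: a model that simply maximizes accuracy on skewed data can turn a statistical tendency into a near-absolute rule. This toy sketch uses made-up numbers and a majority-rule classifier rather than an LLM, but the effect is the same in spirit.

        ```python
        from collections import Counter

        # Made-up historical decisions: (group, hired?). 60% of group A were
        # hired, only 40% of group B.
        history = [("A", True)] * 60 + [("A", False)] * 40 + \
                  [("B", True)] * 40 + [("B", False)] * 60

        def majority_rule(history):
            """Learn the most common outcome per group -- the 'pattern' a naive
            accuracy-maximizing model extracts from skewed data."""
            by_group = {}
            for group, hired in history:
                by_group.setdefault(group, Counter())[hired] += 1
            return {g: counts.most_common(1)[0][0] for g, counts in by_group.items()}

        print(majority_rule(history))
        # {'A': True, 'B': False} -- a 60/40 skew becomes an absolute rule.
        ```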

        6 votes
      2. skybrian

        Decisions about who to interview for a job, who to provide medical care to, or who to grant a loan were once made by humans, but ever more frequently are made by machine learning (ML) algorithms

        Yeah, my reaction is "okay, maybe it seemed like a good idea, but if it's not working, don't do it." I remember Google trying to filter resumes using AI (many years ago, not with LLMs) and abandoning that approach.

        LLMs are more useful as hint generators about where you might find interesting evidence. People are going to do dumb things, though.

        To me, doing experiments on people to figure out what would work for LLMs seems like the wrong way around. It would be like doing human experiments to find out what works for mice. Yes, some things transfer, but it's indirect and doing it the hard way.

        For one thing, you don't have to worry nearly as much about ethical issues when experimenting on LLMs.

        2 votes
  2. balooga

    From the link:

    Most of these newer LLMs, the ones people are most accustomed to, like ChatGPT-4, tend to be closed source. With open source models, you can break it open and, in a technical way, look at the model and see how it is trained. And if you have the training data, you can look at whether the model was trained in such a way that it might encode disparities. But with the closed-source models, you have to find other ways to investigate.

    I’d like to read more about this. To the best of my knowledge, whether a model is open- or closed-source doesn’t have any bearing on the understandability of its inner workings. Even access to the full training set isn’t going to explain how a given pair of tokens will be weighted or interact in a particular context. Aren’t all LLMs fundamentally black boxes?

    3 votes