23 votes

AI has the worst superpower… medical racism

21 comments

  1. skybrian
    Link

    From the blog post:

    In extremely brief form, here is what the paper showed:

    • AI can trivially learn to identify the self-reported racial identity of patients to an absurdly high degree of accuracy
    • AI does learn to do this when trained for clinical tasks
    • These results generalise, with successful external validation and replication in multiple x-ray and CT datasets
    • Despite many attempts, we couldn’t work out what it learns or how it does it. It didn’t seem to rely on obvious confounders, nor did it rely on a limited anatomical region or portion of the image spectrum.

    [...]

    Every radiologist I have told about these results is absolutely flabbergasted, because despite all of our expertise, none of us would have believed in a million years that x-rays and CT scans contain such strong information about racial identity. Honestly we are talking jaws dropped – we see these scans everyday and we have never noticed.

    [...]

    There is one more interpretation of these results that is worth mentioning, for the “but this is expected model behaviour” folks. Even from a purely technical perspective, ignoring the racial bias aspect, the fact models learn features of racial identity is bad. There is no causal pathway linking racial identity and the appearance of, for example, pneumonia on a chest x-ray. By definition these features are spurious. They are shortcuts. Unintended cues. The model is underspecified for the problem it is intended to solve.

    The paper is here. It's a preprint. Here's hoping they made a mistake somewhere and someone figures out the mystery. The Twitter thread has a bunch of researchers discussing what to investigate next.

    9 votes
  2. [14]
    Octofox
    Link

    Perhaps I'm misunderstanding the problem here, but this comes off as alarmist, I think. Why is it such a big deal? Again, I'm probably missing something that makes the author so panicked about this. I would think this is even a good thing, since the models can provide more personalized analysis.

    Surely the 'how' part is that different races are slightly different biologically, and the model is able to recognize this, while our brains haven't evolved to recognize it the way they have with faces, etc.

    8 votes
    1. [10]
      Gaywallet
      Link Parent

      Surely the 'how' part is that different races are slightly different biologically, and the model is able to recognize this, while our brains haven't evolved to recognize it the way they have with faces, etc.

      Actually, there's quite a big problem in medicine of racist treatment of individuals. A recent talk I attended at Stanford outlined how a big-name researcher was tackling the problem. A short synopsis of what this fellow did: he ran AI and ML over a healthcare system's data to identify the highest-cost individuals. What he found was that the share of minorities identified did not match the share of minorities in the population the healthcare system was treating. Luckily he was a good scientist and realized there was clearly more going on, so he dug a bit deeper. The discovery was that minority individuals were treated more poorly by the system (problems took longer to identify because doctors were more resistant to ordering the same diagnostics as they were for their whiter and male-r peers). This is something we've known about for quite some time in the literature; however, there's a bias for everyone to think "yes, this is a problem, but we can't be the racist ones." It took a lot of digging and a lot of data to convince the system to make changes. In addition, he's updated his model to account for this, so it both identifies systemic racism within healthcare and identifies the individuals who need a higher touch.
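
      To make the mechanism concrete, here's a minimal synthetic sketch (not the researcher's actual method; the numbers, group labels, and cost model are invented purely for illustration) of how targeting cost rather than need can under-identify a group that receives less care:

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      n = 10_000

      # Two groups with an identical distribution of true health need.
      group = rng.integers(0, 2, n)      # 1 = hypothetical minority group
      need = rng.normal(50, 10, n)       # true (unobserved) health need

      # Illustrative assumption: the minority group incurs less cost at the same
      # level of need, e.g. because fewer diagnostics and referrals are ordered.
      cost = need * np.where(group == 1, 0.6, 1.0) + rng.normal(0, 5, n)

      # Flag the top decile for extra care, by cost (what the naive model targets)
      # versus by need (what we actually want).
      flag_cost = cost >= np.quantile(cost, 0.9)
      flag_need = need >= np.quantile(need, 0.9)

      for label, flags in [("cost-based", flag_cost), ("need-based", flag_need)]:
          print(f"{label}: minority share of flagged patients = {group[flags].mean():.2f}")
      # The cost-based rule flags far fewer minority patients, even though
      # true need is identically distributed across the two groups.
      ```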

      It's easy to dismiss systemic racism when we're not familiar with it. It's easy to think that ML and AI are the solution, but if you work in the field of data science you'll see an alarming number of researchers and everyday workers giving talks on how to deal with racism in this field. It's very easy to let data run its course without actually understanding what that data represents, and the outcome is scary for precisely the reasons you've come in here talking about - this dismissal that nothing is wrong with the system, that it's able to 'recognize' things that humans cannot. We could easily say the same about the penal system and black Americans - the system is 'seeing' the inner criminal in them. It's easier to recognize as racism when we know humans are involved, but I guarantee you there's an upsetting number of algorithms in existence today that replicate that same human bias and are easier for most to dismiss as "computer smart, human dumb". We need to be extremely vigilant against this line of thinking.

      20 votes
      1. [6]
        TemulentTeatotaler
        Link Parent

        I'm very sympathetic to the existence of that sort of bias. Weapons of Math Destruction was a great book on the topic.

        The metrics by which colleges are ranked were based on the existing societal view of how they ought to be ranked, and in that way they enshrined Ivy League bias. Black Americans get screwed over by algorithmic sentencing recommendations, and then that gets fed back into the system, reinforcing the bias that was the problem in the first place.

        That said, I'm not sure I find this post a compelling example.

        this dismissal that nothing is wrong with the system, that it's able to 'recognize' things that humans cannot.

        The only thing being fed into the system is an x-ray. Whatever signal is being teased out exists within the pixels of that x-ray. This isn't zip code information and parents education being used as a proxy for race to dole out harsher sentencing.

        The author dismissing the image with a high pass filter as a "grey box" doesn't make sense, because it clearly is encoding relevant information that is being picked up.

        It's known that bone mineral density varies based on sex, age, ethnicity, etc., and some artifact of that is probably going to show up even in an image with a high pass filter applied.

        In section IV they say they removed demographic confounds, but their success at doing so is really all we have to rely on before assuming a much simpler explanation: machine learning is great at combining minor discrepancies in things like the density of extra-grey pixels to arrive at a fairly accurate prediction of self-reported racial identity.
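
        As a rough illustration of that point, here's a minimal sketch of a high-pass filter (a simple blur-and-subtract; the paper's actual preprocessing may well differ) showing that edge information survives even when the result looks close to featureless:

        ```python
        import numpy as np
        from scipy.ndimage import gaussian_filter

        def high_pass(image, sigma=10.0):
            """Crude high-pass filter: subtract a heavily blurred copy of the image."""
            return image - gaussian_filter(image.astype(float), sigma=sigma)

        # Synthetic stand-in for an x-ray: a bright disc (bone-like) on a dark field.
        y, x = np.mgrid[:256, :256]
        fake_xray = ((x - 128) ** 2 + (y - 128) ** 2 < 60 ** 2).astype(float)

        filtered = high_pass(fake_xray)

        # Low-frequency content (overall brightness, broad shapes) is mostly gone,
        # but the disc's boundary survives as a thin high-frequency ring that a
        # model can still pick up on.
        print("mean of filtered image:", round(float(filtered.mean()), 4))
        print("edge energy along centre row:", round(float(np.abs(filtered[128]).sum()), 2))
        ```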

        In fact, around 20% of radiologists admit being influenced by patient demographic factors.

        This bothered me. You can easily be a worse physician by being biased about patient demographics (and many are), but you also can't be a great physician while ignoring demographics. There are massive and relevant differences in behavior and risks based on demographics.

        This post seems to be promoting the view that we should remove all that relevant information because it isn't always being used responsibly. That feels like advocating suboptimal care instead of addressing the actual problems with inequality and bias.

        9 votes
        1. [4]
          Gaywallet
          Link Parent

          Whatever signal is being teased out exists within the pixels of that x-ray. This isn't zip code information and parents education being used as a proxy for race to dole out harsher sentencing

          I would encourage you to expand your thinking on the matter. Things are always much more complicated than they seem. How much do you know about how imaging machines work? Can you explain to me how they are calibrated? How is the image interpreted? What data did we build upon to decide what is considered a negative health outcome or the diagnosis of a disease? Do you think that this data incorporates a lot of imaging done on minorities? What about the radiologist who interprets the image or the result? If we're looking to train a new model to detect disease states, how are we determining a disease state?

          Systems are incredibly complex. You're right to point out that the problems baked into this system are a bit more nuanced and difficult to separate than in my example, but the example still stands for the same reasons: it doesn't matter whether it's a zip code being used, a training data set, or the literature by which a disease state is quantified; it's going to take a lot of work before we can have a truly unbiased system. Just because a computer is interpreting visual data does not mean that humans weren't involved at every step along the way.

          9 votes
          1. [3]
            TemulentTeatotaler
            Link Parent

            Great points that I neglected, thanks for the feedback! I wasn't thinking about sources of bias from the imaging devices. I think I also missed the thread a bit, as you were discussing whether or not the "alarmism" was warranted. Hopefully some coffee will fix me for the day...

            Maybe I'm missing something else, but the issue of how the image is interpreted doesn't seem as difficult to resolve. It seems like you could remove any information about racial identity the system is providing the radiologist, if desired.

            5 votes
            1. [2]
              Gaywallet
              Link Parent

              Maybe I'm missing something else, but the issue of how the image is interpreted doesn't seem as difficult to resolve. It seems like you could remove any information about racial identity the system is providing the radiologist, if desired.

              Well, yes and no. The problem is that the very foundation of what is considered a disease state rests on data which does include racial identity. In the same way that minorities often receive diagnostic tests much later in the progression of their disease than their non-minority counterparts, the confirmation of a disease state, and the research which led to that conclusion, is often based on privileged, non-minority individuals. In order to solve this problem we also need to re-examine how we interpret an image as a disease state, and that's not simple to solve. We'll also have to retrain everyone who works in this field if we make any discoveries, since what they were taught during their specialization and in medical school will need to be updated.

              Keep in mind that for any of this to happen we need to convince certain important individuals - people in charge of medical associations like the AMA and schools like UCSF and Harvard. Historically speaking, it's very difficult to change the minds of the people actually in charge, and those people are overwhelmingly white, male, and privileged. We've seen this play out during the recent coronavirus pandemic, with many prominent respiratory doctors and virologists (many of whom were not white or male) pointing out that COVID transmits primarily through the air, not through surface/droplet transmission. Convincing major institutions that this was how it transmitted was difficult, and masking took a long time to catch on in many places in the world. We also still see a lot of sanitizer being used even where it's not really needed, such as on outdoor, sun-exposed surfaces which are not porous.

              Healthcare has a long and storied history of being quite resistant to change - the classic example being how long it took for hand washing to become regular practice. I see this continuing to be a problem and a general impediment to change. I worry that this impediment will allow racist AI/ML systems to establish new standards, because they are not given the same scrutiny as human-based racist systems.

              4 votes
              1. TemulentTeatotaler
                Link Parent

                I think we're on the same page about systemic issues/challenges. I was trying to address the more narrow topic of gating information that ML systems can pick up on that human radiologists can't, but phrased that pretty poorly.

                The specious appeal of ML is that you escape the "science advances one funeral at a time" human factors: needing to retrain a generation of radiologists, or overcoming resistance to washing hands between cadavers and deliveries because the idea's advocate was a jerk. I appreciate the reminder that it's very hard to get right. The book I mentioned put forth some guidelines (that I've forgotten) that I recall being pretty sensible ways of detecting when the system was getting things wrong and course-correcting. Hopefully those sorts of practices get adopted.

                4 votes
        2. skybrian
          Link Parent

          I think they’re mostly promoting the view that this is weird and should be investigated. They’ve done a lot of checking and couldn’t figure it out, so they are publishing a paper and want other scientists to take a look. There may be some reason why it’s not so bad, but we won’t know that until the mystery is solved.

          Assuming that it’s probably harmless, without checking, is an argument for complacency and against curiosity.

          Though of course, we are just outside observers, so at best we can follow what the scientists are doing. Hopefully someday soon we will find out what happened.

          5 votes
      2. skybrian
        Link Parent

        I think machine learning can make things better through scientists looking deeper and understanding the puzzling signals it picks up, as in that talk you described. It’s not going to happen through accepting the results uncritically.

        I’m optimistic that once such problems are understood and fixed, machine learning will be useful for detecting any backsliding and making sure they stay fixed. But there will be other anomalies to investigate.

        1 vote
      3. [2]
        meff
        Link Parent

        Do you have a link to anything this person published about their work? I'm very curious about this field and I'd love to find good case studies on how to explicitly account for systemic bias in models.

    2. eladnarra
      Link Parent
      • Exemplary

      Well, take an example where an algorithm's data is racist by nature of being made by humans. Say it's trained to find certain diseases in lung x-rays, and it's given both diseased and healthy lung x-rays. But the black people in the data set, on average, were x-rayed and diagnosed later in the disease's progression. The reason might be lack of healthcare, or medical bias in not taking symptoms seriously, or likely a combination. If the algorithm can determine race from the x-rays, it might decide that black people x-rayed earlier in the disease's progression don't have the disease when they actually do, because their scans don't look like the later stage of disease it saw during training.

      10 votes
    3. vord
      Link Parent

      There is a legit aspect to being able to have information on genetic history, which is far more detailed and nuanced than just self-reported race. I suppose even without distinct genetic information some risks could theoretically be identified based on that.

      I can't find the article at the moment, but I recall seeing writeups about how AI algorithms are inherently conservative, because they generally rely on the already-known. It's a bias toward conservatism, where the future can/should/would/does reflect the past, rather than a more progressive "how could things change for the better?"

      Just consider most algorithmic music recommendations. They recommend purely based on what you (and others like you) have already listened to. They have no subjectivity of their own to evaluate how to broaden someone's taste in an enjoyable manner, other than maybe feeding in random selections.

      If a medical AI were trained purely on medical data from the 1800s, it would largely come up with 1800s solutions to the problems at hand, not 2000s ones.

      So if society has deeply ingrained homophobia/racism/sexism problems, those problems will manifest in an AI.

      4 votes
    4. skybrian
      Link Parent

      The model is somehow figuring it out even when they’ve filtered the image to be a grey box, so something freaky is going on, and it doesn’t seem to be medically relevant.

      I think it’s important to actually figure out what it’s doing, because depending on correlations you don’t understand is bad. Maybe when they actually figure it out it won’t seem that alarming, though.

      4 votes
  3. [3]
    Grendel
    Link

    Computer Science is a pretty new field of study. New fields of study tend to do some pretty unethical things before they start to figure crap out, and this is no exception. Machine Learning should not be used for any task that could significantly impact a human life. Not for prison sentences, not for medical diagnosis. Having a human review the results first isn't enough. Machine learning is very poorly understood, and we shouldn't let something we can't explain make these kinds of decisions.

    It honestly reminds me of some of the stories about the early study of radioactive material. In the 20s and 30s they actually put radioactive material in toothpaste because they thought it would clean your teeth better. There were few if any safety protocols around it and it caused pain and suffering (and death) before we realized what was really going on and put in safety measures.

    We need to better understand this stuff before using it in a practical manner. Failing to do so is unethical, and I believe that Computer Scientists should take a stand against this. Stand against your company using Machine Learning this way. If enough of us do, it might just get their attention.

    4 votes
    1. skybrian
      Link Parent

      This research is important, but I think comparing medical diagnosis to the early misuse of radiation is probably exaggerating the harm done. Even if this flawed software went into production, it's not obvious that diagnoses made by machine would be any worse on average than those made by people.

      There is a possibility to do better and clean up systemic biases that already exist.

      3 votes
    2. meff
      Link Parent

      We need to better understand this stuff before using it in a practical manner. Failing to do so is unethical, and I believe that Computer Scientists should take a stand against this. Stand against your company using Machine Learning this way. If enough of us do, it might just get their attention.

      I'm of the belief that unexplainable models (which in practice means most DL models, though there are gray areas around tree-ensemble methods such as XGBoost and random forests) should not be used in any situation that could significantly impact a human life. That, however, opens up an interesting philosophical question:

      What happens if we create an unexplainable model that maintains current systemic inequalities but also results in better outcomes for everyone involved, even the groups being systematically discriminated against? I believe we should probably set a minimum threshold that needs to be crossed before unexplainable models are considered. E.g., if explainable SOTA is 30% and unexplainable SOTA is 90% (and, for various reasons, humans cannot perform better than explainable SOTA), only then should we use the unexplainable model.
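
      For what it's worth, a toy sketch of that kind of threshold rule might look like this (the 0.5 gap and the example scores are placeholders, not a proposal for real values):

      ```python
      def allow_unexplainable(explainable_score: float,
                              unexplainable_score: float,
                              human_score: float,
                              min_gap: float = 0.5) -> bool:
          """Toy policy: only consider an unexplainable model when it beats the
          best transparent option (explainable model or human) by a large,
          pre-agreed margin."""
          best_transparent = max(explainable_score, human_score)
          return unexplainable_score - best_transparent >= min_gap

      # With the numbers from the comment above:
      print(allow_unexplainable(0.30, 0.90, 0.30))  # True  -> worth considering
      print(allow_unexplainable(0.80, 0.90, 0.85))  # False -> stick with transparent models
      ```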

      1 vote
  4. [3]
    yellow
    Link

    The part with the high pass filter is particularly interesting. I was expecting that it would be basing it off the proportions of the skeleton. An AI could pick up some very subtle ratios and contours. Though I suppose that could still be possible if the high pass filter still allows the AI to see where the edges of bones are. Also they say that this occurs when the AI is not optimized to detect race, but they are still training it to detect race at least a little bit, right? I don't think you can get an output from an AI without training for the output at all?

    I'd also be curious whether it's somehow using something outside of what's actually being imaged. Perhaps the AI is just learning which x-ray machines (and what characteristics their images have) are used in Asia, and using that to determine who is Asian.

    3 votes
    1. skybrian
      Link Parent

      I recommend reading through the Twitter thread to see the different ideas the researchers have about what might be causing this. They say they’ve looked at a lot of possibilities and not all of them made it into the paper.

      2 votes
    2. joplin
      Link Parent

      The part with the high pass filter is particularly interesting. I was expecting that it would be basing it off the proportions of the skeleton. An AI could pick up some very subtle ratios and contours. Though I suppose that could still be possible if the high pass filter still allows the AI to see where the edges of bones are.

      I would expect a high pass filter to still show edges, as those tend to be the highest frequency components in images. It may look like a solid light gray box to us, but I bet the computer still sees faint traces of the bones somehow.

      Perhaps the AI is just learning which x-ray machines (and what characteristics their images have) are used in Asia, and using that to determine who is Asian.

      That's a really interesting idea! I could also see something similar happening with machines in lower-income neighborhoods being one of a particular set of less expensive brands, for example. I'll take a look at the Twitter thread @skybrian mentioned, but there are so many hidden things like that which could accidentally cause these correlations to be revealed. (My first thought was that some x-rays have the name of the patient on them, and names tend to cluster in certain ethnic groups. I assume they filtered that stuff out, though.)

      2 votes