12 votes

Medical chatbot using OpenAI’s GPT-3 told a fake patient to kill themselves

15 comments

  1. [7]
    Grendel
    Link

    GPT-3 is a novelty at best and shouldn't be used for anything even remotely important. Honestly, the best use I've seen is AI Dungeon.

    13 votes
    1. [6]
      Thra11
      Link Parent

      GPT-3: Doesn't really understand its input. Spews nonsensical garbage, but looks like a human if you squint / ignore the weird bits.
      Medical consultations: Looking like a human isn't even important. What matters is a good understanding of the symptoms, a correct diagnosis, and a reasonable treatment plan.

      Even the AI Dungeon wasn't that great, because it couldn't remember any of the details of the story so far.

      I'm having difficulty imagining why anyone with even a passing acquaintance with GPT-3 or medicine would actually do a serious trial of a GPT-3 medical chatbot, so I'm going to assume that this is a case of "People love to read about AI doing funny/stupid things, so let's use that to promote our business".

      8 votes
      1. [3]
        skybrian
        Link Parent

        The original blog post explains more about their motivations:

        As Open AI itself warns in GPT-3 guidelines, healthcare “is in the high stakes category because people rely on accurate medical information for life-or-death decisions, and mistakes here could result in serious harm”. Furthermore, diagnosing medical or psychiatric conditions falls straight in the “unsupported use” of the model. Despite this we wanted to give it a shot and see how it does on the following healthcare use cases, roughly ranked from low to high sensitivity from a medical perspective: admin chat with a patient, medical insurance check, mental health support, medical documentation, medical questions & answers and medical diagnosis. We also looked at the impact of some parameters of the model on the answers [...]

        Their conclusion:

        As warned by OpenAI, we are nowhere near any real time scenario where GPT-3 would significatively help in healthcare. [...]

        It seems they didn't really expect it to work, but wanted to see what it would do.

        13 votes
        1. [2]
          Thra11
          Link Parent

          It seems they didn't really expect it to work, but wanted to see what it would do.

          Yes. The reason I cynically assume PR/promotion is that if I approach my boss and say, "This crazy idea will never work, can I try it anyway to see what it does?", the answer is almost certainly, "No", not, "Sure, take a team of engineers for a few weeks and have fun!".

          3 votes
          1. skybrian
            Link Parent

            They seem to use AI in some other way, so a different PR-related reason this might be helpful to them would be to put down an apparently competing approach by showing that it doesn’t work. Maybe they got tired of getting asked what they think of GPT-3?

            Also, I doubt it took weeks to test. I would estimate less than a day after getting access to GPT-3, which is the only hard part.

            2 votes
      2. [2]
        vegai
        Link Parent

        GPT-3: Doesn't really understand its input.

        Do our brains understand their input?

        1 vote
        1. lonjil
          Link Parent

          However you choose to define 'understand', our brains are several orders of magnitude closer to it than GPT-3.

          3 votes
  2. nothis
    Link

    Similar logic issues persisted in subsequent tests. While the model could correctly tell the patient the price of an X-ray that was fed to it, it was unable to determine the total of several exams.

    This is maybe the weirdest part about GPT-3: it's a computer program that's better at natural chit-chat than math.

    6 votes
  3. [6]
    eladnarra
    Link

    Even though there are warnings to not use GPT-3 for this type of thing, it's still interesting to see what it does. Maybe this will help with identifying the things we'd need AI to be able to do before implementing it in a medical setting.

    Of course, I'm not entirely sure why some of these scenarios need an AI. You can check basic medical insurance info through scripted bots or phone systems, and I'd much prefer an online appointment system that shows all available appointments so I can just pick what works for me, rather than having a back and forth about what days and times might work. I guess perhaps a conversational chat might be more accessible to some people.

    4 votes
    1. [2]
      Akir
      Link Parent

      Honestly, this is why I hate when any given product is enhanced with “AI”. We can’t get AI to reliably recognize speech for unimportant things, so why in hell would I want to use it for things that are actually important, especially when we are literally entrusting our lives to it?

      5 votes
      1. Wes
        Link Parent

        I think it's worth pointing out that not all AI is the same. Some AI has very narrow definitions and training data.

        GPT-3 is a specific type of AI, and anyone could tell you it would not work for the purposes of medical diagnosis. It's a language tool, and within those parameters it actually does very well. Outside of scientific curiosity, it would never be entrusted with serious jobs like medical diagnosis, deciding when to brake a car, or ordering more Ovaltine from the store.

        Besides, humans aren't exactly great at these tasks either. As long as the AI is better than humans, we're already saving lives. In the meantime, a hybrid approach (e.g. systems compiling suggestions, driver assists, etc.) can help us improve the systems without relying on them fully.

        2 votes
    2. [3]
      vektor
      Link Parent

      Frankly, there are applications for AI in medicine. This is purely from a utilitarian perspective; I'm not even going into the data protection aspects.

      But here's the thing: medicine generates, both intentionally and as a byproduct, more data than humans can reasonably process. Electronic health records that pull all of that together can enable us to build AI with that data. Here are a few use cases:

      • Screening for diseases
      • Augmenting (or straight up doing) the diagnosis of multiple symptoms (i.e. finding the most probable set of diseases that explain the symptoms) and finding the right diagnostic tool to verify that diagnosis.
      • Rigorous medical history checks using all available past data in support of the last two points
      • Computer vision + medical imaging, particularly when the data cannot be adequately represented to a human anymore, i.e. volumetric (3D) scans, multispectral scans, sensor fusion, etc.
      • Relieving doctors of menial tasks like diagnoses in low-stake situations ("Here's a note for your employer to keep you at home cuz you're infectious"), clerical work

      Granted, GPT-3 does none of these better than a purpose-built model would. But there definitely are uses for AI in medicine if we want to make use of it.

      3 votes
      1. [2]
        eladnarra
        (edited )
        Link Parent

        Ideally those would all be great, but I'm honestly a bit skeptical as a chronic illness patient. I have no doubt that AI could be very helpful in diagnosing well-understood diseases eventually, but in the current (US) medical system my fear is that AI will simply perpetuate current diagnostic black holes (which exist due to things like racism, ableism, sexism, patchy records, and lack of medical knowledge and research about certain conditions).

        For example, if an AI used medical records as training data, it would learn doctor biases, such as "this patient is female and thus conversion disorder is more likely." It would also potentially learn incomplete or outright wrong diagnoses. For example, some people take a decade or more to be properly diagnosed, and if you used their records 7 years in, you might get an AI that diagnoses racing heart, joint pain with no obvious injury or autoimmune markers, stomach issues, and trouble breathing as anxiety. And since those were the patient's current diagnosis according to several doctors, there's no easy way to filter it out of training data as "suspect." The patient might know there's more wrong with them, but their medical records say case closed. Whereas if you used their records after a decade, the AI would learn those could be signs of POTS, hEDS, and MCAS, and it might learn that these are often comorbid.

        Of course, maybe AI won't be trained on medical records. And if it is, it's not like an AI coming to false conclusions is worse than a doctor coming to false conclusions... Except I suspect (with, granted, little evidence) that AI potential diagnoses will be considered more objective, and possibly even used by insurers to override doctors. They already do so with step therapies and drug formularies, after all.

        5 votes
        1. vektor
          Link Parent

          but in the current (US) medical system my fear is that AI will simply perpetuate current diagnostic black holes

          Those can actually be easier to remediate with AI. If a study finds, for example, that a certain condition is overdiagnosed by 50%, you could tell the AI to account for that. Bonus points if you have high-quality data on what is an overdiagnosis and what is not, or what the "risk factors" for a false diagnosis are. You also have the option of blinding the AI to certain demographic variables like gender or race. We've all heard horror stories about that being done badly, but it can be done properly. You can leverage GANs, for example, to eliminate a certain variable (race, gender) and all its predictors (first name, address, even parts of the medical history, etc.) from a record while keeping the rest of the data intact.
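
          At its very simplest, the blinding idea is just dropping the protected attributes and their obvious proxy fields from each record before training. All field names here are hypothetical, and this is far cruder than the GAN approach, which would also have to catch proxies buried in free text:

          ```python
          # Minimal sketch: blind a training record to protected attributes and
          # their obvious proxies before it reaches the model. Field names are
          # hypothetical; a real de-identification pipeline would do far more.
          PROTECTED = {"gender", "race"}
          PROXIES = {"first_name", "address"}  # fields that can predict the protected ones

          def blind_record(record: dict) -> dict:
              """Return a copy of the record without protected fields or their proxies."""
              return {k: v for k, v in record.items() if k not in PROTECTED | PROXIES}

          record = {
              "first_name": "Alice",
              "gender": "F",
              "age": 34,
              "symptoms": ["racing heart", "joint pain"],
          }
          print(blind_record(record))  # → {'age': 34, 'symptoms': ['racing heart', 'joint pain']}
          ```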

          And since those were the patient's current diagnosis according to several doctors, there's no easy way to filter it out of training data as "suspect."

          You could (a) connect a diagnosis to its treatment and the treatment to its success. That way, if the success is lacking, the diagnosis is wrong (generally speaking); or (b) use "explaining away": if a later diagnosis (anxiety) explains earlier symptoms (trouble breathing) that led to another diagnosis (lung disease), then that earlier diagnosis is likely wrong. That would of course mean only using symptoms as training data that have been adequately treated, to ensure you get the "final" diagnosis.
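
          Both filters can be sketched in a few lines; the data shapes are invented for illustration. Option (a) keeps only diagnoses whose treatment worked, and option (b) drops an earlier diagnosis once a later one accounts for all of its symptoms:

          ```python
          def confirmed_diagnoses(history):
              """Option (a): keep only diagnoses whose linked treatment succeeded.
              `history` is a time-ordered list of (diagnosis, treatment_succeeded) pairs."""
              return [dx for dx, succeeded in history if succeeded]

          def explain_away(episodes):
              """Option (b): drop an earlier diagnosis if a later one explains all of
              its symptoms. `episodes` is a time-ordered list of dicts with a
              'diagnosis' string and a 'symptoms' set."""
              kept = []
              for i, ep in enumerate(episodes):
                  later = episodes[i + 1:]
                  superseded = any(ep["symptoms"] <= lt["symptoms"] for lt in later)
                  if not superseded:
                      kept.append(ep["diagnosis"])
              return kept
          ```

          With the running example, an early "lung disease" episode whose only recorded symptom was trouble breathing would be dropped once a later "anxiety" episode covers that symptom.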

          There are also models that are quite adept at estimating their own uncertainty, which will be very helpful. We could then throw a lot of doctors at the cases the model is unsure about, to really figure out what the correct diagnosis is, and treat their verdict as ground-truth training data. Do that at the scale of a nation and you've got one big active-learning system.
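
          One way to sketch that loop, assuming a hypothetical ensemble that produces several probability estimates per case: route the high-disagreement cases to clinicians, whose verdicts are then folded back in as ground truth. The threshold here is arbitrary:

          ```python
          from statistics import pvariance

          def flag_for_review(case_probs, threshold=0.02):
              """Route uncertain cases to human review. `case_probs` maps a case id to
              the probability each ensemble member assigns to the leading diagnosis;
              high variance means the models disagree, so a doctor should decide and
              that verdict becomes ground-truth training data."""
              return [cid for cid, probs in case_probs.items() if pvariance(probs) > threshold]

          cases = {
              "A": [0.91, 0.93, 0.90],  # ensemble agrees: keep automated
              "B": [0.20, 0.85, 0.55],  # ensemble disagrees: send to clinicians
          }
          print(flag_for_review(cases))  # → ['B']
          ```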

          I guess this all goes to show that you ought to handle your data very carefully, whatever you do.

  4. patience_limited
    Link

    It's weirdly reminiscent of Crisis Text Line's training - there were specific questions we had to ask, in a specific order.

    But part of the training was about building a connection of empathy. We were supposed to use our human judgment about how and when to get those fundamental safety questions answered, determine how to link the texter with the most appropriate forms of support for the problems they were expressing, and de-escalate their emotional response to the situation that triggered their sense of being overwhelmed.

    Judging by the cognitive demands for dealing with people in crisis (I burned out after the first hundred calls), it would be useful if an AI could handle the basics. I can't imagine, though, that an AI would be capable of parsing the enormous variety of human problems coming in, or know when to ask for help.

    [I was somewhat disappointed to learn that Crisis Text Line's founder had a side business in using anonymized information to train chatbots for customer service. Nonetheless, that's a relatively low-risk interaction.]

    4 votes