5 votes

Samsung meeting notes and new source code are now in the wild after being leaked in ChatGPT

7 comments

  1. [2]
    stu2b50
    Link

    This article seems overly alarmist. The actual situation, after reading it, is that some Samsung employees pasted trade secrets into the public ChatGPT. Since ChatGPT is not, believe it or not, owned by Samsung, and the two companies don't have a business contract covering data handling, the information is now considered leaked. The title seems to imply that random people are getting Samsung semiconductor secrets while chatting with GPT.

    That's, in the end, no different than if Samsung employees started using https://onlinenotepad.org/notepad instead of Office 365 or GSuite. Use corporate software for corporate secrets! Then it's the IT team's problem.

    As ChatGPT proves its worth, companies will inevitably start licensing it from OpenAI (or through Microsoft, for that matter - they have much, much stronger industry ties, after all), at which point it's not a problem (inasmuch as Samsung can sue the shit out of OpenAI if they misuse the data).

    7 votes
    1. skybrian
      Link Parent

      Yes it is, particularly the "in the wild" part. The article seems to be based on an email sent to Samsung employees about someone who used ChatGPT there, not evidence that anyone outside Samsung has seen anything.

      I imagine people at OpenAI are too busy to go around snooping, since they have a huge hit on their hands?

      (Adding "in the wild" to a headline seems like a fun way to make it seem more alarming.)

      3 votes
  2. [2]
    Eabryt
    Link

    Not too surprised some idiot did this. My job sent out a Slack message weeks ago reminding people not to put confidential information into ChatGPT. Seems pretty obvious that anything you send will be saved/used.

    4 votes
    1. teaearlgraycold
      Link Parent

      I use ChatGPT all the time at work, but have always been careful not to straight-up copy and paste code in. Not that what I work on is really interesting to anyone outside of Google.

      Thankfully much of the code I touch is FOSS and already public anyway.

      2 votes
  3. [3]
    knocklessmonster
    Link

    I haven't used it for work and don't have secrets to keep at work beyond passwords, but I somehow missed that it trained on inputs. This is not a frightening development, but it is a surprising one. Still, I'm not concerned about it actually being an issue for Samsung unless you can go "Hey ChatGPT, give me those Samsung meeting notes" and get them back in full text.

    1 vote
    1. PetitPrince
      Link Parent

      but I somehow missed that it trained on inputs

      Well, IIRC it says something like "I'm trained on data up to 2020", so implicitly that would exclude long-term training on inputs.

      1 vote
    2. stu2b50
      Link Parent

      I doubt that they train it on the user data in an autoregressive manner, though (as in, training the model to predict the next token of this example data). If you were to just add it to the entire corpus of training data, it would do basically nothing to the overall model quality, because the amount of text is so insignificant. If you were to fine-tune on it, then depending on the learning rate you'd either wildly overfit on the sample or change nothing.
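      (As a concrete illustration of what "autoregressive training" means here - a minimal, hypothetical PyTorch sketch; the toy model and the pasted text are made up, and nothing about OpenAI's actual pipeline is implied:)

      ```python
      import torch
      import torch.nn as nn

      # Toy next-token predictor standing in for a much larger model.
      vocab_size, embed_dim = 256, 64
      model = nn.Sequential(nn.Embedding(vocab_size, embed_dim),
                            nn.Linear(embed_dim, vocab_size))
      opt = torch.optim.SGD(model.parameters(), lr=1e-2)  # the learning rate in question
      loss_fn = nn.CrossEntropyLoss()

      pasted = b"some proprietary snippet a user pasted into the chat"
      tokens = torch.tensor(list(pasted), dtype=torch.long)
      inputs, targets = tokens[:-1], tokens[1:]  # predict token t+1 from token t

      # Looping over one tiny sample is the overfitting failure mode;
      # diluted into a trillion-token corpus, the same sample moves the
      # weights by approximately nothing.
      for step in range(100):
          loss = loss_fn(model(inputs), targets)
          opt.zero_grad()
          loss.backward()
          opt.step()
      ```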

      Rather, I'd bet they use it for RLHF and as validation sets. That's both much more likely to result in a better product, and prevents the whole "ChatGPT is spitting out my darkest secrets to random people on the internet!" issue (which would be very bad for their PR).
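      (And a matching sketch of the validation-set idea: the same kind of toy model only scores the logs, it never updates on them - again purely illustrative:)

      ```python
      import torch
      import torch.nn as nn

      # Same toy setup; user chat logs serve as a held-out validation set.
      vocab_size, embed_dim = 256, 64
      model = nn.Sequential(nn.Embedding(vocab_size, embed_dim),
                            nn.Linear(embed_dim, vocab_size))
      loss_fn = nn.CrossEntropyLoss()

      log_text = b"a conversation pulled from production chat logs"
      tokens = torch.tensor(list(log_text), dtype=torch.long)
      inputs, targets = tokens[:-1], tokens[1:]

      model.eval()
      with torch.no_grad():  # evaluate only: no gradients, nothing leaks into the weights
          val_loss = loss_fn(model(inputs), targets)
      print(f"validation loss: {val_loss.item():.3f}")
      ```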

      1 vote