20 votes

GitHub Copilot - Your AI pair programmer

16 comments

  1. [6]
    hungariantoast

    Random thought: If someone built a similar tool, but trained it on leaked code from various Microsoft projects, how long would it take for the cease and desist letters to arrive?

    13 votes
    1. Thra11

      Another random thought: what if you took a programming course and asked GitHub Copilot to do your assignments for you? Would the code be good enough to pass the course, and would you be accused of plagiarism?

      2 votes
    2. [3]
      aditya

      I've seen this idea or variants of it in a few different forums. As someone with zero legal background, I don't know if there's weight to asking GitHub why they didn't train on Microsoft's proprietary code...

      1 vote
      1. [2]
        Weldawadyathink

        It was probably just a pragmatic decision. Microsoft is a large company, and large companies are slow to act. GitHub would have had to work with the highest levels of the company. By using open source code, they can release a product much sooner. Perhaps they are planning on adding Microsoft code as training data later.

        Not to mention that Microsoft code almost certainly does not have the breadth needed to train an AI. They likely would have had to use other open source code anyway.

        2 votes
        1. aditya

          Oh, for sure. Call me cynical, but I see GitHub "getting away" with this for sure. I wonder what precedents this will set for open source licenses etc.

          2 votes
  2. [2]
    vord

    At least we see the real reason behind the purchase of GitHub: they bought it to use as their private code training data. They also purchased exclusive access to OpenAI's GPT-3, and will likely do the same with others going forward.

    I don't know if I've ever seen a more blatant example of "If you're not paying, you're the product, not the consumer."

    Microsoft only loves open source insofar as they can exploit it. Add this to my list of "Things Microsoft must release under the GPL before I think they've changed for the better."

    6 votes
    1. petrichor

      To play devil's advocate, they totally didn't need to buy GitHub to have access to the open-source training data.

      10 votes
  3. [7]
    aditya

    Accidentally posted to ~tech first. Fixed.

    I'm curious to see how this plays out on the licenses front. I know several people have brought it up already, and I'm a FOSS guy, but a part of me also recognizes that MSFT / GitHub have a huge legal team that thinks this is likely to be okay...

    5 votes
    1. [6]
      cfabbro
      (edited)

      Speaking of, here's Eevee's tweets about it:

      github copilot has, by their own admission, been trained on mountains of gpl code, so i'm unclear on how it's not a form of laundering open source code into commercial works. the handwave of "it usually doesn't reproduce exact chunks" is not very satisfying

      copyright does not only cover copying and pasting; it covers derivative works. github copilot was trained on open source code and the sum total of everything it knows was drawn from that code. there is no possible interpretation of "derivative" that does not include this

      i'm really tired of the tech industry treating neural networks like magic black boxes that spit out something completely novel, and taking free software for granted while paying out $150k salaries for writing ad delivery systems. the two have finally fused and it sucks

      previous """AI""" generation has been trained on public text and photos, which are harder to make copyright claims on, but this is drawn from large bodies of work with very explicit court-tested licenses, so i look forward to the inevitable /massive/ class action suits over this

      "but eevee, humans also learn by reading open source code, so isn't that the same thing"
      - no
      - humans are capable of abstract understanding and have a breadth of other knowledge to draw from
      - statistical models do not
      - you have fallen for marketing

      17 votes
      1. [5]
        Comment deleted by author
        1. [4]
          cfabbro

          Microsoft's lawyers no doubt did their due diligence, but the law isn't really black & white, especially regarding IP and software licenses. Just because some corporate lawyers gave the OK doesn't mean that lawyers willing to represent the class, or a judge, will agree with them.

          6 votes
          1. [3]
            Comment deleted by author
            1. hungariantoast
              (edited)

              The questions that need to be answered in this situation are:

              • Whether a program, whose functionality has arisen from being trained on open-source code, should be counted as a derivative of that open-source code and thus subject to the licensing requirements of the code it was trained on.

              • Whether code, generated by a program trained on open-source code, should be considered a derivative work subject to the licensing of the training code it was generated from.

              So this is about derivative works. Whether programs whose functionality comes from being trained on open-source code are derivative works. Whether the code generated by those programs is a derivative work.

              Is the program a derivative work?

              Is the code the program generates a derivative work?

              Personally, in both cases, I think the answer is yes, but I also only think the first question above leads to enforceability. Neither Copilot nor the code it generates could exist without having been derived from open-source code through its training process. Therefore, I think Copilot should either not exist, or it should be released under a license compatible with the source code it was trained on.

              As for the second question, whether the code Copilot generates is derivative work or not, I also think the answer is yes, but I don't think you could ever hope to practically enforce licensing on any code generated by Copilot and used in any project. It just wouldn't be practically enforceable. (Mostly because the code snippets generated by Copilot are not substantial portions of the works used to generate them.)

              At this point, having just read and learned about Copilot tonight, I'm sitting somewhere between "this is a really shitty thing for GitHub to do, Copilot should be open-source" and "I hope the Free Software Foundation can still afford lawyers".


              Also, it's not really relevant to the discussion, but some code generated by Copilot is copied from the training data:

              We found that about 0.1% of the time, the suggestion may contain some snippets that are verbatim from the training set.
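              To make that "verbatim" figure concrete, here is a rough sketch of how one might flag verbatim overlap between a suggestion and a training corpus using token n-gram matching. This is only an illustration; GitHub has not published how they measured the 0.1%, and the function names and the choice of n here are invented:

              ```python
              # Rough sketch (NOT GitHub's published method): flag a suggestion as
              # containing verbatim training material when a long-enough run of
              # tokens also appears somewhere in the training corpus.

              def ngrams(tokens, n):
                  """Return the set of all contiguous n-token runs."""
                  return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

              def contains_verbatim(suggestion, training_corpus, n=6):
                  """True if any n-token run of `suggestion` appears in a training file."""
                  tokens = suggestion.split()
                  if len(tokens) < n:
                      return False
                  wanted = ngrams(tokens, n)
                  # Linear scan for clarity; a real system would index the corpus.
                  return any(wanted & ngrams(source.split(), n) for source in training_corpus)
              ```

              Larger n makes the check stricter (fewer false positives on idiomatic boilerplate); the legal question above is precisely whether non-verbatim output that this check misses is still derivative.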

              9 votes
            2. vord

              It's guaranteed any corporate law team worth their salt can prove incontrovertibly that code copying is both rampant and accepted in the community.

              I would be curious to know what kind of legal precedent that would end up setting. It would be very nice (or terrible) to be able to easily nullify laws by having a sufficiently large community violate them.

              3 votes
          2. Thra11

            The page itself says this:

            Why was GitHub Copilot trained on data from publicly available sources?

            Training machine learning models on publicly available data is considered fair use across the machine learning community. The models gain insight and accuracy from the public collective intelligence. But this is a new space, and we are keen to engage in a discussion with developers on these topics and lead the industry in setting appropriate standards for training AI models.

            Which I interpret to mean, "there's no real legal precedent, so we think we can get away with it for now, and hopefully avoid anything too punitive when the legal dust settles and it turns out we were wrong".

            5 votes
      2. skybrian

        I don’t know very much about it, but I recall a bit of jargon about “transformative use” and I think I’d want to understand whether it applies. Anyone actually curious about the law might want to look into opinions from legal experts.

        But it doesn’t seem particularly likely that cautious corporate lawyers would want their employees using this tool, at least until it’s better understood. It’s easier to say “no” and stay out of trouble. So those lawsuits might not happen?

        2 votes
  4. Staross

    This could probably be useful for very common tasks (e.g. validating a form), but I'm a bit skeptical it would be if you work in a more specialized domain (e.g. scientific computing) where there aren't many examples of what you want to do to draw from.
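    For illustration, here is the kind of "very common task" helper such a tool tends to do well on: a sign-up form validator of the sort that appears thousands of times in public repositories. The function name, field names, and rules are invented for this example:

    ```python
    import re

    # Hypothetical example of the "common task" case: a sign-up form validator.
    # A permissive email pattern; real-world validation rules vary widely.
    EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

    def validate_signup_form(form):
        """Return a dict mapping field name -> error message; empty means valid."""
        errors = {}
        if not EMAIL_RE.match(form.get("email", "")):
            errors["email"] = "Please enter a valid email address."
        if len(form.get("password", "")) < 8:
            errors["password"] = "Password must be at least 8 characters."
        return errors
    ```

    Boilerplate like this is heavily represented in training data; a niche numerical kernel in scientific computing is not, which is the asymmetry the comment above points at.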

    2 votes