20 votes

GitHub Copilot - Your AI pair programmer

16 comments

  1. [6]
    hungariantoast

    Random thought: If someone built a similar tool, but trained it on leaked code from various Microsoft projects, how long would it take for the cease and desist letters to arrive?

    13 votes
    1. Thra11

      Another random thought: what if you took a programming course and asked GitHub Copilot to do your assignments for you? Would the code be good enough to pass the course, and would you be accused of plagiarism?

      2 votes
    2. [3]
      aditya

      I've seen this idea or variants of it in a few different forums. As someone with zero legal background, I don't know if there's weight to asking GitHub why they didn't train on Microsoft's proprietary code...

      1 vote
      1. [2]
        Weldawadyathink

        It was probably just a pragmatic decision. Microsoft is a large company, and large companies are slow to act. GitHub would have had to work with the highest levels of the company. By using open source code, they can release a product much sooner. Perhaps they are planning on adding Microsoft code as training data later.

        Not to mention that Microsoft code almost certainly does not have the breadth needed to train an AI. They likely would have had to use other open source code anyway.

        2 votes
        1. aditya

          Oh, for sure. Call me cynical, but I see GitHub "getting away" with this for sure. I wonder what precedents this will set for open source licenses etc.

          2 votes
  2. [2]
    vord

    At least we see the real reason behind the purchase of GitHub: they bought it to use as their private code training data. They also purchased exclusive access to OpenAI's GPT-3, and will likely do the same with others going forward.

    I don't know if I've ever seen a more blatant example of "If you're not paying, you're the product, not the consumer."

    Microsoft only loves open source insofar as they can exploit it. Add this to my list of "Things Microsoft must release under the GPL before I think they've changed for the better."

    6 votes
    1. petrichor

      To play devil's advocate, they totally didn't need to buy GitHub to have access to the open-source training data.

      10 votes
  3. [7]
    aditya

    Accidentally posted to ~tech first. Fixed.

    I'm curious to see how this plays out on the licenses front. I know several people have brought it up already, and I'm a FOSS guy, but a part of me also recognizes that MSFT / GitHub have a huge legal team that thinks this is likely to be okay...

    5 votes
    1. [6]
      cfabbro
      (edited)

      Speaking of, here's Eevee's tweets about it:

      github copilot has, by their own admission, been trained on mountains of gpl code, so i'm unclear on how it's not a form of laundering open source code into commercial works. the handwave of "it usually doesn't reproduce exact chunks" is not very satisfying

      copyright does not only cover copying and pasting; it covers derivative works. github copilot was trained on open source code and the sum total of everything it knows was drawn from that code. there is no possible interpretation of "derivative" that does not include this

      i'm really tired of the tech industry treating neural networks like magic black boxes that spit out something completely novel, and taking free software for granted while paying out $150k salaries for writing ad delivery systems. the two have finally fused and it sucks

      previous """AI""" generation has been trained on public text and photos, which are harder to make copyright claims on, but this is drawn from large bodies of work with very explicit court-tested licenses, so i look forward to the inevitable /massive/ class action suits over this

      "but eevee, humans also learn by reading open source code, so isn't that the same thing"
      - no
      - humans are capable of abstract understanding and have a breadth of other knowledge to draw from
      - statistical models do not
      - you have fallen for marketing

      17 votes
      1. [5]
        Comment deleted by author
        1. [4]
          cfabbro

          Microsoft's lawyers no doubt did their due diligence, but the law isn't really black & white, especially regarding IP and software licenses. Just because some corporate lawyers gave the OK doesn't mean that lawyers willing to represent the class, or a judge, will agree with them.

          6 votes
          1. [3]
            Comment deleted by author
            1. hungariantoast
              (edited)

              The questions that need to be answered in this situation are:

              • Whether a program, whose functionality has arisen from being trained on open-source code, should be counted as a derivative of that open-source code and thus subject to the licensing requirements of the code it was trained on.

              • Whether code, generated by a program trained on open-source code, should be considered a derivative work subject to the licensing of the training code it was generated from.

              So this is about derivative works. Whether programs whose functionality comes from being trained on open-source code are derivative works. Whether the code generated by those programs is a derivative work.

              Is the program a derivative work?

              Is the code the program generates a derivative work?

              Personally, in both cases, I think the answer is yes, but I also only think the first question above leads to enforceability. Neither Copilot nor the code it generates could exist without having been derived from open-source code through its training process. Therefore, I think Copilot should either not exist, or it should be released under a license compatible with the source code it was trained on.

              As for the second question, whether the code Copilot generates is derivative work or not, I also think the answer is yes, but I don't think you could ever hope to practically enforce licensing on any code generated by Copilot and used in any project. It just wouldn't be practically enforceable. (Mostly because the code snippets generated by Copilot are not substantial portions of the works used to generate them.)

              At this point, having just read and learned about Copilot tonight, I'm sitting somewhere between "this is a really shitty thing for GitHub to do, Copilot should be open-source" and "I hope the Free Software Foundation can still afford lawyers".


              Also, it's not really relevant to the discussion, but some code generated by Copilot is copied from the training data:

              We found that about 0.1% of the time, the suggestion may contain some snippets that are verbatim from the training set.
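              To make that "verbatim" figure concrete, here is a rough sketch of how one might flag verbatim overlap between a suggestion and a training corpus using token n-gram matching. This is only an illustration; GitHub has not published how they measured the 0.1%, and the function names and the choice of n here are invented:

              ```python
              # Rough sketch (NOT GitHub's published method): flag a suggestion as
              # containing verbatim training material when a long-enough run of
              # tokens also appears somewhere in the training corpus.

              def ngrams(tokens, n):
                  """Return the set of all contiguous n-token runs."""
                  return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

              def contains_verbatim(suggestion, training_corpus, n=6):
                  """True if any n-token run of `suggestion` appears in a training file."""
                  tokens = suggestion.split()
                  if len(tokens) < n:
                      return False
                  wanted = ngrams(tokens, n)
                  # Linear scan for clarity; a real system would index the corpus.
                  return any(wanted & ngrams(source.split(), n) for source in training_corpus)
              ```

              Larger n makes the check stricter (fewer false positives on idiomatic boilerplate); the legal question above is precisely whether non-verbatim output that this check misses is still derivative.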

              9 votes
            2. vord

              It's guaranteed any corporate law team worth their salt can prove incontrovertibly that code copying is both rampant and accepted in the community.

              I would be curious to know what kind of legal precedent that would end up setting. It would be very nice (or terrible) to be able to easily nullify laws by having a sufficiently large community violate them.

              3 votes
          2. Thra11

            The page itself says this:

            Why was GitHub Copilot trained on data from publicly available sources?

            Training machine learning models on publicly available data is considered fair use across the machine learning community. The models gain insight and accuracy from the public collective intelligence. But this is a new space, and we are keen to engage in a discussion with developers on these topics and lead the industry in setting appropriate standards for training AI models.

            Which I interpret to mean, "there's no real legal precedent, so we think we can get away with it for now, and hopefully avoid anything too punitive when the legal dust settles and it turns out we were wrong".

            5 votes
      2. skybrian

        I don’t know very much about it, but I recall a bit of jargon about “transformative use” and I think I’d want to understand whether it applies. Anyone actually curious about the law might want to look into opinions from legal experts.

        But it doesn’t seem particularly likely that cautious corporate lawyers would want their employees using this tool, at least until it’s better understood. It’s easier to say “no” and stay out of trouble. So those lawsuits might not happen?

        2 votes
  4. Staross

    This could probably be useful for very common tasks (e.g. validating a form), but I'm a bit skeptical it would be if you work in a more specialized domain (e.g. scientific computing) where there aren't many examples of what you want to do to draw from.
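    For illustration, here is the kind of "very common task" helper such a tool tends to do well on: a sign-up form validator of the sort that appears thousands of times in public repositories. The function name, field names, and rules are invented for this example:

    ```python
    import re

    # Hypothetical example of the "common task" case: a sign-up form validator.
    # A permissive email pattern; real-world validation rules vary widely.
    EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

    def validate_signup_form(form):
        """Return a dict mapping field name -> error message; empty means valid."""
        errors = {}
        if not EMAIL_RE.match(form.get("email", "")):
            errors["email"] = "Please enter a valid email address."
        if len(form.get("password", "")) < 8:
            errors["password"] = "Password must be at least 8 characters."
        return errors
    ```

    Boilerplate like this is heavily represented in training data; a niche numerical kernel in scientific computing is not, which is the asymmetry the comment above points at.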

    2 votes