43 votes

Microsoft CEO of AI claims online content is 'freeware' [and can be used to train LLMs in the absence of specific directives from the author against this]

44 comments

  1. [2]
    raze2012
    Link

    "I think that with respect to content that is already on the open web, the social contract of that content since the 1990s has been it is fair use," he opined. "Anyone can copy it, recreate with it, reproduce with it. That has been freeware, if you like. That's been the understanding."

    So much fumbling in a single statement.

    1. I'll do my favorite thing and point out the hypocrisy with a case against scraping from LinkedIn* (owned by Microsoft): https://www.linkedin.com/blog/member/trust-and-safety/update-hi-q-legal-proceeding

    the Court announced a significant win for LinkedIn and our members against personal data scraping, among other platform abuses. The Court ruled that LinkedIn’s User Agreement unambiguously prohibits scraping and the unauthorized use of scraped data as well as fake accounts, affirming LinkedIn’s legal positions against hiQ for the past six years. The Court also found that hiQ knew for years that its actions violated our User Agreement, and that LinkedIn is entitled to move forward with its claim that hiQ violated the Computer Fraud and Abuse Act.

    The ruling was not even two years old. Rules for me and not for thee.


    2. Suleyman clearly did not consult any software lawyers, because this is a sub-Law 101 mistake. Copyright applies by default; it is not opt-in, precisely so that random big companies can't come on by and take your ideas.

    https://choosealicense.com/no-permission/

    When you make a creative work (which includes code), the work is under exclusive copyright by default. Unless you include a license that specifies otherwise, nobody else can copy, distribute, or modify your work without being at risk of take-downs, shake-downs, or litigation. Once the work has other contributors (each a copyright holder), “nobody” starts including you.

    btw, No License is a very bad idea, and any site larger than a single blog should really provide one. Especially when hosting on servers that have their own TOS to begin with. But this is all theoretical because...


    3. Lastly, this is all wrong because the site that is suing does indeed have a license to begin with. So it's not "freeware" in the slightest.

    https://revealnews.org/terms-of-use/

    There may be revisions since before OpenAI scraped it, but I won't dig too deeply there. Feel free to date-check. The TOS seems pretty standard though:

    1. You will access and use the Services solely for your own personal use;

    8. You will not engage in unauthorized spidering, “scraping,” or harvesting of content or personal information, or use any other unauthorized automated means to compile information from the Services without prior written permission of CIR;

    I have no clue how anyone can interpret this as "a gray area". They say "do not do this with our content" and you did that anyway. But again, that's Suleyman's statement:

    "There's a separate category where a website or publisher or news organization had explicitly said, 'do not scrape or crawl me for any other reason than indexing me,' so that other people can find that content," he explained. "But that's the gray area. And I think that's going to work its way through the courts."


    It seems like a very easy slam dunk of a case for CIR, especially if OpenAI's best defense is "well it's publicly available, it should be free". There is definitely an argument to be made about copyright being way too restrictive. But companies like Microsoft lobbied for that and took advantage of it for decades, so at the very least I'll enjoy the schadenfreude of them being sued for the very thing they asked for.

    *One of my favorite bits of trivia: On a Windows computer, try pressing Win+Ctrl+Alt+Shift+L

    53 votes
    1. elight
      (edited )
      Link Parent

      This is so much about companies not wanting their compute and bandwidth used to help competing companies to (1) make more money and (2) use content in their closed sandbox as leverage to get ahead in the competition to create the most popular and profitable AI.

      It's not about taking user's content, for them. It's about protecting the content that the business controls and wants locked off from the rest of the world for its own selfish purposes.

      There's open web and there's "open web".

      No one on the receiving end of site scraping that goes on to train LLMs, unless you're being paid to have your content scraped. Speaking as someone who worked at one of those companies.

      10 votes
  2. [20]
    delphi
    Link

    Maybe a hot take, but I've kind of always agreed here. I believe in the freedom of information above all else. I also believe that IP law as a whole has done a lot more harm than good. As someone who writes things on the internet for a living, I am 100% of the belief that anything that's on the internet should be free for everyone else to use for any purpose. Call this... radical open-source, if you will. I personally haven't excluded any of the web crawlers from my pages, because I think that if they want to take my stuff, I can't do anything about it, so I might as well embrace it.
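    For context on what "excluding" a crawler looks like in practice: it is usually done with a robots.txt file, which is advisory and only works if the crawler chooses to honor it. Below is a minimal sketch using Python's standard library. The rules and the example.com URLs are illustrative assumptions, not taken from any site mentioned in this thread; GPTBot is used as the example user agent because it is the token OpenAI documents for its crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI training crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some-article"
    print("GPTBot allowed:", is_allowed("GPTBot", page))
    print("Googlebot allowed:", is_allowed("Googlebot", page))
```

    Leaving the file out entirely, or not listing a crawler, is the "haven't excluded anyone" default described above. Either way the check happens on the crawler's side, so it expresses a preference rather than enforcing one.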

    21 votes
    1. papasquat
      Link Parent

      All of Microsoft's software is on the internet, but they still charge for it, and if my 100,000-user company is all using unlicensed, cracked copies of Windows, you'd better believe that their lawyers are going to pretty aggressively disagree with Mustafa's point of view as they bill their employer millions to sue me into oblivion.

      28 votes
    2. [17]
      GunnarRunnar
      Link Parent

      Would you mind explaining your opinion?

      Because the way I see it, freedom of information doesn't equal freedom to use that information any way you see fit. Does it only concern written information or does it also include stuff like movies, music, games?

      Should I be able to make a carbon copy Fortnite and fill it with Taylor Swift songs and show the new Deadpool movie in it? What's the incentive to be a creative if you aren't getting paid in this world that revolves around money? Because like you must know, it takes time, it is a job. And if the next person can just pick up the shit I spent time making and do their own thing with it, sell it or whatever, isn't it obvious that I'm getting fucked? My investment in time and talent is stolen without getting anything back. Should anything but material stuff and physical services cost money if access and use should be free?

      19 votes
      1. [14]
        mordae
        Link Parent

        It is impossible to read all the books, watch all the movies, series, plays, listen to all the radio shows, podcasts, songs and play all the games anyway. Ditch copyright, destroy the livelihoods of anybody who needs it, move those people back into factories and construction. But in turn, shorten the work week to 4 days and build more community spaces. Give everybody a better chance to play in a small local band, write a short story for their friends, make a game for their kids or stage a play during summer camp.

        Culture shouldn't be a commodity. At least not as much as it is nowadays. There still will be people making the next Divinity or Factorio and they will get their patrons to keep up the work. Not so much EA.

        7 votes
        1. [8]
          GunnarRunnar
          Link Parent

          That seems like a total fantasy. Are there enough "factory jobs" in this increasingly automated world?

          What's the system to organize the next Divinity? Divinity 2 had 500 names in its credits. Games can take multiple years to complete and crunch is sadly a fact that exists in that industry.

          How do you get the collective will to push a project that size forward if there aren't hierarchical roles and everyone can just go at their will (as it isn't their job)? Valve sounds like one of the coolest companies to work for in the gaming space but there seems to be this clear inefficiency because of their loose organizational model -- which is cool! But imagine that with everyone just using their free time instead as a job. The same type of stuff applies to movies and tv.

          Stardew Valley took its solo dev 10 hours a day, 7 days a week, for 4 years (according to Wikipedia). That sounds like a creative effort that required a lot of work. Even if he were able to keep up the 10-hour pace with a three-day weekend, it would've taken him close to 10 years to get that game to 1.0.

          There are so many holes in your admittedly nice idea (who wouldn't want to live a fulfilling life with a decent work-life balance) that it's exhausting to break down.

          14 votes
          1. mordae
            Link Parent

            That seems like a total fantasy. Are there enough "factory jobs" in this increasingly automated world?

            Nice, maybe we can do a 3-day work week instead of 4-day! Or even a 2-day work week. Wow, almost sounds like everybody might have a chance to be a near-full-time artist if this automation thing continues. Cool trend!

            How do you get the collective will to push a project that size forward if there aren't hierarchical roles and everyone can just go at their will (as it isn't their job)? [...]

            Why are you asking me about organization? We are talking about funding, right?

            Anyway.

            You are basically saying that we should just rely on capitalists funding game projects because some projects funded that way ended up being monumental success stories that allowed people to close themselves off in a nice little virtual world and enjoy a moment of respite from the terrible meat grinder that is their job. I am not sure why you suggest that there should be somebody to force the developers to complete the game while simultaneously showcasing a game whose sole developer had something close to absolute freedom to make it to their liking, even rewriting it completely a couple of times.

            Isn't it obvious that if you change the culture, you... change the culture? I really like what the modding community around Factorio does, for example. I found such an open ecosystem more interesting than the rigidness of "here's my masterpiece, admire it" of e.g. Banished, which basically died off because the author refused to let go of the source code or even support the Colonial Charter mod properly.

            But OK, if you want to make sure brilliant lone wolves have a chance to build this kind of art, then you can always start a game moonshot fund. And anybody who contributes to the fund will receive the games funded through it for free (or discounted if they've not contributed for the whole duration of development) while non-contributors will have to pay for every game individually.

            There can be some overhead (let's say up to 20%) to help people pitch their games to the members who would then vote on what is going to get funded and what is not. Heck I would contribute $25 monthly to such a fund if it aimed to cover various genres and it guaranteed that the games would enter public domain (or become open source) after at most 10 years.

            That's about 1600 subscriber-years for Stardew Valley at $100k/year for the developer. Or about 800 subscriber-years at €50k/year for a developer based in Germany.

            With 100,000 subscribers you could afford to fund about 375 developers spread between the US and EU.
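            The back-of-the-envelope numbers above can be reproduced with a short sketch. It assumes the figures quoted in this comment ($25 monthly, the full 20% overhead, $100k or €50k per developer-year) plus the 4-year Stardew Valley development time cited earlier in the thread. Treating euros and dollars as 1:1 is a simplifying assumption made only for this sketch, and the function names are hypothetical.

```python
# Figures quoted in the comment; EUR and USD are treated as 1:1 here,
# which is a simplifying assumption of this sketch.
MONTHLY_FEE = 25          # contribution per subscriber per month
OVERHEAD_PERCENT = 20     # share spent on pitching and voting overhead
STARDEW_DEV_YEARS = 4     # development time cited earlier in the thread

# Money that reaches developers from one subscriber in one year.
NET_PER_SUBSCRIBER_YEAR = MONTHLY_FEE * 12 * (100 - OVERHEAD_PERCENT) // 100


def subscriber_years(salary_per_year: int, dev_years: int) -> float:
    """Subscriber-years needed to pay one developer for dev_years."""
    return salary_per_year * dev_years / NET_PER_SUBSCRIBER_YEAR


def developers_funded(subscribers: int, salary_per_year: int) -> float:
    """Developers a subscriber base can pay for at a given yearly salary."""
    return subscribers * NET_PER_SUBSCRIBER_YEAR / salary_per_year


if __name__ == "__main__":
    print("net per subscriber-year:", NET_PER_SUBSCRIBER_YEAR)
    print("at 100k/year:", subscriber_years(100_000, STARDEW_DEV_YEARS))
    print("at 50k/year:", subscriber_years(50_000, STARDEW_DEV_YEARS))
    print("devs at 100k/year:", developers_funded(100_000, 100_000))
    print("devs at 50k/year:", developers_funded(100_000, 50_000))
```

            Under these assumptions the fund nets 240 per subscriber-year, which gives roughly 1,667 and 833 subscriber-years (rounded to 1600 and 800 in the comment). For 100,000 subscribers the result is between 240 and 480 developers depending on the salary mix, which brackets the 375 figure.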

          2. [6]
            kingofsnake
            (edited )
            Link Parent

            In agreement, but marginally so. We should strive for the values of the previous poster, but the uncoordinated hivemind produced Starfield - a bloated, unfinished and over funded fever dream.

            The winning solution is idealism and boundless innovation with respectful, risk taking managers, industry convention and angel investors who believe in the importance of dumb, novel projects.

            **Edit - I meant to say Star Citizen. Starfield in this context doesn't make a whole pile of sense.

            1 vote
            1. GunnarRunnar
              Link Parent

              I'm not really talking about what I see as the ideal (it would be something resembling Star Trek) but how I see our reality currently working. If we were to improve the current system, I'd say that starting with "creators' works should be free" isn't the place to start, since creators are a group that has been taken advantage of under capitalism.

              It's always the musician that's the victim, not the LP printing media mogul.

              4 votes
            2. [4]
              Thallassa
              Link Parent

              At least starfield actually came out and can be played and enjoyed, unlike all the fan projects in the same engine with even longer dev times like beyond skyrim (not even one province is complete). I don’t think your proposal would actually lead to more art being made on the whole.

              1 vote
              1. [2]
                Wes
                Link Parent

                unlike all the fan projects in the same engine with even longer dev times like beyond skyrim (not even one province is complete)

                Beyond Skyrim - Bruma has released, so there is at least one. They continue to work on other provinces. From what I understand a lot of work is cross-province, which makes it more difficult to show off.

                Other fan projects like Skyblivion haven't yet released, but they publish regular video updates, and just recently published some standalone mods for the systems they've created. One of which is a new lockpicking minigame that looks very nice.

                Skywind also posts regular updates, as does Project Tamriel. I know it's frustrating they're not playable yet, but they are very involved projects, and they have few hands contributing. Most of these projects are developed by volunteers working in their free time, whereas Bethesda had 300-400 employees working on Starfield.

                I'd like to also mention OpenMW. It's not in the same engine, but has been developed and polished to the point of being the best way to play Morrowind. There's even a multiplayer fork that is very good.

                1 vote
                1. Thallassa
                  Link Parent

                  Bruma is only part of the Cyrodiil province/story, and the rest of it is nowhere near done. It’s unfinished, just like all the other projects mentioned.

                  Better suggestions would be Enderal, Vigilant, and Glenmoril, except 1) Enderal was only two people and 2) Vigilant and Glenmoril are by just one person. The scope is necessarily smaller.

              2. kingofsnake
                Link Parent

                Crap - I meant Star Citizen. My bad.

        2. [5]
          teaearlgraycold
          Link Parent

          Plenty of things that are made in factories are items that artisans could be making by hand. But we’ve systematized that labor. I don’t think the division you’ve drawn between factory-style creative labor and normal factories makes much sense.

          But I do want people to have comfortable work weeks and jobs they enjoy. Somehow that’s what I’ve found. It’s something everyone should have.

          3 votes
          1. [4]
            mordae
            Link Parent

            I meant that culture should be participatory. It's insane that it's factory produced and passively consumed. We should aim to have it diffused a lot more.

            I am not convinced that it's beneficial to us to have so many professional artists. It robs the rest of us of the ability to participate. Have artists help with the mundane and have more people join in on the fun.

            What we now have reminds me of a Black Mirror episode.

            2 votes
            1. [3]
              Thallassa
              Link Parent

              Can you please explain how the existence of professional art prevents you from creating art?

              You can, like, just make art. It’s allowed. There’s not even any standard you need to meet!

              8 votes
              1. [2]
                mordae
                (edited )
                Link Parent

                Oh it does not prevent anyone from creating, it's just that saturating everyone's attention with professional, well-made and efficiently delivered artifacts does not really leave a lot of space for anybody other than well-funded professionals to contribute to culture, i.e. be perceived by a non-trivial number of people.

                E.g. people go to cinema to see a Marvel movie, because it's there and it's a colorful distraction, but is it something they are actually invested in? Something they would talk about with others? Something they would like to build upon themselves? For a small minority possibly yes, but the majority would probably forget they ever went in a week or so.

                You could always argue that it's just inefficient marketing that failed to suggest a better piece of content, but at some point the producers will want to make sure it's their content that gets seen and since they have the attention and they keep producing and people keep going...

                Other channels sell outrage instead of apathy, but the loop is largely the same.

                1 vote
                1. Oslypsis
                  Link Parent

                  I would argue that not all art has to have some sort of higher purpose or meaning. I like abstract art and microscope photography for this reason. It's just cool to look at. Sometimes, a mind-numbing movie is what I need. Especially with so much bad news being posted all the time.

                  4 votes
      2. [2]
        delphi
        Link Parent

        I'd love to. Sorry for taking a week, I don't check Tildes that often - see, I personally believe that what I'm just going to call "content" (for lack of a better term, it also includes art) is more or less "up for grabs" the second it touches the public space. In this case, that's the internet. Let's say I make a thing, maybe an album. I write the songs, I record them, I put it online for people to download. As far as I'm concerned, this is where my involvement ends. I could maybe put up a donation box, I could even gate access to the downloads through a paywall, but I will have to live with the fact that that paywall can be bypassed at any time. That's unfortunately just a fact of life on the internet. Piracy is rampant, but it's just as much of a facet of digital life as sunshine and clouds are to the great outdoors. It's not inherently good or bad.

        Now, here's where it gets tricky. By my own values and understanding, as soon as I have released the album onto the internet, it ceases to be mine. It is now part of the public domain, and legally, I should not be able to do anything about someone taking it and selling it on a larger storefront. This is something I accept. Yes, someone could come along and start reselling my work, the work that I made, the thing I put blood, sweat and tears into. I accept this possibility. Way I see it, it's inevitable anyways, so might as well lower the expectations.

        I don't think I need to explain that this is obviously a utopian vision of the future. Ideally, I should be able to "not care" about "theft" of my work because that's really only an issue if creating that content comes from a place of economic motivation. If I made my living off of making songs, this is obviously an issue - but in the ideal society (or at least my ideal society) this wouldn't matter at all. I don't have to worry about someone reselling or resharing my art without "permission" because simply having made the thing and knowing that I've made the thing is enough. I don't care about reputation, and I don't care about money. That's a privilege I have, and it's one I believe should be a given for every artist.

        Do I think the artist should be compensated for their work? Of course I do. Have independent artists fought back in court about plagiarism and won? Sure, even if those wins were few and far between. Is the whole notion of intellectual property a holistically bad idea with no benefits to society? No, of course not. But on the whole, would art benefit from completely stripping out the economic component? I'd argue, maybe. I'm convinced enough to try it.

        Let me quote @mordae here, because despite all of this I believe in attribution where you can, just out of courtesy. Culture shouldn't be a commodity. That's I think what I mean to say when I advocate for "complete freedom of information". I love creating, I love writing, drawing, programming. I'm an artist at heart, and I don't think anything - not "GenAI", not IP law, and certainly not piracy - could ever stop me from creating. But on the contrary, the corporations of the world could easily force me to stop if I veer too close to "their copyright". I find that unacceptable. Creation is a collaborative process, and restricting access to creative works in any way betrays almost a certain vitriol on the part of the people pushing it. Not to mention that IP law has literally destroyed media before and made it impossible to preserve in some cases.

        And let me preempt the obvious criticism here - what if someone stole all your art and reposted it claiming it as theirs? Well, that sucks, sure, but I'll sleep in the bed that I've made. Under the current system, would that be possible? Yes. Would it be illegal? Let's say yes. But would that stop the bad actor?

        So as far as I'm concerned, embrace information anarchy. Get the ladders. Climb the paywalls. Download as much as your hard drives can hold, print every poster you can find and toss them in the air out on the streets. Read every book, sing every song, write fanfic for every character ever dreamt of, make pirate plugins for Photoshop, paint Spider-Man on your coffin. The only thing you have to lose is your solitude.

        3 votes
        1. mordae
          (edited )
          Link Parent

          Plagiarism is bad for both the creators and consumers, though. I have zero issues with authors being anonymous or pseudonymous, it would be unfair to force them to always sign their gifts, but as a consumer I prefer clean metadata. Plagiarists seriously hamper everyone's ability to discover and track works they like and that sucks independently of possible loss of compensation and reputation to the author.

          From the author's point of view, having your name attached to a work of art or science is the second-best shot at immortality of sorts. (The first being having a physical constant or unit named after you.) Taking that away unfairly sucks.

    3. teaearlgraycold
      Link Parent

      I’m in favor of radical change. But we can do big things within the current framework of the IP system. Reduce the copyright term, maybe down to 20 years? Software should have it even shorter. 10 years and it shouldn’t be patentable.

      5 votes
  3. [3]
    GunnarRunnar
    Link

    Sure but I'd say it's only fair that the tools using that are also free as well as the content generated using it.

    12 votes
    1. [2]
      xzw
      Link Parent

      Is it really free if the content creator earns ad revenue for the clicks for example? Or AI just steals money by copying the content and making the information available elsewhere?

      6 votes
      1. Grzmot
        Link Parent

        Scrapers don't click the ads you show next to your content. That is, if the ads even load in the first place, since scrapers tend to be HTML only. So they actually make your life worse as someone showing ads, because they reduce your conversion rate (percentage of people who have seen the ad and clicked on it).

        Besides, someone is using your art/text/whatever for their own very financial gain. I don't think artists would mind as much if their art were used to train an AI that produces character images for people in DnD campaigns, because most of those people are not gonna be able to pay that amount of money for every campaign/character. There at least the financial gain aspect is removed, and artists might still have an issue with it, which is fair.
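        To make the conversion-rate point concrete, here is a minimal sketch. It assumes, purely for illustration, that each scraper request counts as an ad impression; the traffic numbers are made up and the function name is hypothetical.

```python
def conversion_rate(clicks: int, impressions: int) -> float:
    """Share of ad impressions that resulted in a click."""
    if impressions == 0:
        return 0.0
    return clicks / impressions


if __name__ == "__main__":
    human_impressions = 1_000    # made-up figure
    human_clicks = 20            # made-up figure
    scraper_impressions = 1_000  # assumed to be counted, never clicking

    before = conversion_rate(human_clicks, human_impressions)
    after = conversion_rate(human_clicks, human_impressions + scraper_impressions)
    print(f"humans only:   {before:.1%}")
    print(f"with scrapers: {after:.1%}")
```

        The clicks stay the same while the denominator grows, so the measured rate drops even though no human behaved differently. If the scraper never loads the ad at all, the rate is unaffected and only the bandwidth cost remains.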

        7 votes
  4. [7]
    myrrh
    Link

    Maybe a hot take, but I've kind of always agreed here. I believe in the freedom of information above all else. I also believe that IP law as a whole has done a lot more harm than good. As someone who writes things on the internet for a living, I am 100% of the belief that anything that's on the internet should be free for everyone else to use for any purpose. Call this... radical open-source, if you will. I personally haven't excluded any of the web crawlers from my pages, because I think that if they want to take my stuff, I can't do anything about it, so I might as well embrace it.

    13 votes
    1. [6]
      DefinitelyNotAFae
      Link Parent

      Interesting and unique opinion. I have to give you full credit for it. ಠಿ⁠_⁠ಠಿ

      8 votes
      1. [5]
        Bet
        Link Parent

        Ha! I think it’s just the same person accidentally reposting themself; Delphi and myrrh(a) are both featured in Greek mythology.

        2 votes
        1. [4]
          DefinitelyNotAFae
          Link Parent

          I think it was quite intentional and done in response to the initial post, given the delay between the two.

          I associate myrrh more with Christian mythology than Greek mythology despite Myrrha, but either way it reads as purposeful.

          6 votes
          1. [3]
            Bet
            Link Parent

            Oh, yea — ‘We Three Kings’.

            And I meant that I just assumed it was one person with two accounts who accidentally copy-pasted their other comment into this one. So, purposefully posted, only muddled their alts, as this account’s writing is very distinct.

            Also: This is noise, y’all. Just an interesting sidetrack to meander down.

            2 votes
            1. [2]
              DefinitelyNotAFae
              Link Parent

              Right, I got what you were saying.

              I'm assuming this is the second person copying the first person's post because that first person said it was fine to copy stuff on the internet. ┐( ˘_˘)┌

              6 votes
              1. Bet
                Link Parent

                Makes the most sense, but how disappointing that is, because a Dean Browning-esque flub is always a treat to see.

                2 votes
  5. [2]
    mordae
    Link

    I would love to abolish imaginary property rights completely and help build a free model that trains on high-quality content and could be used by anybody, free of charge and free of any limitations.

    It's pretty unfair that the rich are vacuuming up the internet and then not making the models freely available to the rest of us.

    5 votes
    1. feanne
      Link Parent

      https://huggingface.co/Mitsua/mitsua-diffusion-one - "Mitsua Diffusion One is a latent text-to-image diffusion model, which is a successor of Mitsua Diffusion CC0. This model is trained from scratch using only public domain/CC0 or copyright images with permission for use, with using a fixed pretrained text encoder (OpenCLIP ViT-H/14, MIT License)."

      2 votes
  6. [8]
    archevel
    Link

    I dislike Microsoft in general and I think their CEO of AI is wrong w.r.t. the content being "fair use" because it is online. That AFAICT has nothing to do with fair use.

    However, should they be allowed to use online content for training? In isolation, I think, the answer ought to be yes. The argument for this is that training an AI is essentially, IIUC, counting the occurrences of combinations of words and building a model of how likely other words are given the previous ones. Then when we use an LLM we essentially roll a die and use that to pick the word sequence that follows a given input. So should the result of counting word combinations be considered a derivative work of the text examined? I don't think it is obvious either way.
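
    As a rough illustration of the "count word combinations, then roll a die" picture above, here is a toy bigram model. It is only a sketch of that mental model: real LLMs are neural networks that learn weights rather than literal count tables, and the function names and the tiny corpus are made up for this example.

```python
import random
from collections import Counter, defaultdict


def train_bigram_counts(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probabilities(counts, prev):
    """Turn the raw follower counts for one word into probabilities."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in followers.items()}


def generate(counts, start, length, rng):
    """Extend a starting word by repeatedly sampling a likely next word."""
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break  # the last word was never seen with a successor
        words = list(followers)
        weights = [followers[w] for w in words]
        # The "dice roll": pick a follower in proportion to its count.
        out.append(rng.choices(words, weights=weights, k=1)[0])
    return " ".join(out)


# Hypothetical miniature "training set".
corpus = "the cat sat on the mat and the cat ate the fish"
counts = train_bigram_counts(corpus)

print(next_word_probabilities(counts, "the"))
print(generate(counts, "the", 5, random.Random(0)))
```

    Even in this toy version, the table stores no sentences as such, only counts, yet with a small enough corpus the sampler can reproduce runs of the original text word for word.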

    Things get more complicated if I use a prompt like:

    Write the first chapter of the sequel to [book title] by author [author name]

    I think the output should be considered derivative if I use "A Game of Thrones" and "George R.R. Martin" even if his exact works were not part of the training set. If I'd instead used "A Play of Benches" and "Gabriel B.N. Sinclair" (both made up) then no matter if Martin's work is part of the training set I wouldn't consider the output derivative of his particular work.

    The real complication comes when we start to consider the effect of this stance. Large entities would be perfectly free to train LLMs on any corpus (you would too, but you can't because you don't have access to enough compute power). Creative work will likely be further devalued. The quality of content will likely regress to the mean (especially if corporations start training on texts generated by AI).

    For me the preferred solution (however unlikely) would be to abolish property rights (both physical and intellectual). If people want to generate AI content, they should be allowed to (it does little harm). A handwritten message from someone will always be more valuable than something mass-produced, be it by AI or by an ad agency or copywriter.

    2 votes
    1. [3]
      skybrian
      Link Parent

      It might sound like counting word occurrences shouldn’t be this powerful, but it’s actually pretty easy for LLMs to memorize things from word occurrences, given enough training. Knowing what usually comes after “Fourscore and” isn’t that hard a pattern to learn.

      Since this is effectively copying, it means that restrictions on when copies are okay do often apply.

      The rules around copyright are pretty complicated, though. Quotes and paraphrases are often reasonable alternatives, and LLMs can do that too.

      Some rules are pretty high-level. Reusing characters from another book isn’t okay. At that point, we’re not talking about word-for-word copying anymore.

      5 votes
      1. [2]
        archevel
        Link Parent

        Just because something can regurgitate a text if properly prompted doesn't mean that its mere existence violates copyright. Learning that a particular pattern is common just influences the weights of the likely next word, but actually getting a word-for-word copy of a text depends on a few other parameters too. Also, I am not arguing about whether an LLM or its output violates copyright law as it stands. I am arguing that it shouldn't be considered to violate it, i.e. if things were how I want them to be.

        1. skybrian
          Link Parent

          That's a question of intent versus results. If you memorized something and then wrote it from memory, I think that still counts as a copyright violation? (Possibly unintentional.)

          For academic work, you should either quote or paraphrase. (And footnote either way.)

          But if you catch it before publishing, no harm, no foul.

          This suggests that sharing the weights might not be a copyright violation, but it could be used that way. It's a hazard people will be concerned about.

          GitHub Copilot had a filter you could set to avoid copyright violations. Some say it's not enough, but at least they tried.

          2 votes
    2. [4]
      papasquat
      Link Parent

      Abolishing property rights would turn all of society upside down. It would instantly render any sort of creative output worthless from a monetary standpoint; any creative or productive field would be totally vaporized if the clients and companies that hire artists had no claim on their output.

      5 votes
      1. [2]
        mordae
        Link Parent

        Exactly! Just like open source doesn't exist, all art would disappear immediately! /s

        Nope. Just the art business. And not even all of it. Many artists are able to build an audience that sustains them.

        2 votes
        1. papasquat
          Link Parent

          Not sure if you're serious, but most major open source projects are funded mostly by large commercial entities. Most development on them would disappear without that funding. Once projects reach a certain level of complexity, it's no longer realistic to expect people to work on them for free in their off time, and if commercial software ceased to exist because companies couldn't sell it anymore, those projects would stop being funded.

          10 votes
      2. archevel
        Link Parent

        Yes. One can always dream!

        1 vote
  7. [2]
    randomperson
    Link

    All Microsoft content and software is freeware [and can be used by me to crack it] because I can download it freely from Microsoft servers.

    23 votes
    1. chocobean
      Link Parent

      Exactly, the cracked version is a derivative product since I re-packed it with my dog's name on it. Copies are explicitly not allowed but this is a derivative from freeware.

      12 votes