21 votes

Wikipedia is finally asking Big Tech to pay up

14 comments

  1. [9]
    p4t44

    This article fails to mention that Google has already contributed significantly to Wikipedia. From Google.org in 2019:

    To that end, Google.org is donating $2 million to the Wikimedia Endowment, the first of Google’s contributions to its fund for long term sustainability. This brings our total support to more than $7.5 million, which includes an additional $1.1 million to the Wikimedia Foundation annual fund during a special campaign last year where Google employees helped decide where to direct Google's donation dollars.

    It should also be noted that The Wikimedia Foundation hasn't written any (or not very many) articles. They are written by users who released them under a CC license, I think that Wikipedia can't relicense such content without author consent. So no company can ever be obligated to pay for Wikipedia. Only way this is going to work is if Wikimedia can offer a worthwhile service (they almost certainly can't) or if they pressure companies into paying (imo more likely). Wikipedia's seemingly unnecessarily large spending is probably relevant here.

    19 votes
    1. [3]
      Atvelonis

      They are written by users who released them under a CC license, I think that Wikipedia can't relicense such content without author consent.

      Wikipedia's CC-BY-SA 3.0 license states that the only thing needed for transmission or remix is attribution to the original and a similar license for the new work, including for commercial usage. The Wikimedia Foundation can't force anyone to pay for their content because it's already free, but the article seems to be more focused on administrative partnerships with tech companies than anything explicitly oppositional. Wikimedia Enterprise is supposed to offer a financial incentive to companies using Wikipedia's data because the cost and convenience of a dedicated support service from the Wikimedia Foundation outweighs having massive internal teams at each company trying to work with the database dumps themselves. It's an opportunity for tech companies to cut expenses by ceding a bit of ground to the Foundation as far as data handling is concerned, but without affecting the actual content they're able to use. All involved parties theoretically benefit from this arrangement.

      Wikipedia's seemingly unnecessarily large spending is probably relevant here.

      That userpage makes some good points about spending, but I'd be critical of its sensational comparison to cancer. It's a truism that economic growth can't be infinite in a finite system, and I suspect that this argument is primarily being made because the author is ticked off about some products from the Wikimedia Foundation that they personally didn't like. Products like the VisualEditor are mostly intended for people who aren't editors yet, so the opinions of powerusers who will only ever use Source, while worth considering, are not the focus. Such opinions are also not backed by a data analytics team. There are certainly ways that the Wikimedia Foundation could improve its transparency, communication, and developmental efficiency, but the company develops a lot of products that are inevitably going to evolve alongside the increasing complexity of the internet. We see the same thing with Wikia/Fandom. Wikipedia has a lot of inertia right now, but you can't get away with complacency in SEO and data organization. It's always changing. Sites like Wikipedia are aberrations in the model; their very unique way of structuring content necessitates that they give a lot of attention to maintaining their implementation if they're going to continue doing well in search results.

      5 votes
      1. [2]
        skybrian

        I largely agree, but the argument that Wikipedia can’t afford to be complacent doesn’t seem that strong to me because of all websites, Wikipedia is probably the most secure in its search rankings. Maybe someday they’ll have stronger competition, but if so, it hasn’t appeared yet.

        2 votes
        1. Atvelonis

          It's a challenging argument to make because Wikipedia has so much inertia at the moment—but this is a story we've seen many times before. Nothing can rival the site's breadth, but it has plenty of competitors within each of its many content categories. Those with exceptional SEO practices have the capacity to dominate their individual content niches if Wikipedia starts to slack off. If Wikipedia ever loses out to the competition, it will be a death by a thousand cuts, and probably so slow that the urgency of fixing its rankings will never become sufficiently obvious to casual readers until it's too late.

          The way to avoid this is to never let competition get a real foothold in the first place—a proactive strategy. Innovative R&D is obviously expensive, but Wikipedia is systemically vulnerable to apathy because its not-for-profit and volunteer-based model relies on a set of high-minded ideals as a motivating factor for expansion and maintenance. These ideals are of course what make Wikipedia special, though I know from experience that they're also easy to lose faith in (see: smaller wikis with serious competition). Wikipedia survived its early years in large part by being the underdog to traditional media, but a hypothetical Wikipedia in decline—with competitors controlling more and more of its niches—would lack that same kind of idealistic appeal. It can't risk falling behind because its cushion isn't just financial, it's personal.

          I'm speculating here, but casual editors will probably stop contributing to Wikipedia if they feel it's a lost cause. Powerusers have always been pretty out of touch with readers, and under such circumstances will be inclined toward the idea that the project's nature as a non-profit will outlast all competition (which they perceive as fleeting), dismissing the urgency of the situation altogether. Wikipedia's top editors are simultaneously the best and worst sources of information about the encyclopedia out there, making our analysis difficult.

          7 votes
    2. [5]
      AugustusFerdinand

      It should also be noted that The Wikimedia Foundation hasn't written any (or not very many) articles. They are written by users who released them under a CC license, I think that Wikipedia can't relicense such content without author consent.

      I didn't gather that they were licensing the content, just access, which tons of sites already do. They have a way, the data dump, for companies to get relatively easily parsed data from them. Now they are offering a fee-based API, just as tens of thousands of other companies do.

      Only way this is going to work is if Wikimedia can offer a worthwhile service (they almost certainly can't) [...]

      Why do you believe they can't?

      4 votes
      1. [4]
        p4t44

        Why do you believe they can't?

        I can't see how Wikimedia could do anything with Wikipedia data that Google couldn't do themselves

        2 votes
        1. [2]
          snowcrash

          I agree with you, but still think Google may pay, essentially just for the optics.

          Even if Wikipedia decides to only permit free access via a "data dump", Google could easily parse and recreate this dump and build a pipeline for it. Even if the data wasn't updated up to the minute, it'd likely be sufficient. Not to mention that if Wikipedia is public, Google can likely legally scrape it even if Wikipedia does not allow it. (See hiQ v. LinkedIn. And if LinkedIn is scrapable, Wikipedia's gotta be, right?)

          ...but does Google want those negative stories? Especially now? Probably not. In this pessimistic take it's more like a publicity tax than a legitimate feature, but this is my uninformed couch-analyst position.

          Perhaps there really is value in having an API, and maybe they can add value in other ways. I haven't given it much thought, and they clearly have, so I'll be watching it with interest.
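
          For a sense of what "parse the dump" means in practice, here is a minimal sketch (not anything Google or Wikimedia actually runs). It assumes the dump is a MediaWiki-style XML export with `<page>`, `<title>`, and `<text>` elements; `iter_pages`, `local_name`, and the inline `SAMPLE` are made-up names for illustration, and the sample is a tiny hand-written stand-in for a real dump file.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix like '{uri}page' down to 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) pairs from an XML export, one page at a time."""
    title, text = None, None
    # "end" events fire once an element has been read completely.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # drop the finished page's children to save memory


# Hypothetical two-page sample standing in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision><text>Hello '''world'''</text></revision>
  </page>
  <page>
    <title>Other</title>
    <revision><text>Second page</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
for page_title, wikitext in pages:
    print(page_title, len(wikitext))
```

          Streaming with `iterparse` keeps memory use low, but this only covers the easy part. The cleaning and cross-referencing work that the article calls "very expensive" starts after this step.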

          1 vote
          1. skybrian

            There's an assumption that Google has enormous resources, which is true, but they are split across many, many projects, and the people doing the work to ingest Wikipedia data will be on a team that has limited headcount and budget. Google pays its engineers a lot, which means the costs of doing things internally can be high, and they might not be that interested in doing extra work when they could instead be building on someone else's work.

            But that depends on how happy they are with the system they already have, versus how stable and organized the new system from Wikimedia looks to them. Is what they're already doing something they're happy with, or a legacy system held together with duct tape? It's not something we can see from the outside.

            1 vote
        2. AugustusFerdinand

          Sure, but...

          “They all have teams dedicated to Wikipedia management—big ones,” Becker said, adding that making the different content speak to each other required “a lot of low-level work—cleaning and managing—which is very expensive.”

          Using an API is miles easier (and cheaper) than managing a data dump and, if they wanted, Wikipedia could change the format of the dump every month (or abandon it entirely) just out of spite and make it impossible for Google to keep up.

          1 vote
  2. skybrian

    From the article:

    For years now, Wikipedia has made freely available a snapshot of everything that appears on the site every two weeks—a so-called “data dump” for users—as well as a “fire hose” of all the changes as they are happening, delivered in a different format. This is how big companies typically import Wikipedia content into their platforms, with no special help from the foundation.

    “They all have teams dedicated to Wikipedia management—big ones,” Becker said, adding that making the different content speak to each other required “a lot of low-level work—cleaning and managing—which is very expensive.”

    The free, albeit clunky option will still be available to all users, including commercial ones. This means that Wikimedia Enterprise’s principal competition, in the words of Lisa Seitz-Gruwell, the foundation’s chief revenue officer, is Wikipedia itself.

    But the formatting problems with the free version offer an obvious opportunity to create a product worth paying for, one tailored to the requirements of each company. For example, Enterprise will deliver the real-time changes and comprehensive data dumps in a compatible format. There will also be a level of customer service typical of business arrangements but unprecedented for the volunteer-directed project: a number for its customers to call, a guarantee of certain speeds for delivering the data, a team of experts assigned to solve specific technical flaws.

    In another break for a project like Wikipedia, which was conceived as part of the world of free software, Enterprise will host its version of Wikipedia content not on the project’s own servers but on Amazon Web Services, which it says will allow it to meet the needs of its customers better. In explanatory materials, the foundation takes pains to justify the decision and stresses that “it is not contractually, technically, or financially bound to use AWS infrastructure.”

    [...]

    The Foundation says it doesn’t expect Enterprise ever to become the primary source of funding for the foundation’s roughly $100 million budget. User donations, supplemented by grants, should still carry most of the load, Seitz-Gruwell says, but having a reliable additional revenue stream from companies would offer stability for the foundation, particularly as it embarks on an ambitious agenda for the year 2030 to reach more parts of the world and more communities with “free knowledge.”

    3 votes
  3. [4]
    ImmobileVoyager

    Wired itself is part of Big Tech:

    Privacy Badger (privacybadger.org) is a browser extension that automatically learns to block invisible trackers. Privacy Badger is made by the Electronic Frontier Foundation, a nonprofit that fights for your rights online.

    Privacy Badger blocked 4 potential trackers on www.wired.com:

    c.amazon-adsystem.com
    securepubads.g.doubleclick.net
    news.google.com
    z.moatads.com

    and that's not counting the 32 requests intercepted by uBlock Origin.

    Domains that went through unnoticed albeit spurious imho include

    − getpublica.com: "the connected TV advertising platform"

    − onetrust.com: "Privacy Management Solutions"

    − a mysterious publica-ctv.com, whereabouts unknown, whois being obfuscated. And oh, TIL that whois is now served by Amazon Registrar, Inc.

    Maybe resistance has become futile. Are we assimilated already?

    The author of the article is probably not completely aware of where his salary comes from, and that's somewhat frightening.

    2 votes
    1. [3]
      skybrian

      Professional writers don't have any control over the technology that newspapers and magazines use. (They don't usually pick the headlines either.)

      The only writers who get full control are self-published, such as bloggers running their own tech stack. If we limited our linking to websites that don't use this kind of adtech, there wouldn't be much to share.

      10 votes
      1. [2]
        ImmobileVoyager

        So, resistance is futile and we are assimilated?

        Your remarks are, afaik, correct. Your conclusion, not so much, imho. Some of us might find it useful or informative to share our observations of how "adtech" is creeping into every nook and cranny of the WWW and even the underlying internet (AWS), and of how this trend is accelerating. I've read Wired online for years, and those 32 blocked requests came as a shock.

        Sharing URLs is good, and has been for a long time. Sharing with an understanding of the plague one spreads is better.

        About "much to share", we'll talk later about a lot, information density, discovery

        2 votes
        1. skybrian

          Anyone who wants can install an adblocker like you did. That doesn’t seem like giving up? If you’re worried about protecting people from adtech then that’s going to be a lot more comprehensive than anything Tildes can do by vetting one link at a time. Plus, you can still read the article.

          I think anyone who’s been paying attention knows that all ad-supported news sites are full of this sort of thing. That’s why they load so slowly.