23 votes

DigitalOcean's Hacktoberfest hurts open-source maintainers by incentivizing low-quality, unsolicited pull requests

36 comments

  1. [6]
    Deimos
    (edited )
    This video from an Indian YouTuber with almost 700k subscribers seems to be the source of a lot of the spam. He shows the exact process that a lot of these spammy PRs are copying: he adds "An Amazing Project" next to the project name in the docs and sets the title of the PR to "Improved Docs".

    You can see from the search link I posted in my comment yesterday that this exact spam is still happening at an extremely high rate.

    It sounds like the DigitalOcean employee is making some progress, but it's unclear if these changes will actually be implemented (or if it would even stop people from naively continuing to make the PRs anyway):

    Some progress has been made on the spam front -- drafted new logic in the app to automatically ban spammy users and to require PRs be merged or actively labelled as accepted, both new features if needed. Falling asleep at this point, hopefully tomorrow will be a better day

    Edit: someone also set up @shitoberfest on Twitter for people to send especially bad "contributions" to.

    12 votes
    1. [5]
      viridian
      That channel is very strange. It's almost like programmer... fetishism? Or maybe programmer aestheticism/lifestyle is more accurate. Tons of programming in-joke spam, and the actual educational content is literally stuff like how to write a for loop in Java, how to build a string, how to write a while loop, etc.

      edit: also this GitHub page is literally just the kinds of low-effort commits that this thread is talking about: https://github.com/CodeWithHarry

      5 votes
      1. [3]
        aditya
        This is because being able to code and working in a software job is seen as a surefire way out of often terrible financial situations. You can be the first of your family in generations to have an opportunity to get them out of poverty. Source: I grew up in India.

        4 votes
        1. arp242
          I've had some people from countries like India really praise me like I'm some sort of programming God over the years. I like the compliments, I guess, but it also makes me rather uncomfortable.

          I do like the idea of "programming out of poverty" though. Essentially anyone with a cheap old laptop and basic internet connection can learn to code if they spend enough time. Actually, I learned to code without an internet connection as I didn't have any money for it at the time (internet was less essential for regular life 15-20 years ago). In many ways it can be a great way out of poverty.

          But ... learning to code is hard. And it takes a lot of time and practice. And I also think it's something you need to have a certain aptitude for, so it's not an option for everyone either. It's by no means a "quick way" to get out of poverty.

          3 votes
        2. viridian
          I get that to some extent. I work with a local community college for their web development program in a very poor area of the US, and the program also bears some of these markings. I try to discourage it pretty heavily though, and encourage people to just build demonstrable projects, and be able to understand and explain what they built. The aestheticism stuff is a waste of time, and sort of feels like cargo culting.

          3 votes
      2. RNG
        Looks like he took down his GitHub

  2. [6]
    stu2b50
    While I don't disagree with the idea that it causes "harm", I do think this article attributes more malicious intent than there really is.

    For instance

    In reality, Hacktoberfest is a corporate-sponsored distributed denial of service attack against the open source maintainer community.

    ...

    we can remember that this is how DigitalOcean treats the open source maintainer community, and stay away from their products going forward.

    I mean, I can't see any actual gain DO has from spamming open source maintainers on GitHub. I really don't think it's some malicious plan, or that "this is how DigitalOcean treats the open source community". Clearly this is a negative side effect of DO trying to get more people to contribute to open source (with their gain being advertising and increased dev support).

    11 votes
    1. [5]
      Deimos
      (edited )
      Yeah, I don't think it's malicious, and the tone of the post feels a little over-dramatic. Unfortunately, that's often what makes blog posts get attention and more likely to get a response from the company.

      They're definitely incentivizing bad behavior from the way they set up the event though. Even the DigitalOcean employee handling the event recognizes that moderating the bad contributions would be too much work if they had to do it themselves, but putting that work on hundreds or thousands of (mostly unpaid) project maintainers who didn't opt in to the event isn't good either.

      Here's a good example of the type of terrible "contributions" that are being created because of this, on the django-typeform repo: that's 16 PRs in the last 10 hours that are all adding random garbage to the README.

      11 votes
      1. [2]
        jgb
        Cowley's attitude here is horrible. I am a bit sympathetic because he is extremely young and clearly grafting hard to try and get ahead in the industry, but he is really not grasping the amount of time and motivation of top-tier engineers that this initiative is selfishly wasting.

        In truth, though, it's hard to blame him. Nearly every day we encounter people who are unwilling or unable to multiply the modest but non-trivial demand that their actions make on any given individual by the size of their audience.

        A common example of this is the student in a lecture hall who asks a narrow question exclusively relevant to their own work or project, wasting perhaps three minutes of everyone's time. Now, three minutes isn't a lot, but if 200 people are in the lecture hall that's ten hours - nearly an entire day's worth of human time - essentially squandered.

        In the case of this stunt, you don't have to pick particularly big values for the amount of time wasted per project and the number of projects affected to derive a fairly staggering figure for the amount of high quality engineer hours squandered for the sake of a marketing campaign. And that's not even to mention the insidious cost of the context-switching necessary for an engineer to stop programming to check out a pull request, and the perhaps greater still cost of the demotivation and sheer frustration that these egregiously bad patches induce.
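
        To put rough numbers on that multiplication, here is a minimal sketch in Go. The lecture-hall figures come from the example above; the per-PR triage time and the number of spam PRs are made-up assumptions for illustration, not measurements.

        ```go
        package main

        import (
            "fmt"
            "time"
        )

        // totalCost multiplies a small per-person cost by the size of the audience.
        func totalCost(perPerson time.Duration, audience int) time.Duration {
            return perPerson * time.Duration(audience)
        }

        func main() {
            // Lecture-hall example: 3 minutes for each of 200 people.
            lecture := totalCost(3*time.Minute, 200)
            fmt.Printf("lecture: %.0f hours\n", lecture.Hours())

            // Hypothetical figures: 5 minutes of triage per spam PR, 10,000 spam PRs.
            spam := totalCost(5*time.Minute, 10000)
            fmt.Printf("spam PRs: %.0f hours\n", spam.Hours())
        }
        ```

        Even with modest assumed inputs like these, the total lands in the hundreds of engineer-hours, and that is before counting the context-switching.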

        Bad look, DigitalOcean.

        6 votes
        1. vektor
          https://twitter.com/MattIPv4/status/1311723398385541120

          Some progress has been made on the spam front -- drafted new logic in the app to automatically ban spammy users and to require PRs be merged or actively labelled as accepted, both new features if needed. Falling asleep at this point, hopefully tomorrow will be a better day

          It's something.

          6 votes
      2. [2]
        Thra11
        Most definitions of malice require intent to harm, which probably isn't there. However, given that they are aware of the problem and haven't immediately ceased their harmful actions, it certainly looks selfish and careless. Personally, I can't imagine that nobody at DigitalOcean has heard of perverse incentives, which implies that while nobody decided that they wanted to cause harm, somebody decided to ignore the obvious potential for harm.

        Here's a good example of the type of terrible "contributions" that are being created because of this, on the django-typeform repo: that's 16 PRs in the last 10 hours that are all adding random garbage to the README.

        One thing that I haven't seen mentioned yet is that many projects, like this one, have all sorts of continuous integration tools which are automatically run on PRs to e.g. confirm that the code would still build with the PR, to check that the contribution adheres to formatting rules, check that any new code has adequate test coverage, and so on. I bet there's at least one project somewhere whose CI isn't written to handle this massive influx of garbage (because under normal circumstances, it doesn't need to be), wasting hours of compute time and causing genuine PRs to get held up in the CI queue.

        2 votes
        1. arp242
          I expanded a little bit more about this at my post on Lobsters, which I won't repeat in its entirety here, but I think the DO people are just really taken aback by this. Shutting the entire thing down is pretty drastic, and when you're suddenly confronted by something like this it just takes a bit of time before you can actually organise any meaningful counter-action.

          I've been in a similar situation not just with the flea market that I mentioned in my other post, but also in an email system that was creatively abused to send out spam. It was clear it was sending out spam within a minute, but actually stopping it took some time to figure out what was going on and think of an effective counter-measure. Shutting the entire thing down would mean shutting down the email for tens of thousands of our customers, so that wasn't really an option either.

          4 votes
  3. [6]
    viridian
    I'm not really sure where I fall on this. Frankly, even one high quality contribution is a huge boon to an open source project. I think the main problem here is actually requiring five contributions. Five open source contributions from someone who isn't a maintainer is a huge request, and Hacktoberfest is largely aimed at a novice audience. I think if there were a good way to incentivize people who haven't done any open source work to make just a single substantial and positive change, the outcomes would be a lot better.

    As is, the requirements mismatch the intended audience, and the most likely outcome is a bunch of low value PRs for maintainers to sift through.

    5 votes
    1. [2]
      jgb
      I agree entirely. To my own discredit, I don't make many open source contributions to other projects, but when I have done so I have usually spent the best part of an afternoon on my patch, even when my change has been quite minor. Getting to grips with a new codebase is really hard, more so if one needs to adjust to the idiosyncrasies of a project's toolchain usage and programming style.

      4 votes
      1. viridian
        I'm in the same boat. Contributing is a very hard thing to get into, exactly for the reasons you've outlined. You typically are trying to make changes to a project that, more often than not, has far more rigor and tooling backing it than the random code you push up at a corporate job, and you aren't exactly a trusted agent either. Learning to swim just once in that environment is a good month-long task, let alone five times. The vast majority of professional developers haven't submitted a single line of code to any open source project.

        6 votes
    2. [3]
      stu2b50
      I think DO just underestimated how much people wanted... a free t-shirt? Honestly I don't understand it either. Seriously? Is this much effort to dishonestly satisfy requirements worth it for a T SHIRT?

      They probably just thought, "Hey, this will be a nice bonus to people who contribute to open source", not thinking that this alone will cause people to send in PRs.

      And honestly I still don't get it.

      2 votes
      1. tindall
        It's not about a tee. My experience at a small university with a lot of people just starting a career in software engineering is that people see Hacktoberfest as an opportunity to get "in" with the open source community - a way of fitting in, or a community event like Inktober or NaNoWriMo. Of course, random pointless spam MRs aren't really the best way to do that, but that's exactly the kind of people that DigitalOcean want to get their service in front of and, clearly, they don't care at all about the impact they're having.

        3 votes
      2. arp242
        At my last job we had a crapton of "security researchers" send us silly uninformed "security reports" in the hope of a free t-shirt as a "bounty". We didn't even have a bounty programme. Most of them were just spammy and non-issues, but they kept asking for free shirts and the like anyway, typically sending many follow-ups.

        The funniest was the one that sent us a YouTube video of him demonstrating some non-issue and typing in Notepad to explain (with much backspace use, no audio). The entire thing was about 6 minutes and absolutely hilarious. I wish I had saved it.

        And it's not even a well-known company or anything. I mean, I managed to get Stack Overflow to send me a t-shirt once and that was kind of cool, but this was just a medium B2B SaaS company you probably never heard of (yet large enough to attract these people, it seems).

        3 votes
  4. [17]
    Deimos
    You can use GitHub search to get an idea of the level of spam that this is causing: https://github.com/search?o=desc&q=improved&s=created&type=Issues

    Most of those results are spam, and that isn't even nearly all of it, just a specific subset where the users (bots?) are using "improved" in the title of the pull request. Even that seems to be currently happening several times a minute, and often repeatedly to the same projects (example from my other comment).

    4 votes
    1. [8]
      jgb
      I wish to tread sensitively here, but I can't help but notice that virtually all these spam patches are of a similar form and are from accounts with seemingly Indian names. I can't help but wonder if there is an Indian tech forum somewhere on the web that has suggested that people do this as a sort of 'life hack' or something? It seems implausible that so many people would think to try and game the system in such a similar way of their own volition.

      9 votes
      1. [2]
        Deimos
        I saw a few comments on HN talking about it being a big thing with Indian students for prestige-like reasons, here's one:

        I'm sorry actually to see that most of the names in the screenshot are people from India. Hacktoberfest to some degree has turned into a madfest with most college students here. Rather than actually contributing to open source, many new repos pop up during these times where fellow college students raise a PR for nothing.

        It's the T-shirt that's the primary reason but also the flaunting on social media as if I'm some kind of certified open source contributor.

        PS: I've also been part of Hacktoberfest launch events where some people literally created their first PR.

        5 votes
        1. jgb
          Link Parent
          It seems to me that because the primary currencies of the world of free software are - and have been for so long - reputation and professional pride, it is to some extent defenseless against...

          It seems to me that because the primary currencies of the world of free software are - and have been for so long - reputation and professional pride, it is to some extent defenseless against people who do not mind embarrassing themselves among their fellow engineers for a free t-shirt and perhaps the chance to impress their real-life peers.

          It is good to see in the other comments under that post that some projects benefit greatly from this initiative. Perhaps the epithet of 'net negative' is indeed unfair.

          2 votes
      2. [4]
        Deimos
        Someone from India published this article today that seems like a good coverage of some factors in Indian culture that contribute to this happening: Why most Hacktoberfest PRs are from India

        5 votes
        1. jgb
          I did see this actually, what a superb insight.

        2. [2]
          Adys
          I was going to post this standalone earlier, it's an outstanding article. Do you want to post it instead?

          2 votes
          1. Deimos
            Oh, you go ahead and post it then. I agree it's a great article that probably deserves its own submission instead of just being linked in a comment here.

            2 votes
      3. PendingKetchup
        India is very populous, has a lot of English speakers, and has already reached the morning of October 1st. Maybe it's just that.

        3 votes
    2. [7]
      viridian
      This is a damn shame, but looking through a few random instances, it definitely doesn't look like bots; it's just folks editing readme.md files and the like. There should probably be a hard ban on non code contributions as well as my suggestion at the top level of this thread. Projects like the awesome software lists may suffer a bit, but it has to be a net good all things considered.

      3 votes
      1. vektor
        There should probably be a hard ban on non code contributions as well as my suggestion at the top level of this thread.

        Disagree. Writing good documentation is hard and time consuming, while also not being so technical and detail-oriented as to be inaccessible. Providing actually useful documentation is an important task that the actual devs often don't have the time to do adequately, while sometimes just trying to use some software, working through the pitfalls and documenting that process in a way that helps people avoid the pitfalls - that can already be value added to a repo. Hell, a beginner is often the only person who can even ask the right questions here.

        I dunno, I'd think the best way to deal with this is to require that maintainers accept a PR with an explicit endorsement of the Hacktoberfest contribution. Basically, if the maintainer says "I understand this is for Hacktoberfest and I appreciate this contribution enough to want them to give the guy a shirt!", then that PR counts. Adjust the required number down as desired. Put the endorsement into a snappy hashtag, done.

        That in and of itself doesn't prevent people from building decoy repos to contribute to, but at least then they don't get on everyone's nerves, just DO's apparel purser's.
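
        As a rough sketch of what such a rule might look like in Go, following the "merged or actively labelled as accepted" wording from the tweet quoted elsewhere in this thread: the `PullRequest` type, the label name, and the function names below are all hypothetical, and none of this reflects GitHub's API or whatever DigitalOcean actually builds.

        ```go
        package main

        import "fmt"

        // PullRequest is a hypothetical, simplified record, not GitHub's API model.
        type PullRequest struct {
            Author string
            Merged bool
            Labels []string
        }

        // acceptedLabel is an assumed name for the label a maintainer applies
        // to endorse a contribution.
        const acceptedLabel = "hacktoberfest-accepted"

        // counts reports whether a PR counts toward the event: it must be
        // merged or carry the maintainer's endorsement label.
        func counts(pr PullRequest) bool {
            if pr.Merged {
                return true
            }
            for _, label := range pr.Labels {
                if label == acceptedLabel {
                    return true
                }
            }
            return false
        }

        // qualifies reports whether author has at least required counting PRs.
        func qualifies(prs []PullRequest, author string, required int) bool {
            n := 0
            for _, pr := range prs {
                if pr.Author == author && counts(pr) {
                    n++
                }
            }
            return n >= required
        }

        func main() {
            prs := []PullRequest{
                {Author: "alice", Merged: true},
                {Author: "alice", Labels: []string{acceptedLabel}},
                {Author: "bob"}, // opened, but neither merged nor endorsed
            }
            fmt.Println("alice:", qualifies(prs, "alice", 2))
            fmt.Println("bob:", qualifies(prs, "bob", 1))
        }
        ```

        The point of the design is that an opened PR is worth nothing on its own; only a maintainer's action (merging or labelling) makes it count.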

        5 votes
      2. [2]
        arp242
        There should probably be a hard ban on non code contributions as well as my suggestion at the top level of this thread.

        I've sometimes sent PRs which just rewrite the documentation when it's a cool project I like, but made by someone who doesn't speak good English so the docs are kinda awkward. This has always been greatly appreciated by the project owner, and why shouldn't it be? It makes their project better. It's also something that's typically an hour or more of work to do well.

        I generally do my best to write clear and concise documentation; I actually like doing that kind of thing, and it's not always easy to clearly explain how to use an API or program. Especially if you do it for a project you're new to, to do it well you need to first understand the project, which usually requires reading a fair bit of code to understand what it does exactly. It requires quite a bit of attention to detail to spot and communicate all the various subtleties well.

        So, for example:

        // MakeDir creates a new directory.
        func MakeDir(dir string) error {
            // ...
        }
        

        Is documented, in a way. But what if the directory already exists? Does it create intermediate non-existing directories? What are the limitations on allowed characters? Will this work on Windows? What about filesystem permissions?

        This is a very simple example, but creating good, comprehensive documentation for even a fairly basic function like this takes more than just "creates a new directory".
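
        For comparison, here is a sketch of what a fuller comment might look like. It assumes an implementation built on `os.MkdirAll` with mode `0o755`, which is my choice for illustration; the original `MakeDir` body is elided, so its real behaviour may differ.

        ```go
        package main

        import (
            "fmt"
            "os"
            "path/filepath"
        )

        // MakeDir creates the directory dir, along with any missing parent
        // directories.
        //
        // If dir already exists and is a directory, MakeDir does nothing and
        // returns nil. If dir exists but is not a directory, it returns an error.
        //
        // New directories are created with permission bits 0o755, before the
        // process umask is applied.
        //
        // MakeDir does not validate names itself; which characters are allowed
        // depends on the operating system and filesystem, and any rejection is
        // returned as an error.
        func MakeDir(dir string) error {
            return os.MkdirAll(dir, 0o755)
        }

        func main() {
            base, err := os.MkdirTemp("", "makedir-example")
            if err != nil {
                fmt.Println("could not create temp dir:", err)
                return
            }
            defer os.RemoveAll(base)

            dir := filepath.Join(base, "a", "b", "c")
            fmt.Println(MakeDir(dir)) // creates a, a/b and a/b/c
            fmt.Println(MakeDir(dir)) // already exists: no error
        }
        ```

        Every sentence in that comment answers one of the questions above, and each one required checking what the code actually does.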

        I've sometimes seen companies make the mistake of having the new junior employee or intern write the documentation as a "good introductory task", and I think that's a huge mistake. Writing docs is senior-level stuff if you want it done well, not junior-level.

        The problem with these PRs isn't that they're editing the README, the problem is that they're just inserting random pointless changes. I've also seen plenty of "code PRs" which just add print('asd') or change font-size: 25px to font-size: 24px without explanation. That's just spam, but not because it edits the README.

        1 vote
        1. viridian
          That's a fair argument, and I fully admit that my suggestion isn't exactly attempting to thwart the problem with surgical precision, and I completely agree that good documentation is hard, skill-based work. What I would question, though, is whether a program targeted towards getting newbies to contribute would benefit from directing participants specifically towards code changes. As you said yourself, good documentation is by no means an introductory task, and allowing docs-only changes seems to overwhelmingly encourage low-quality commits.

      3. [3]
        jgb
        The obvious circumvention to this is to simply send patches that just add pointless comments:

         + | // Accumulate current score into someVar
           | someVar += getScore();
        
        1. [2]
          viridian
          Hahaha, then you'd be arguing that comments are code. Come to think of it, I bet DigitalOcean really doesn't want to walk into that warzone of a topic.

          2 votes
          1. jgb
            Even were they to clamp down on that there are of course innumerable ways to meddle with a source file and create a diff but not a semantic change. Though after some number of iterations of this cat-and-mouse game it's not that much harder to actually just fix something :-)

            3 votes
    3. PendingKetchup
      Turns out that you can just wade into the sea of unsolicited low-effort PRs and add unsolicited low-effort GitHub reviews.

      Maybe next year you should need to make 2 PRs and leave 3 reviews.