13 votes

SourceHut will blocklist the Go module mirror

12 comments

  1. riQQ
    Update: https://sourcehut.org/blog/2023-01-09-gomodulemirror/

    Update 2023-01-31: Russ Cox of the Go team reached out to us to address this problem. After some discussion, an acceptable plan was worked out. The Go team is working on deploying an update to the “go” tool to add a -reuse flag, which should substantially reduce the traffic generated by this system for all users of Go.

    In the meantime, the automated refresh traffic from proxy.golang.org was disabled for SourceHut, which the Go team assures us should have little-to-no impact on users and which reduces the burden on our system to a manageable level. Following this change by the Go team, we have observed traffic from the Go module mirror reduced to an acceptable level. The Go team has decided that the automatic refresh behavior is their responsibility, not the responsibility of other operators, so any other small hosts will hopefully not be affected as the Go team will enable or disable the refresh behavior at their discretion with the burden on third-party operators in mind.

    Consequently, we have cancelled our plans to disable Go traffic to git.sr.ht. No action is required by users to continue receiving service. Thanks Russ!

    8 votes
  2. skybrian
    I'm wondering what the impact is of this dispute. Are there popular Go modules on SourceHut?

    4 votes
    1. Diff
      I don't know how popular it is, but I do know the Gio GUI framework is hosted on SourceHut. And there are a few large enough to be packaged by Debian.

      3 votes
  3. Greg
    Drew says:

    On February 24th, 2021, we reported an issue to the Go team regarding this problem. The Go team initially helped us narrow down the cause, first by setting an appropriate User-Agent to help identify this traffic, then through discussions regarding the behavior of this system. We made recommendations to Google for how to service their requirements without generating an excessive amount of redundant traffic. However, the discussion stalled and no further changes were made by Google to address the issue, and we continued to receive an excessive amount of traffic from the module mirror.

    The conversation on that linked issue says:

    Anyone who's receiving too much traffic from proxy.golang.org can request that they be excluded from the refresh traffic, as we did for git.lubar.me. Nobody asked for sr.ht be added to the exclusion set, so as far as it's concerned nothing has changed.

    I actually think he’s got some reasonable technical points in the post in general, but the framing seems dishonest as hell. Google offered an interim solution 18 months ago, and clearly did so in earnest since they turned it around within a week for the owner of another domain, so I don’t see any reason they’d lie about not hearing anything from sr.ht.

    3 votes
    1. riQQ
      Drew's reasons for not using the interim solution:

      For a number of reasons. For a start, what does disabling the cache refresh imply? Does it come with a degradation of service for Go users? If not, then why is it there at all? And if so, why should we accept a service degradation when the problem is clearly in the proxy's poor engineering and design?

      Furthermore, we try to look past the tip of our own nose when it comes to these kinds of problems. We often reject solutions which are offered to SourceHut and SourceHut alone. This isn't the first time this principle has run into problems with the Go team; to this day pkg.go.dev does not work properly with SourceHut instances hosted elsewhere than git.sr.ht, or even GitLab instances like salsa.debian.org, because they hard-code the list of domains rather than looking for better solutions -- even though they were advised of several.

      The proxy has caused problems for many service providers, and agreeing to have SourceHut removed from the refresh would not solve the problem for anyone else, and thus would not solve the problem. Some of these providers have been able to get in touch with the Go team and received this offer, but the process is not easily discovered and is poorly defined, and, again, comes with these implied service considerations. In the spirit of the Debian free software guidelines, we don't accept these kinds of solutions:

      The rights attached to the program must not depend on the program's being part of a Debian system. If the program is extracted from Debian and used or distributed without Debian but otherwise within the terms of the program's license, all parties to whom the program is redistributed should have the same rights as those that are granted in conjunction with the Debian system.

      Yes, being excluded from the refresh would reduce the traffic to our servers, likely with less impact for users. But it is clearly the wrong solution and we don't like wrong solutions. You would not be wrong to characterize this as somewhat ideologically motivated, but I think we've been very reasonable and the Go team has not -- at this point our move is, in my view, justified.

      Source: https://news.ycombinator.com/item?id=34313802

      10 votes
      1. spit-evil-olive-tips
        We often reject solutions which are offered to SourceHut and SourceHut alone.

        I like this part of his reasoning, and wish decision-makers (particularly ones who are less divisive / more diplomatic than Drew is) would take this approach more often.

        if there's a large-scale / ecosystem-wide problem, such as

        The frequency of these requests can be as high as ~2,500 per hour, often batched with up to a dozen clones at once, and are generally highly redundant: a single git repository can be fetched over 100 times per hour.

        then offering "here's how to opt out on a case-by-case basis" is a very incomplete solution. really much more of a temporary workaround than an actual solution.

        if you set aside the usual problems with tone and combativeness that comes with Drew's posts, I think at the core of it, he has the better technical argument.

        the "solution" he was offered, and declined, is that he or anyone else can file a GitHub issue with the Golang team (at least, I think that's the process...AFAIK this isn't explicitly spelled out anywhere, other than in one of the 38 comments on a 2-year-old GitHub issue) to request exclusion:

        Anyone who's receiving too much traffic from proxy.golang.org can request that they be excluded from the refresh traffic, as we did for git.lubar.me. Nobody asked for sr.ht be added to the exclusion set, so as far as it's concerned nothing has changed.

        you know what would be cool? if there was some way that request for exclusion could be automated, rather than filing a GitHub issue and waiting on someone at Google (not exactly known for their dedication to customer support) to go through a manual process on their end.

        oh right, there is an automated way of requesting that. and it's been in use for nearly 30 years.

        and in Drew's original request, that's what he asked for:

        If it's crawling, it should set an appropriate user-agent and respect robots.txt.

        as he mentioned in this most recent blog post, there is an additional crawl-delay directive[1] in robots.txt that would be perfectly suited for this.

        following the 30-year-old web standard would also have a couple other concrete benefits. one is authentication - if I run acme-git-hosting.com someone might register "@acmegithosting" on GitHub and submit an exclusion request in my name, or ask that a previous exclusion be reverted. it's a fairly minor impersonation vulnerability, as these things go, but it would be completely solved by relying on https://acme-git-hosting.com/robots.txt (rather than assuming / hoping that any such impersonation would be noticed by the person at Google who handles the exclusion request)

        using robots.txt would also let the Git hosting services update the crawl-delay on their own, without needing to coordinate with someone from the Golang team. if Acme Git Hosting allows once-an-hour scrapes from the Golang proxy, but then traffic increases and they want to dial back to every-X-hours, that's a one-line change that Acme can make, entirely on their own.

        at the center of this onion, Google wrote a web crawler (albeit a specialized one, not their normal Googlebot), didn't implement support for robots.txt, and refused to implement it when asked. I can understand the frustration from someone such as Drew who subsequently got deluged by that crawler.

        1: as a side note, crawl-delay is a non-standard addition to robots.txt, but it seems to have been in use since at least 2012 (taken from the oldest of Wikipedia's references to it). meanwhile, in 2019 Google started the effort to get robots.txt standardized as an official IETF RFC (published in 2022 as RFC 9309). they had ample opportunity to include something like crawl-delay and standardize on its meaning, but they didn't. Googlebot itself doesn't honor crawl-delay directives at all, and instead requires you to go through their webmaster console.

        17 votes
        1. Greg
          I like this part of his reasoning, and wish decision-makers (particularly ones who are less divisive / more diplomatic than Drew is) would take this approach more often.

          This is probably why the guy gets to me a bit: I often want the same outcomes he does, but a lot of the time I feel like his style is actively undermining the chances of that happening!

          the "solution" he was offered, and declined, is that he or anyone else can file a GitHub issue with the Golang team (at least, I think that's the process...AFAIK this isn't explicitly spelled out anywhere, other than in one of the 38 comments on a 2-year-old GitHub issue) to request exclusion

          Oh yeah, it's far from being a reasonable general case fix (debate aside about whether it was ever intended as one), but I've got to laugh at the irony of him making that important and valid point about poor discoverability and the importance of ideology... in a second-level Hacker News comment in a thread that readers of the blog posts have no way of knowing about.

          8 votes
      2. rkcr
        Seems like a very Drew situation to me: unwavering confidence that he's right and totally unwilling to compromise.

        4 votes
        1. LukeZaz
          To me, this looks like an awfully clunky compromise for the reasons he stated already. So... is he wrong?

          5 votes
          1. Greg
            Yes, he’s wrong. Just not necessarily on the tech. He’s got good points about the software architecture, no question, but he isn’t primarily acting in his capacity as a programmer right now: he’s acting as lead decision maker and spokesperson for his organisation, and in that he’s simultaneously burning bridges and misleading his audience on the facts.

            The cost of bearing this traffic is no longer acceptable to us, and the Go team has made no attempts to correct the issue during this time. We want to avoid causing inconvenience for Go users, but the load and cost is too high for us to continue justifying support for this feature.

            Bullshit. There was an option to make that cost and load go away, and not only did he not engage with it, he didn’t even mention it in either blog post. He only addressed it when enough people asked that it was impossible to ignore. A guy as smart as he is doesn’t get to play dumb and claim that the omission was anything other than deliberate, and he doesn’t get to maintain the moral authority in the debate after being that misleading.

            I think the most frustrating thing here is that if he’d just said “We’re choosing to turn go modules off entirely as a stand against their habit of making decisions we don’t like, and we don’t accept their compromise for the following reasons…” I’d probably be in this thread agreeing with him.

            As it is, he spent two blog posts trying to claim it was about traffic and server load, both written after they offered to remove those two things as a concern entirely. He’s right to have technical and philosophical issues with the whole situation, and totally wrong to act as if the decision was made for dispassionate reasons of cost.

            He made a subjective call specifically to put pressure on them, and that could have been fine. Just own it, be honest, and be up front about it.

            11 votes
          2. Diff
            It's a clunky one, but it's more seamless than his current plan. And it's also a temporary one, one whose time is almost over. Right before 1.19 was released, the go command got a -reuse flag (pushed in after the feature freeze), and it'll fix this problem once and for all once the proxies pick up support for the new flag. But it took time to rearchitect things. Kinda weird he's decided to resurrect this issue right as it's finally coming to an end.

            6 votes
      3. Rocket_Man
        That's refreshing. I don't agree with characterizing the Go team as unreasonable. These middling solutions are very common at my company and I detest them. But they are usually reasonable and pragmatic given the circumstances at the time. But I think it's healthy to ask them to do better.

        2 votes
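The -reuse flag mentioned in the thread, as we understand it, lets the go command consult the output of a previous run and skip redownloading a module whose origin has not changed. The sketch below illustrates only that general idea, not the go tool's implementation: a hypothetical refresher remembers a digest of each repository's advertised refs (the kind of cheap listing `git ls-remote` provides) and performs the expensive clone only when that digest changes. Refresher, ListRefs, Clone, and the repository URL are all made up for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// refsDigest hashes a repository's advertised refs (name -> commit hash) in a
// stable order, so any push that moves or adds a ref changes the digest.
func refsDigest(refs map[string]string) string {
	names := make([]string, 0, len(refs))
	for name := range refs {
		names = append(names, name)
	}
	sort.Strings(names)
	var b strings.Builder
	for _, name := range names {
		b.WriteString(refs[name])
		b.WriteByte(' ')
		b.WriteString(name)
		b.WriteByte('\n')
	}
	sum := sha256.Sum256([]byte(b.String()))
	return hex.EncodeToString(sum[:])
}

// Refresher is a hypothetical mirror component. ListRefs stands in for a cheap
// ref listing; Clone stands in for the expensive full fetch.
type Refresher struct {
	ListRefs func(repo string) map[string]string
	Clone    func(repo string)
	seen     map[string]string // repo -> refs digest at the last clone
}

// Refresh clones repo only if its refs changed since the last clone,
// and reports whether a clone happened.
func (r *Refresher) Refresh(repo string) bool {
	if r.seen == nil {
		r.seen = make(map[string]string)
	}
	d := refsDigest(r.ListRefs(repo))
	if prev, ok := r.seen[repo]; ok && prev == d {
		return false // nothing moved: skip the clone
	}
	r.Clone(repo)
	r.seen[repo] = d
	return true
}

func main() {
	refs := map[string]string{"refs/heads/main": "aaa111", "refs/tags/v1.0.0": "bbb222"}
	clones := 0
	r := &Refresher{
		ListRefs: func(string) map[string]string { return refs },
		Clone:    func(string) { clones++ },
	}
	const repo = "https://git.example.org/~user/module" // hypothetical URL

	for i := 0; i < 100; i++ { // 100 refresh attempts with no pushes in between
		r.Refresh(repo)
	}
	fmt.Println("clones after 100 unchanged refreshes:", clones) // 1

	refs["refs/heads/main"] = "ccc333" // someone pushes
	r.Refresh(repo)
	fmt.Println("clones after a push:", clones) // 2
}
```

Under these assumptions, a repository polled 100 times an hour with no pushes costs one clone plus 100 ref listings, rather than 100 clones.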