10 votes

The need for software testing: Neil Ferguson's unstable epidemiologic model

23 comments

  1. [20]
    soks_n_sandals

    I want to be crystal clear that I do not support the criticisms in this piece as a reason to oppose stay-at-home orders and the subsequent economic impact. I believe that regardless of the model/code used to make those decisions, they were ultimately the right choice.

    Concerning the article, which is more of an opinion piece, I shared the outline since it's pay-walled. I work as a software tester. It is troubling that the code underlying this COVID model allegedly produces different results depending on how it is run and on which platform. I expect some small numerical noise in simulations, especially when running distributed across multiple cores, but if the results vary significantly (as alleged in this piece), that calls into question the reliability of the model.
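To illustrate the kind of small numerical noise that multi-core runs can produce, here is a minimal Python sketch (not taken from the Imperial code). Floating-point addition is not associative, so a reduction that groups partial sums differently, as a parallel run might, can change the last few bits of a result. The `chunked_sum` helper is a hypothetical stand-in for a parallel reduction, and the input data is random.

```python
import random

# Floating-point addition is not associative: regrouping changes the last bits.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)  # 0.6000000000000001
print(a + (b + c))  # 0.6


def chunked_sum(values, n_chunks):
    """Sum the way a parallel reduction might: per-chunk partial sums, then combine."""
    size = (len(values) + n_chunks - 1) // n_chunks
    partials = [sum(values[i:i + size]) for i in range(0, len(values), size)]
    return sum(partials)


rng = random.Random(42)  # fixed seed so the input data is identical on every run
values = [rng.uniform(0.0, 1.0) for _ in range(100_000)]

serial = sum(values)
parallel_like = chunked_sum(values, n_chunks=8)
rel_diff = abs(serial - parallel_like) / abs(serial)
print(rel_diff)  # zero or tiny: rounding noise, not a different answer
```

The point of the sketch is the size of the discrepancy: differences of this kind sit near the limit of double precision, which is why "small numerical noise" and "results vary significantly" are very different claims.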

    I thought these alleged software flaws were interesting and wanted to share.

    4 votes
    1. [13]
      vektor

      As a computer scientist, and without bothering to look at the original code, some of the criticisms seem hollow:

      based on a programming language called Fortran[..] This outdated language contains inherent problems with its grammar and the way it assigns values, which can give way to multiple design flaws and numerical inaccuracies.

      After some research, I think the authors might be referring to accidentally declaring things as single precision, or accidentally casting an intermediate result to single precision. That's not great. But there's no mention of whether that actually happens in the code. Sure fortran isn't great, but fortran is what underpins a lot of academic code outside of CS, from what I've heard.
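A minimal Python sketch of the general single-precision hazard described above; the article does not show that the model actually does this. Python floats are IEEE doubles, so the hypothetical `to_float32` helper uses the standard `struct` module to round values to 32-bit single precision, standing in for a default-kind Fortran `REAL` (single precision on common compilers).

```python
import struct


def to_float32(x):
    """Round a Python float (IEEE double) to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


print(to_float32(0.1))  # 0.10000000149011612

total64 = 0.0
total32 = 0.0
step32 = to_float32(0.1)
for _ in range(1000):
    total64 += 0.1
    # Rounding after every addition mimics a running sum held in a single-precision variable.
    total32 = to_float32(total32 + step32)

print(abs(total64 - 100.0))  # tiny: double-precision accumulation error
print(abs(total32 - 100.0))  # orders of magnitude larger: single-precision accumulation error
```

Whether an error of this size matters depends entirely on whether it propagates into the conclusions, which is exactly what the article does not show.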

      One file alone in the Imperial model contained 15,000 lines of code. [..] in our commercial reality, we would fire anyone for developing code like this[..]

      Yeah, the guy who made the software is not a software engineer, he's being paid to do research in epidemiology. This is just how a lot of research code is. One guy knows all the things about a piece of code and he does all the work on it. Is it good software engineering? No, but he's not getting paid to do that, and sadly no one else is paid to do it for him either.

      The approach ignores widely accepted computer science principles known as "separation of concerns" [..] Without this separation, it is impossible to carry out rigorous testing of individual parts to ensure full working order of the whole.

      Yeah, no point preparing for the advent of testing if you have no one to write any tests. Still, not an actual criticism of the code or the conclusions derived.
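To make the separation-of-concerns point concrete: once one piece of model logic lives in its own pure function, it can be unit-tested in isolation. This is a sketch using a generic textbook SIR (susceptible/infected/recovered) update, not anything from the Imperial model; `sir_step` and the test names are hypothetical.

```python
def sir_step(s, i, r, beta, gamma, dt):
    """Advance a hypothetical discrete-time SIR model by one time step (forward Euler)."""
    n = s + i + r
    new_infections = beta * s * i / n * dt
    new_recoveries = gamma * i * dt
    return s - new_infections, i + new_infections - new_recoveries, r + new_recoveries


def test_population_is_conserved():
    s, i, r = sir_step(990.0, 10.0, 0.0, beta=0.3, gamma=0.1, dt=1.0)
    assert abs((s + i + r) - 1000.0) < 1e-9


def test_no_infected_means_no_change():
    assert sir_step(1000.0, 0.0, 0.0, beta=0.3, gamma=0.1, dt=1.0) == (1000.0, 0.0, 0.0)


test_population_is_conserved()
test_no_infected_means_no_change()
```

Each test checks one property of one small unit; that is only possible when the step logic is not buried inside a 15,000-line file alongside I/O and bookkeeping.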

      Run it on different computers and you would likely get different results. In other words, it is non-deterministic.

      Frankly, that isn't all too concerning. If one computer always rounds down and another rounds up, that's not a problem when you're working with 16-digit numbers. Still, the article fails to show that the conclusions are affected.
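One common source of run-to-run differences in stochastic simulations is random-number seeding rather than rounding. A minimal Python sketch, assuming a toy Reed-Frost-style chain-binomial outbreak (not the Imperial model; `simulate_outbreak` and its parameters are hypothetical): with every draw coming from one seeded generator, the same seed reproduces the same result, while a different seed can give a different outcome.

```python
import random


def simulate_outbreak(seed, n=1000, p=0.002, days=60):
    """Toy chain-binomial outbreak: returns the number of people ever infected."""
    rng = random.Random(seed)  # the only source of randomness in the run
    susceptible, infected, total = n - 1, 1, 1
    for _ in range(days):
        # A susceptible person escapes every infectious contact with probability (1 - p) ** infected.
        p_catch = 1.0 - (1.0 - p) ** infected
        new = sum(1 for _ in range(susceptible) if rng.random() < p_catch)
        susceptible -= new
        infected = new  # assume cases are infectious for one day only
        total += new
    return total


print(simulate_outbreak(seed=1) == simulate_outbreak(seed=1))  # True: same seed, same result
print(simulate_outbreak(seed=1), simulate_outbreak(seed=2))    # different seeds may differ
```

In a real parallel code, seeding alone is not always enough: if threads draw from a shared generator in a scheduling-dependent order, the sequence of draws each agent sees can still change between runs.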

      Ultimately, this is a computer science problem and where are the computer scientists in the room?

      As a computer scientist... get off your high horse (tempering my choice of words here). This is not a CS problem. Not everything is. You know what other piece of code is non-deterministic, horribly designed and unmaintainable, and has lives at stake? Every law ever. Go play with lawyers.

      Can rigorous software engineering improve the model? Probably. We could probably buff out the non-determinism, make it easier to adapt to future threats, more performant, yada yada yada. The article completely fails to demonstrate that the outcome would be any different. Methinks this is an attack on the lockdown by proxy. If the lockdown is a consequence of a prediction based on a model written with dodgy code, then the lockdown was wrong..?

      Fuck no, the lockdown was necessary if you don't want to be worse off than Italy. What could have prevented this is "testing testing testing". Not unit tests on your code, but early COVID-19 testing once the threat was identified. Germany did that, and the (imo) most trustworthy German virologist completely attributes our success to knowing early what was going on. Our testing let us know the virus was here and how fast it was growing, while other countries waited a lot longer.

      By all means, fund epidemiology and virology better. Give Neil Ferguson's lab a trained software engineer. Fund the NHS adequately to enable them to quickly develop and scale up the required tests. But don't open back up without knowing WTF is going on.

      Ninjaedit: Not throwing shade here at OP, t'was a good read and you rightly distanced yourself from the epidemiological part of the argument.

      17 votes
      1. moriarty

        based on a programming language called Fortran[..]

        That's a good catch. Well, Mr. Richards, you know what else runs Fortran? The Large Hadron Collider. LIGO. The International Atomic Energy Agency. I could probably fill encyclopedias with my list of complaints about Fortran, but it is a solid language and the glue that holds up a lot of academic research. There are mountains of tried-and-true Monte Carlo simulations written in Fortran. Experienced researchers are well aware of its pitfalls and know how to use it well. This is the software equivalent of an ad hominem: if I can't find an actual flaw, I'll cast aspersions on the language used.

        3 votes
      2. [11]
        soks_n_sandals

        I'm 100% on board with your criticisms. I agree that the article is essentially a flawed attack on the lockdowns, which were a necessary step in limiting spread. I appreciate your long response and perspective here.

        Sure fortran isn't great, but fortran is what underpins a lot of academic code outside of CS, from what I've heard.

        I've come to understand the same thing. Like any language, good Fortran can be written.

        I think a much better outcome of this article would be to advocate for precisely what you said: better funding and possibly supplying someone to write better code.

        Nevertheless, I am inclined to wonder whether the high death rates predicted (for the US) led to a sort of self-fulfilling prophecy. We are nearing the 100k deaths mark. There is a litany of other issues, namely testing and unclear leadership directives, that got us here. However, a 100k death toll was described early on as a "very good job" in limiting the impact, so I wonder whether that partially charged public sentiment with a "nothing we can do" mentality. What if 10k deaths had been the "very good job"? Would the response have been more aggressive, or would it have been worse? It's obviously a speculative question with no concrete answer.

        2 votes
        1. arghdos

          FORTRAN with implicit none is generally fine: it turns off implicit typing, so every variable must be explicitly declared. That disables the implicit creation of new variables (e.g., through a typo) and stops I, J, K (and anything else starting with I through N) from implicitly being integers. In implicit mode there are many ways to shoot yourself in the foot (and unfortunately, lots of legacy codes are written like this), but you can do that in modern languages too, e.g.:

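          from math import sin

          inputs = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
          output = [0.0] * 10
          for i in range(10):
              # .... doing some work
              i = 4 - inputs[i]  # overwrites the loop variable; Python raises no error
              output[i] = sin(i)  # so this writes the wrong slot (a negative i silently wraps around)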
          

          There’s nothing inherently wrong with FORTRAN, but I’ll take C or C++ any day.
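The "implicit creation of new variables" problem has a direct Python analogue. In this illustrative sketch (the `total_cases` function is hypothetical), a typo silently creates a new variable at run time; in Fortran with `implicit none`, the equivalent undeclared `totl` would be rejected by the compiler.

```python
def total_cases(daily_counts):
    total = 0
    for count in daily_counts:
        totl = total + count  # typo: silently creates a new local instead of updating `total`
    return total  # still 0; nothing at run time flags the misspelling


print(total_cases([3, 5, 7]))  # 0, not 15
```

A linter would typically warn about the unused `totl`, but the interpreter itself accepts the code, which is the same trade-off as Fortran's implicit mode.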

          Edit: honestly the worst thing about FORTRAN is that the language spec is behind a paywall, so you don’t have an equivalent to things like: https://en.cppreference.com/w/

          That’s a roundabout way of saying, if I’ve interpreted what implicit none does incorrectly, it’s because what documentation I’ve found is not great :p

          4 votes
        2. skybrian

          You could easily argue the other way, that too-low predictions would result in dangerous complacency. But I think focusing on public reaction to predictions in this way is missing the point of doing science. It's important to separate figuring out what's going on with communicating those results.

          It's particularly important to communicate about uncertainty. The total death toll is the area under a curve and it's very difficult to predict the peak or back side of the curve in advance. Due to parameter sensitivity, it's based on little more than guesses. The main message was accurate: if we don't stop it, this is going to be very bad.

          In such circumstances, giving out any single number is misleading - it should be a wide range.
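A sketch of the parameter-sensitivity point, assuming a simple deterministic SIR model integrated with forward Euler steps (not the Imperial model; `attack_rate` and its defaults are hypothetical). It computes the fraction of the population ever infected, which, for a fixed fatality rate, sets the total death toll; modest changes in the assumed reproduction number R0 move it substantially.

```python
def attack_rate(r0, gamma=0.1, days=730, dt=0.1):
    """Fraction of the population ever infected in a simple SIR model (forward Euler)."""
    beta = r0 * gamma
    s, i, r = 0.999, 0.001, 0.0  # population fractions; 0.1% initially infected
    for _ in range(int(days / dt)):
        new_inf = beta * s * i * dt
        new_rec = gamma * i * dt
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
    return i + r  # everyone who has been infected at some point


for r0 in (1.5, 2.0, 2.5, 3.0):
    print(r0, round(attack_rate(r0), 2))
```

Since R0 itself was only loosely known early on, reporting the spread of outcomes across plausible parameter values is more honest than quoting any single run.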

          4 votes
        3. [7]
          gpl

          I've come to understand the same thing. Like any language, good Fortran can be written.

          I'm not sure what machines this code was being written on, but I want to note that a lot of academic code is written in C or Fortran simply because compiler support for other languages is pretty non-existent on high performance machines. And, of course, Fortran is blazing fast, and the recent updates to the language make it way more usable than, say, Fortran 77.

          I actually find this whole thing kind of humorous, as I never realized that others thought academics were writing deployment-ready software packages for their research. Most academic code is never touched by anyone other than the person who wrote it, and coding practices like the ones we see here are pretty much the norm. I don't think it's a huge issue overall, as the code in this case is a means to an end, not an end in itself.

          4 votes
          1. [3]
            arghdos

            compiler support for other languages is pretty non-existent on high performance machines

            What? I mean, at the very least you'll have C/C++/Fortran, CUDA, Python, OpenMP / OpenACC, various MPI stacks, runtime/transport layers (e.g., Charm++), etc. Not sure what other compiler support you're really looking for there, as that covers 95% or so of most HPC / compiler codes. You could always build your own Rust compiler or something if you needed to.

            And, of course, Fortran is blazing fast

            Fortran is "blazing fast" in the sense that compilation times are fast... but in that case, so is C. The code itself can be fast or slow depending on how well it is optimized. It's not inherently faster or slower than properly written C / C++, or even Python using Cython or NumPy to offload the computation-y bits.

            1 vote
            1. gpl

              No, you're definitely right on both points. I used C and Fortran as an example there, but I mainly meant that 'newer' languages like Rust, Go, etc that get pointed to as "replacements" for things like Fortran or C/C++ don't have as much support. You're definitely correct that all of those things have HPC support on many systems.

              I personally don't think using Fortran is somehow inherently an issue, especially considering the language has been updated multiple times over the years. This isn't as relevant for the linked article, but I have seen criticisms passed around about Ferguson's model being based on a 'decades old' language as though that were itself an issue. I guess I was more responding to an argument not made here.

              3 votes
            2. vektor

              Most of the things you mention make shooting yourself in the foot or writing bad code just as easy as Fortran (at least as far as my experience goes).

              Python, C, Fortran and the shader-like language I learned for HPC (was it OpenMP or OpenCL? Don't remember) are all various kinds of dangerous if not used properly.

              So from that angle, @gpl is kind of right: Academics write brittle code because the HPC ecosystem kind of sets you up for it. Keep in mind I'm not saying it's impossible to write good code in any of the above, just that you can find yourself shifting to less-than-ideal practices or can mess up and not even notice.

              Can you tell I like my compiler to be the dominant part in our relationship? :D

          2. [3]
            soks_n_sandals

            You make a great point. I wonder why more academic researchers haven't shifted to Matlab. It's not outrageously expensive and appears to have parallel/cluster computing packages to handle GPU acceleration and MPI. I think there's an argument that this research is pretty important and repeatability is definitely desirable, but in a lot of cases, code is just for the person writing it.

            1. [2]
              arghdos

              Matlab is $860 a year (or $2150 for a perpetual license).

              Python is free, and kicks the pants off of Matlab w.r.t. GPU acceleration / extensibility / available packages (e.g., TensorFlow and friends) any day of the week.

              Really the only reason I ever see to use Matlab is if you're deep into Simulink or their control-process design stuff.

              6 votes
              1. soks_n_sandals

                I would imagine their license cost is manageable for a research institution. But yes, Python is becoming a strong competitor thanks to the numerical packages it has available. I can't speak to its distributed computing support or whether it's comparable in compute speed.

                I think robotics is a good application of Matlab. It's obviously excellent for linear algebra, so I know a lot of academic heat transfer codes are written in Matlab in order to do essentially custom finite element work.

                3 votes
        4. vektor

          I think if that's a prevailing view, we need to communicate that corona is probably a lot less severe than previously thought. The original Wuhan reports indicated a 3% fatality rate, which, combined with an expected ~60% infection rate in an improperly mitigated scenario, gives you roughly 2% of the population dead. For the US, that's about 6 million. That's just completely staggering. With a bit of optimism, you can convince yourself that the infection fatality rate is currently as low as 0.3% (it certainly isn't 1% at this point). That gives you about 600,000 dead. In other words, if the government fails to contain this, we expect at least that many, and more if hospital overload increases fatality rates. If we scale how we judge the government reaction down by the same factor of ten, the 100k "very good job" benchmark becomes 10k, and that death toll has already come and gone for the US. Compare, too, that Germany exceeded that target death toll (adjusted for population) by a factor of ~2, and we've generally done a good job; so by that metric, that 100k/10k goal is quite optimistic.
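The back-of-the-envelope arithmetic above, as a few lines of Python. The ~330 million US population is an assumed round figure; the 60%, 3% and 0.3% rates are the comment's own inputs.

```python
US_POPULATION = 330_000_000  # assumed round figure for the 2020 US population
INFECTED_SHARE = 0.60        # ~60% infected in an improperly mitigated scenario


def projected_deaths(fatality_rate):
    """Deaths if INFECTED_SHARE of the population is infected at the given fatality rate."""
    return US_POPULATION * INFECTED_SHARE * fatality_rate


early = projected_deaths(0.03)        # early Wuhan fatality rate of ~3%
optimistic = projected_deaths(0.003)  # optimistic infection fatality rate of ~0.3%
print(f"{early:,.0f} vs {optimistic:,.0f}")  # 5,940,000 vs 594,000

# Scale the "100k deaths would be a very good job" benchmark by the same ratio.
scaled_benchmark = 100_000 * optimistic / early
print(f"{scaled_benchmark:,.0f}")  # 10,000
```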

          In any case, I don't think the US govt did a good job at all. How do we communicate that? Ahh, that seems to be the question we have to ask at every corner here, isn't it? Nothing about this pandemic is simple enough for the way our news media work these days.

          3 votes
    2. [6]
      cfabbro

      I shared the outline since it's pay-walled.

      We're free to share Outline links in comments, but AFAIK the policy on Tildes (likely due to Canadian copyright issues) is to always link the primary source so long as that site is still functioning, regardless of whether it's paywalled. See: @Deimos' comment here. So I have changed the link to the original source.

      p.s. The outline link in question for those who need it: https://outline.com/cnvAS8

      6 votes
      1. [4]
        N45H

        For anyone stopping by, feel free to label Offtopic, but because of you cfab I just discovered

        1. Outline (what he* mentioned about posting links in comments was discussed here)
        2. Kill Sticky Headers

        Thanks!
        *I assumed the gender because of the use of 'cockle' in the 1st intro thread. I'm hoping this doesn't backfire.

        1 vote
        1. [3]
          cfabbro

          Heh... "warms the cockles of my heart" is an idiomatic expression essentially just meaning "makes me deeply happy". Cockles refers to the cochleae (ventricles/chambers) of the heart. It has nothing to do with penises or gender. :P

          But don't worry about that backfiring, at least with me, since I genuinely don't care which gendered terms people refer to me as. Though for the record I am a cis male...ish (it's complicated).

          1 vote
          1. [2]
            N45H

            I genuinely haven't heard that one before. Good to know!
            Haha. After that smiley I presumed it would actually backfire. Then, for a second I really thought it was all said and settled with cis male. Aaand then the plane touched down in a weird middle place of a situation. What a rollercoaster! Thanks for that.

            1 vote
            1. cfabbro

              Aaand then the plane touched down in a weird middle place of a situation.

              Gender can be weird like that sometimes. ;)

              1 vote
  2. [2]
    Comment deleted by author
    1. soks_n_sandals

      Thank you for sharing this article. It has all the substance lacking from the Telegraph piece.

      1 vote
  3. [2]
    nothis

    I can't bypass the paywall, so... by what factor was it off? And... was Italy just a glitch in the Matrix?

    1 vote
    1. soks_n_sandals

      https://outline.com/cnvAS8

      So the difference in results is not quantified in this piece, just described as "wildly" varying, which is why I refer to the flaws as "alleged". I think that reduces the credibility of the argument, since the size of the discrepancy is what matters for accuracy here.

      It really doesn't matter if there's numerical noise. There's so much uncertainty in this that I really think it makes a little numerical noise matter even less. But it would be troubling if two researchers ran the same model configuration and couldn't get similar results.
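From a testing perspective, the line between "numerical noise" and "couldn't get similar results" can be made explicit with a tolerance-based comparison. A minimal Python sketch; `runs_agree`, the tolerances and the sample curves are illustrative assumptions, not model output.

```python
import math


def runs_agree(run_a, run_b, rel_tol=1e-6, abs_tol=1e-9):
    """Hypothetical check: do two runs of the same configuration agree within tolerance?"""
    if len(run_a) != len(run_b):
        return False
    return all(math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol) for a, b in zip(run_a, run_b))


# Illustrative daily-death curves (made-up numbers, not model output).
researcher_1 = [10.0, 25.0, 61.0, 140.0]
researcher_2 = [10.0, 25.0, 61.0 + 1e-12, 140.0]  # numerical noise only
researcher_3 = [10.0, 25.0, 80.0, 210.0]          # a genuinely different result

print(runs_agree(researcher_1, researcher_2))  # True
print(runs_agree(researcher_1, researcher_3))  # False
```

Choosing the tolerance is the real design decision: it should reflect how much variation the model's conclusions can absorb, not just the limits of floating-point precision.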

      3 votes