12 votes

At what point is a rewrite warranted?

Posted May 31, 2018 by fishinginthecoy

[Context: I do computational research in the natural sciences.]

I have been tasked with verifying the correctness of a ~3000 LOC software project written in a mix of Fortran 77 and Fortran 90. I have made a small amount of headway in getting the program up and running, but it seems like every time I take one step forward I take ten steps back.
Some issues with the program:

- It only compiles with one specific, closed-source compiler
- Useless variable names
- Minimal comments (the ones that do exist are near-gibberish, explain the obvious, or comment out debugging print statements)
- Weird decisions are made with no justification, e.g. the code author decided that, if we are considering the calculations on the first molecule, we are only to consider its first atom
- Magic numbers everywhere, very few of which are known physical constants or their conversions
- etc., etc., etc.

I am reaching peak frustration after having worked with this code for only a few weeks. At this point, the idea of sitting down and rewriting the program from scratch is very, very tempting.
Do I need to just step back, relax, and keep hitting my head against the wall, or is this a situation where a rewrite may be necessary?

11 comments

rhencke
May 31, 2018
We had another discussion about legacy code in another thread. Long story short, I have a book recommendation for you - Working Effectively with Legacy Code.

https://tildes.net/~talk/zd/today_i_finally_beat_being_a_digital_pirate_despite_having_to_jump_a_big_hurdle#comment-6ae

To expound on that, though. It depends. But, arming yourself with better skills for understanding this code, and better skills for restructuring it will go a long, long way, and it will pay off in the long run.

A lot of it depends on things like:
- How widely is the current version used?
- Where does it run? What are the inputs/outputs? What systems depend on it? How often does it run?
- How mission-critical is it? If a mistake is made during a rewrite, what's the fallout?
- What is your timetable for this work?
10 votes
  fishinginthecoy (OP)
  May 31, 2018
  I will read the book, thank you for the recommendation.
  
  To answer your questions:
  
  The code is not used anywhere outside of the lab I do research in. As far as I can tell, it hasn’t been touched in fifteen years.
  
  It runs on any Unix system. The input is basic science stuff (atom coordinates, parameters), and the output is various properties calculated from those inputs. No systems depend on it.
  
  The program implements a particular model of a system we are interested in. Errors in the rewrite will lead to incorrect scientific data, which could lead to erroneous results and predictions.
  
  I have about four weeks to meet our first goal of reproducing results from the original paper on the subject. This is a long-term research project, however. The timetable is uncertain beyond that.
  
  2 votes
    teaearlgraycold
    May 31, 2018
    > The program implements a particular model of a system we are interested in. Errors in the rewrite will lead to incorrect scientific data which could lead to erroneous results and predictions
    
    Well, you could always do black-box testing to check the correctness of the new code. As for your timetable, 4 weeks seems pretty tough for rewriting 3000 lines of unreadable code.
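A minimal sketch of that black-box comparison in Python, assuming each run's output can be parsed into a dictionary of named properties. Every name here (`compare_properties`, `black_box_test`, the toy models) is hypothetical, and the tolerances are placeholders: output from a different compiler will rarely match bit for bit, so the tolerance should reflect how precisely the original paper reports its numbers.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical output format: property name -> value. The real program's
# output would first have to be parsed into something like this.
Properties = Dict[str, float]
# Stand-in for one input case (in practice, e.g. a coordinate/parameter file).
Case = float


def compare_properties(old: Properties, new: Properties,
                       rel_tol: float = 1e-8, abs_tol: float = 1e-12) -> List[str]:
    """Describe every property that is missing, extra, or outside tolerance."""
    problems = []
    for name, old_value in old.items():
        if name not in new:
            problems.append(f"{name}: missing from new output")
        elif not math.isclose(old_value, new[name], rel_tol=rel_tol, abs_tol=abs_tol):
            problems.append(f"{name}: old={old_value!r} new={new[name]!r}")
    for name in sorted(new.keys() - old.keys()):
        problems.append(f"{name}: not produced by the old program")
    return problems


def black_box_test(old: Callable[[Case], Properties],
                   new: Callable[[Case], Properties],
                   cases: Sequence[Case]) -> Dict[int, List[str]]:
    """Run both implementations on the same inputs; return failures by case index."""
    failures = {}
    for index, case in enumerate(cases):
        problems = compare_properties(old(case), new(case))
        if problems:
            failures[index] = problems
    return failures


# Toy stand-ins so the sketch runs. In practice `old` would wrap a run of the
# original binary (or read its saved output) for the same input file.
def old_model(x: Case) -> Properties:
    return {"energy": 4.0 * x ** 2, "force": -8.0 * x}


def new_model(x: Case) -> Properties:
    return {"energy": 4.0 * x * x, "force": -8.0 * x}


def buggy_model(x: Case) -> Properties:
    # Deliberately wrong for large x, to show what a failure looks like.
    return {"energy": 4.0 * x * x + (0.1 if x > 1.5 else 0.0), "force": -8.0 * x}


print(black_box_test(old_model, new_model, [0.5, 1.0, 2.0]))    # {}
print(black_box_test(old_model, buggy_model, [0.5, 1.0, 2.0]))  # only case 2 fails
```

In practice the `old` side would be the saved output of the original Fortran binary on a fixed set of input files, so the comparison keeps working even if that one closed-source compiler becomes unavailable.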
    
    3 votes
    
    fishinginthecoy (OP)
    May 31, 2018
    Yeah, you're right... I'll take what I've learned from this thread to my PI tomorrow and see what they think.
    
    3 votes
panic
May 31, 2018
What does it mean for this program to be correct? These weird decisions should be based somehow on what the code is supposed to be doing, right?

In any case, if you're trying to understand the program, rewriting it won't help. What can help is "rewriting" the structure of the code on paper, then making lots of small, incremental changes to move the program toward that "rewritten" structure. Each change should be obviously correct and testable on its own. Changes like: renaming a function, moving a variable from a global to a struct, or inlining a function into all its callers.

As soon as you run into a problem making one of these incremental changes, that's a sign that the original program is doing something you didn't understand. Then you can take the time to figure out why the old program worked that way (or to verify that this particular behavior is actually wrong and remove it).
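As a concrete (hypothetical) example of one such step, here is a rename that gives a magic number a name, checked against the original in Python. The routine, its names, and the reading of 627.5095 as a Hartree-to-kcal/mol conversion are all assumptions for illustration.

```python
import itertools

# Hypothetical "before": cryptic names and an unexplained magic number.
def ecalc(q, n):
    return q * 627.5095 / n


# "After" one small step: same arithmetic in the same order, but with a named
# constant (assumed to be Hartree -> kcal/mol) and descriptive parameter names.
HARTREE_TO_KCAL_PER_MOL = 627.5095


def mean_energy_kcal_per_mol(total_energy_hartree, n_molecules):
    return total_energy_hartree * HARTREE_TO_KCAL_PER_MOL / n_molecules


def check_refactor(old, new, cases):
    """Raise if the refactored function differs from the original on any case.

    A pure rename keeps the floating-point operations identical, so results
    should match bit for bit; compare with == rather than a tolerance.
    Returns the number of cases checked.
    """
    for args in cases:
        before, after = old(*args), new(*args)
        if before != after:
            raise AssertionError(f"behaviour changed for {args}: {before!r} -> {after!r}")
    return len(cases)


energies = [-1.5, -0.25, 0.0, 3.75, 1234.5678]
counts = [1, 2, 7, 100]
cases = list(itertools.product(energies, counts))
print(check_refactor(ecalc, mean_energy_kcal_per_mol, cases))  # 20
```

If a later step reorders the arithmetic (say, dividing before multiplying), last-bit differences become possible and the check would need a tolerance; that is exactly the kind of point where it pays to stop and confirm the change is intended.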

I've had a lot of success with this technique: it can take a while, but you get:
- A working program before and after each change—no long periods of breakage
- A full understanding of what the old program was doing
- If you do everything right, the only changes in behavior will be to remove bugs
(wouldn't it be nice if Reddit followed this approach!)
5 votes
nebruin
May 31, 2018
My suggestion to you would be to decide what the business value of the program is and what its future needs are. If it's a program that will need to change, be refactored, revamped, etc., then spending the time to make sure it can do those things is worth the effort. If it's basically a one-and-done and you only need it to keep working just long enough, then you obviously spend considerably less effort on it. So basically, I would frame it as how much value it has to the business, and make your decision from that.

3 votes
Voxavious
May 31, 2018
I'm no programmer, but I feel like this situation comes up in most fields in some form or another. When do you decide to start over, and how do you decide if something is worth saving? With programming, the impression I've gotten is that, most of the time, code has interlocking dependencies and can become so interconnected that it acts like a tangle of spaghetti. Is there any way to follow a particular strand of dependencies to the base and repeat that until you have your spaghetti sorted? 3000 lines is a lot to try and handle, given the multitude of dependencies it sounds like you're working with.
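One standard way to follow the strands is to extract the call graph and work through it bottom-up (a topological order), so every routine you study only calls routines you have already understood. A rough Python sketch (3.9+ for `graphlib`); the regex extractor is a crude, hypothetical stand-in for a real Fortran parser, and the routine names in the example are made up.

```python
import re
from graphlib import TopologicalSorter  # Python 3.9+

# Crude patterns: they ignore FUNCTION references, continuation lines, and
# fixed-form "C" comment lines, so treat the result as a first approximation.
UNIT_RE = re.compile(r"^\s*(?:program|subroutine)\s+(\w+)", re.IGNORECASE)
CALL_RE = re.compile(r"\bcall\s+(\w+)", re.IGNORECASE)


def call_graph(source: str) -> dict[str, set[str]]:
    """Map each PROGRAM/SUBROUTINE name to the set of subroutines it CALLs."""
    graph: dict[str, set[str]] = {}
    current = None
    for raw_line in source.splitlines():
        line = raw_line.split("!", 1)[0]  # drop free-form comments (naively)
        unit = UNIT_RE.match(line)
        if unit:
            current = unit.group(1).upper()
            graph.setdefault(current, set())
        elif current is not None:
            graph[current].update(name.upper() for name in CALL_RE.findall(line))
    return graph


def bottom_up_order(graph: dict[str, set[str]]) -> list[str]:
    """Callees before callers: the leaves of the call tree come first."""
    # TopologicalSorter treats each mapped-to set as predecessors, so every
    # callee is emitted before any routine that calls it.
    return list(TopologicalSorter(graph).static_order())


EXAMPLE = """\
      PROGRAM MAIN
      CALL SETUP
      CALL ECALC
      END
      SUBROUTINE ECALC
      CALL PAIR
      CALL DIST
      END
      SUBROUTINE PAIR
      CALL DIST
      END
"""

order = bottom_up_order(call_graph(EXAMPLE))
print(order)  # SETUP and DIST (either order), then PAIR, ECALC, MAIN
```

If routines call each other in a cycle, `TopologicalSorter` raises `graphlib.CycleError`, which is itself useful: that cycle is a strand that has to be untangled as a unit.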

Coming from a project management standpoint, I would ask yourself what end benefit the organization is trying to realize, and whether taking a breath and getting your hands dirty will achieve that better than starting fresh with all the modern tools you have at your disposal.

2 votes
zmitchell
May 31, 2018
Oh god this is giving me flashbacks. I get the feeling you're a computational chemist or something like that, so hello from experimental biophysics land *waves*

The software that controls my experiments is a scripting language that comes bundled with a non-free application my advisor wrote in 1995. Everything is global, there are no functions (only GOTO/GOSUB/subroutines), and you can't split a program into multiple files. There are no loop constructs, so you see a lot of this (baffling line-labels and all):
```
agaga:
if some_condition; GOTO agaga
```
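The label-and-GOTO pattern above is just a loop that keeps re-testing a condition. For comparison, a rough Python sketch of the same idea with a real loop; the poll interval, the timeout, and the `wait_while` name are additions of this sketch, not features of that scripting language.

```python
import time
from typing import Callable


def wait_while(condition: Callable[[], bool],
               poll_interval: float = 0.01, timeout: float = 10.0) -> int:
    """Structured version of `label: if condition; GOTO label`.

    Re-checks `condition` until it becomes false, sleeping between checks and
    giving up after `timeout` seconds. Returns how many checks were made.
    """
    deadline = time.monotonic() + timeout
    checks = 0
    while True:
        checks += 1
        if not condition():
            return checks
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition still true after {timeout} s")
        time.sleep(poll_interval)


# Usage with a stand-in condition that stays true for the first three checks
# (e.g. a hypothetical "instrument busy" flag).
busy_flags = iter([True, True, True, False])
print(wait_while(lambda: next(busy_flags), poll_interval=0.0))  # 4
```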
I've started rewriting this thing like three times, twice in Python and once in Rust, but the actual PhD work has gotten in the way each time. Here are the things I ask myself about the program:
1. How do I know that this program even does what I think it does? In your case it sounds like you're doing simulations, so you should be able to provide some minimal test cases to do some sanity checks. My program interacts with a bunch of serial devices, so it's more difficult to know that it's doing what I think it's doing.
2. If I need to make modifications to the program, how long will it take me? Will I be able to tell if I've broken something? (See 1.) I did a rewrite of another program (same scripting language, different experiment) and restructured it so that I could more easily make programs to test the system, and it was really useful.
3. Is this the most important thing I can be doing for my PhD/master's right now? Is this going to get me out the door faster? I'm not staying in academia, and I'm close to graduating, so that may be more important to me than it is to you. It's really easy to work on the things that don't really matter just because it's easier than banging your head against a wall doing actual research.
4. How much empathy do you have for the people who will have to use this after you leave? If you really care about making life easy for the next person, go ahead and rewrite it. If you just want to get the hell out of Dodge, just stick it out and use Fortran.
5. I know Fortran gets used a lot for simulations because it's blazingly fast, but if speed isn't the first priority, you might consider rewriting it in another language that more people know. It will be easier to find help when you get stuck, it will be easier for the other people that use the code, etc.
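For point 1, a sanity check can be as simple as testing cases where the answer is known analytically or must not change. A minimal Python sketch, using a Lennard-Jones pair potential in reduced units purely as a stand-in (the OP's actual model isn't described): the pair energy at r = 2^(1/6) σ must equal -ε, the energy at r = σ must be zero, and the total energy must not change if every atom is translated or the atoms are relabeled. The function names and tolerances are assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
EnergyFn = Callable[[Sequence[Vec3]], float]


def lj_pair_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones pair energy 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def total_energy(coords: Sequence[Vec3]) -> float:
    """Sum of pair energies over all unique atom pairs (epsilon = sigma = 1)."""
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            energy += lj_pair_energy(math.dist(coords[i], coords[j]))
    return energy


def run_sanity_checks(energy_fn: EnergyFn, seed: int = 0) -> List[str]:
    """Return the names of the checks that `energy_fn` fails (empty = all pass)."""
    def close(a: float, b: float) -> bool:
        return math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)

    failed = []
    # Known analytic values for a single pair (epsilon = sigma = 1).
    r_min = 2.0 ** (1.0 / 6.0)
    if not close(energy_fn([(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)]), -1.0):
        failed.append("minimum energy")
    if not close(energy_fn([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]), 0.0):
        failed.append("zero crossing")

    # A small jittered chain of atoms, reproducible via the seed.
    rng = random.Random(seed)
    coords = [(1.5 * i + rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1),
               rng.uniform(-0.1, 0.1)) for i in range(5)]
    reference = energy_fn(coords)

    # Rigidly translating every atom must not change the energy.
    shift = (1.5, -2.0, 0.25)
    moved = [(x + shift[0], y + shift[1], z + shift[2]) for x, y, z in coords]
    if not close(energy_fn(moved), reference):
        failed.append("translation invariance")

    # Relabeling (reordering) the atoms must not change the energy either.
    shuffled = coords[:]
    rng.shuffle(shuffled)
    if not close(energy_fn(shuffled), reference):
        failed.append("permutation invariance")
    return failed


print(run_sanity_checks(total_energy))  # []
```

The same checks can be pointed at the old program and at any rewrite; a failure on something as basic as translation invariance narrows the search a lot.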
2 votes
  fishinginthecoy (OP)
  June 1, 2018
  Excellent guess, I am indeed a computational chemist!
  Those questions are really pertinent and have given me a lot to think about. I spoke with my boss today, and we have come to the conclusion that we're going to reimplement the core functionality of the code (the meat and potatoes, as it were) in LAMMPS.
  
  1 vote
jgb
May 31, 2018
Sounds like a re-write is in order to me. If not for yourself, at least for the sake of the next guy who comes along and has to deal with it. Besides, 3000 lines of FORTRAN can probably be re-implemented in 1000 lines of Python, which all things considered really isn't that much code.

1 vote
  fishinginthecoy (OP)
  June 1, 2018
  My boss and I came to this conclusion today. We will be implementing the core functionality in a program that is the industry standard for this type of research.
  
  2 votes