At what point is a rewrite warranted?
[Context: I do computational research in the natural sciences.]
I have been tasked with verifying the correctness of a ~3000 LOC software project written in a mix of Fortran 77 and Fortran 90. I have made a small amount of headway getting the program up and running, but it seems like every time I take one step forward I take ten steps back.
Some issues with the program:
- It only compiles with one, specific, closed-source compiler
- Useless variable names
- Minimal comments (the ones that do exist are near-gibberish, explain the obvious, or comment out debugging print statements)
- Weird decisions are made with no justification, e.g. the code author decided that, if we are considering the calculations on the first molecule, we are only to consider its first atom
- Magic numbers everywhere, very few of which are known physical constants or their conversions
- etc, etc, etc.
I am reaching peak frustration after having worked with this code for only a few weeks. At this point, the idea of sitting down and rewriting the program from scratch is very, very tempting.
Do I need to just step back, relax, and keep hitting my head against the wall, or is this a situation where a rewrite may be necessary?
We had another discussion about legacy code in another thread. Long story short, I have a book recommendation for you - Working Effectively with Legacy Code.
https://tildes.net/~talk/zd/today_i_finally_beat_being_a_digital_pirate_despite_having_to_jump_a_big_hurdle#comment-6ae
To expound on that, though: it depends. But arming yourself with better skills for understanding this code, and better skills for restructuring it, will go a long, long way, and it will pay off in the long run.
A lot of it depends on things like:
I will read the book, thank you for the recommendation.
To answer your questions:
Well, you could always do black-box testing to check the correctness of the new code. As for your timetable, 4 weeks seems pretty tough for rewriting 3000 lines of unreadable code.
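To make the black-box idea concrete, here is a minimal sketch in Python. It assumes you can capture the text output of the old program and of the new code for the same input; the names `extract_numbers` and `outputs_match` and the tolerance defaults are my own placeholders, not anything from the original project. It pulls every number out of both outputs (including Fortran-style `D` exponents) and compares them within a tolerance, since a rewrite will rarely reproduce floating-point results bit for bit.

```python
import math
import re

# Matches integers, decimals, and exponents, including Fortran's "D" form (1.5D-03).
_NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eEdD][-+]?\d+)?")


def extract_numbers(text):
    """Return every numeric token in `text` as a float, in order of appearance."""
    return [
        float(token.replace("D", "E").replace("d", "e"))
        for token in _NUMBER.findall(text)
    ]


def outputs_match(reference, candidate, rel_tol=1e-9, abs_tol=1e-12):
    """True if both outputs contain the same count of numbers, pairwise close."""
    ref = extract_numbers(reference)
    new = extract_numbers(candidate)
    if len(ref) != len(new):
        return False
    return all(
        math.isclose(r, n, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, n in zip(ref, new)
    )
```

Labels and whitespace are ignored, so the new code is free to format its output differently. Which tolerance is acceptable is a scientific judgment call, not something this sketch can decide for you.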
Yeah, you're right... I'll take what I've learned from this thread to my PI tomorrow and see what they think.
What does it mean for this program to be correct? These weird decisions should be based somehow on what the code is supposed to be doing, right?
In any case, if you're trying to understand the program, rewriting it won't help. What can help is "rewriting" the structure of the code on paper, then making lots of small, incremental changes to move the program toward that "rewritten" structure. Each change should be obviously correct and testable on its own. Changes like: renaming a function, moving a variable from a global to a struct, or inlining a function into all its callers.
As soon as you run into a problem making one of these incremental changes, that's a sign that the original program is doing something you didn't understand. Then you can take the time to figure out why the old program worked that way (or to verify that this particular behavior is actually wrong and remove it).
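A rough sketch of what one such step can look like, in Python rather than Fortran to keep it short. Everything here is hypothetical: `legacy_total` is a made-up stand-in for an original routine, written to include the "first molecule, first atom only" quirk mentioned in the question, and `refactored_total` is the same logic after extracting one named helper. The check at the end pins the old behavior, so a "cleanup" that silently drops the quirk fails immediately.

```python
import math


def legacy_total(molecules):
    """Hypothetical stand-in for the original routine, quirk included."""
    total = 0.0
    for index, atoms in enumerate(molecules):
        if index == 0:
            total += atoms[0]  # quirk: only the first atom of the first molecule
        else:
            total += sum(atoms)
    return total


def atoms_to_include(molecule_index, atoms):
    """Helper extracted in one small step; it makes the quirk explicit and named."""
    return atoms[:1] if molecule_index == 0 else atoms


def refactored_total(molecules):
    """Same computation as legacy_total, expressed through the named helper."""
    return sum(
        sum(atoms_to_include(index, atoms))
        for index, atoms in enumerate(molecules)
    )


def behaviour_preserved(cases):
    """Compare old and new routines on every sample input (all molecules non-empty)."""
    return all(
        math.isclose(legacy_total(case), refactored_total(case))
        for case in cases
    )
```

Whether the quirk is a bug or a deliberate modelling choice is a separate question; the point of the check is only that each small change leaves the results where they were until you decide otherwise.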
I've had a lot of success with this technique: it can take a while, but you get:
(wouldn't it be nice if Reddit followed this approach!)
My suggestion to you would be to decide what the business value of the program is and what its future needs are. If it's a program that will need to change, be refactored, revamped, etc., then spending the time to make sure it can do those things is worth the effort. If it's basically a one-and-done and you only need it to keep running just long enough, then you obviously spend considerably less effort on it. So basically I would frame it as a question of how much value it has to the business and make your decision from that.
I'm no programmer, but I feel like this situation comes up in most fields in some form or another. When do you decide to start over and how do you decide if something is worth saving? I feel like with programming, the impression I've gotten is that, most of the time, code has interlocking dependencies and can be interconnected to the point of acting like a tangle of spaghetti. Is there any way to follow a particular strand of dependencies to the base and repeat that until you have your spaghetti sorted? 3000 lines is a lot to try and handle for the multitude of dependencies it sounds like you're working with.
Coming from a project management standpoint, I would ask yourself what benefit the organization is ultimately trying to realize, and whether taking a breath and getting your hands dirty will achieve that better than starting fresh with all the modern tools you have at your disposal.
Oh god this is giving me flashbacks. I get the feeling you're a computational chemist or something like that, so hello from experimental biophysics land *waves*
The software that controls my experiments is a scripting language that comes bundled with a non-free application my advisor wrote in 1995. Everything is global, there are no functions (only GOTO/GOSUB/subroutines), and you can't split a program into multiple files. There are no loop constructs, so you see a lot of this (baffling line-labels and all):
I've started rewriting this thing like three times, twice in Python and once in Rust, but the actual PhD work has gotten in the way each time. Here are the things I ask myself about the program:
Excellent guess, I am indeed a computational chemist!
Those questions are really pertinent and have given me a lot to think about. I spoke with my boss today and we have come to the conclusion that we're going to reimplement the core functionality of the code (the meat and potatoes, as it were) in LAMMPS.
Sounds like a re-write is in order to me. If not for yourself, at least for the sake of the next guy who comes along and has to deal with it. Besides, 3000 lines of FORTRAN can probably be re-implemented in 1000 lines of Python, which all things considered really isn't that much code.
My boss and I came to this conclusion today. We will be implementing the core functionality in a program that is the industry standard for this type of research.