14 votes

How do you structure larger projects?

I'll be writing a relatively large piece of scientific code for the first time, and before I begin I would at least like to outline how the project will be structured so that I don't run into headaches later on. The problem is, I don't have much experience structuring large projects. Up until now most of the code I have written has been in the form of Python scripts that I string together to form an ad-hoc pipeline for analysis, or else C++ programs that are relatively self-contained. My current project is much larger in scope. It will consist of four main 'modules' (I'm not sure if this is the correct term, apologies if not), each of which consists of a handful of .cpp and .h files. The schematic I have in mind for how it should look is something like:

src
 ├── Module1 (Initializer)
 │    ├ file1.cpp
 │    ├ file1.h
 │    │ ...
 │    └ Makefile
 ├── Module2 (Solver)
 │    ├ file1.cpp
 │    ├ file1.h
 │    │ ...
 │    └ Makefile
 ├── Module3 (Distribute)
 │    ├ file1.cpp
 │    └ Makefile
 └ Makefile

Basically, I build each self-contained 'module', and use the object files produced there to build my main program. Is there anything I should keep in mind here, or is this basically how such a project should be structured?

I imagine the particular structure will depend on my project, but I am more interested in general principles to keep in mind.

6 comments

  1. [3]
    Farox
    Link
    There are tons of books on this out there but this is where years of experience with large systems come in. So I doubt anyone can tell you how to do this but there are principles... which again, everyone values differently.

    But searching for and learning about how to architect enterprise software should set you up for this, and you should be just in time for those amazon orders to arrive before the weekend. :)

    Also have a look at design patterns. These are common architectural solutions to common problems. Even if you don't implement them 1:1 it's just good to have those in the back of your head.

    Also, naming things is hard.

    6 votes
    1. [2]
      gpl
      Link Parent
      Thanks so much, I didn't expect to get an explicit road map since I assumed these things are a whole topic in and of themselves :) Would you happen to have recommendations for reading that you found particularly useful, or should I just browse and see what's popular?

      2 votes
      1. Farox
        Link Parent
        My pleasure. The problem is really that it's such a vast subject and if you really master it you're pretty much "there", raking in the big bucks.

        I personally enjoyed "Patterns of Enterprise Application Architecture", and I have heard good things about "Code Complete: A Practical Handbook of Software Construction". I started my career reading "Object-Oriented Analysis and Design with Applications", which is like the ancient bible.

        Other than that, I don't read many blogs or similar material on the subject anymore, so I can't help you there.

        Good luck!

        3 votes
  2. [2]
    duality
    Link
    Could you tell us a bit more about the outcome these modules are trying to achieve? Are they doing large amounts of ETL of data? Data modeling? Etc?

    Most starter templates for various application types will give you a decent starting organization for your source and header files. But in my experience it’s ultimately dependent on your use case.

    2 votes
    1. gpl
      Link Parent
      The entire program is a simulation of a physical system. The first module sets up a grid and initializes a system on it (this essentially amounts to looping over an array and assigning values so as to reproduce the statistics of my initial state). Another module advances this initial state in time through a split-operator PDE solver. I'm supposing another module will handle I/O to write these states to disk, and another will be a driver for parallelization so this whole thing can take place across multiple nodes. I have some of these modules written either partially or completely.
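
      As a rough illustration of how those pieces could be kept separate, here is a minimal C++ sketch. Everything in it is an assumption made for illustration: the `Grid` struct, the function names, the 1D layout, and especially the time step, which is a plain explicit Euler update for dv/dt = -v standing in for the real split-operator solver. The point is only that each module talks to the others through one small shared data type.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <vector>

// Shared data type (hypothetical): the one thing every module agrees on.
struct Grid {
    std::size_t nx;              // number of grid points
    std::vector<double> values;  // field value at each point
};

// "Initializer" module: builds the grid and assigns the initial state.
// A real version would sample values matching the desired statistics.
Grid makeInitialState(std::size_t nx, double fillValue) {
    assert(nx > 0);
    return Grid{nx, std::vector<double>(nx, fillValue)};
}

// "Solver" module: advances the state by one time step.
// Placeholder only: explicit Euler for dv/dt = -v, NOT a split-operator scheme.
void advance(Grid& grid, double dt) {
    for (double& v : grid.values) {
        v += dt * (-v);
    }
}

// "I/O" module: writes one "index value" pair per line to any output stream.
void writeState(const Grid& grid, std::ostream& out) {
    for (std::size_t i = 0; i < grid.values.size(); ++i) {
        out << i << ' ' << grid.values[i] << '\n';
    }
}

// Driver: the only code that knows about all of the modules at once.
Grid runSimulation(std::size_t nx, double initial, double dt, int steps) {
    Grid grid = makeInitialState(nx, initial);
    for (int s = 0; s < steps; ++s) {
        advance(grid, dt);
    }
    return grid;
}
```

      In a real layout each of these functions would sit in its own directory with a header declaring it, and a parallel driver would replace `runSimulation` without the solver or I/O code needing to change.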

      In any case, as per other suggestions I'm definitely going to try and find a book or template that is similar to my use case here.

      2 votes
  3. spit-evil-olive-tips
    Link
    I assume the Makefiles in your example directory tree are hand-written? That's one of the first things that will fall apart as project size increases and you start to break things out into modules like you're doing now.

    You very likely want something more complicated than plain ol' make. Since you're in C++ land the first thing I'd recommend looking at is CMake. It generates Makefiles on your behalf from a higher-level description, so it should take out a lot of the tedium / error-prone-ness of managing those Makefiles yourself, while still being fairly familiar.

    If you want a peek at more complicated build systems, I'd recommend reading about Blaze, which was developed internally at Google and then open-sourced as Bazel. There are copycat versions of Blaze as well: Pants and Buck, among others. Most of those clones were written by ex-Googlers who went to another company and missed Blaze so much they wrote their own version of it.

    The key thing those build systems do is track dependencies between modules - imagine instead of the 3 modules you laid out above, there's 30, or 300. Each module can depend on one or more other modules, and so the build system needs to understand that if module X changes, everything that depends (directly or indirectly) on module X might need to be rebuilt to pick up those changes, and should have its tests run to make sure it wasn't broken by those changes to module X.
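
    To make the dependency-tracking idea concrete, here is a minimal C++ sketch of the core question a build system answers: given that one module changed, which others depend on it directly or indirectly? This is not how CMake or Bazel implement it; the `DependencyGraph` type and the `modulesToRebuild` function are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Maps each module name to the modules it depends on directly.
using DependencyGraph = std::map<std::string, std::vector<std::string>>;

// Returns every module that depends, directly or indirectly, on `changed`.
// The changed module itself is not included unless it sits on a dependency cycle.
std::set<std::string> modulesToRebuild(const DependencyGraph& deps,
                                       const std::string& changed) {
    // Invert the edges: for each module, record who depends on it.
    std::map<std::string, std::vector<std::string>> dependents;
    for (const auto& entry : deps) {
        for (const auto& dependency : entry.second) {
            dependents[dependency].push_back(entry.first);
        }
    }

    // Breadth-first walk outward from the changed module.
    std::set<std::string> dirty;
    std::queue<std::string> pending;
    pending.push(changed);
    while (!pending.empty()) {
        const std::string current = pending.front();
        pending.pop();
        const auto it = dependents.find(current);
        if (it == dependents.end()) {
            continue;  // nothing depends on this module
        }
        for (const auto& dependent : it->second) {
            // insert().second is true only the first time we see a module,
            // so each module is queued at most once.
            if (dirty.insert(dependent).second) {
                pending.push(dependent);
            }
        }
    }
    return dirty;
}
```

    For example, with a hypothetical graph in which `solver` and `io` each depend on `grid`, and `driver` depends on both, a change to `grid` marks `solver`, `io`, and `driver` for rebuilding, while a change to `driver` marks nothing else.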

    2 votes