15 votes

I want to finally understand how to compile C well. Any resource recommendations?

I am a scientist who has semi-frequently written code in C (and other compiled languages like Fortran). When it comes time to compile, I typically tape together a Makefile from past projects and hope for the best, but even then I spend more time than I'd like to admit trying to figure out why my project is not being compiled or linked correctly. I've had a hard time finding any resources that aren't either extremely surface level or behind some type of paywall. Can anyone recommend some reading so that I can confidently write Makefiles and compile programs and actually understand what the different flags and commands are doing? I don't need extreme "under the hood" information as I don't intend to do things like write my own compiler, I just want to understand the process a little better. Help a scientist out!

8 comments

  1. [5]
    pseudolobster
    Link

    I don't know much about writing makefiles, and I'm sure someone will come by later and give a more comprehensive and sensible answer, but what I'd suggest is: keep it simple, stupid. Start off with a blank makefile, learn the basics, and then focus on gcc flags, which is where the real meat and potatoes is.

    For the most part I'm happy just doing gcc -O2 foo.c and then make -j8 where my CPU has 8 threads (note that -j is a make flag, not a gcc flag). -O2 compiles with optimization level 2 (faster to execute, slower to compile), and -j8 runs up to 8 compile jobs in parallel. If I know I'm going to be running the binary on the current CPU I'm using I'll add -march=native, or if I know it'll be run on a different, but still modern, CPU I'll do something like -march=skylake to enable support for things like SSE4 or AVX or whatnot.
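
    Something like this, for example:

    gcc -O2 -march=native foo.c -o foo   # optimization and CPU targeting are gcc flags
    make -j8                             # parallelism is a make flag, one job per thread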

    In a makefile, to keep things neat and maintainable you'll want to put these settings into variables at the top of the file: the compiler options go into a CFLAGS variable, your source files into another variable, etc., like so:

    CC = gcc
    CFLAGS = -O2 -march=native
    SOURCES = foo1.c foo2.c
    OBJECTS = foo1.o foo2.o
    

    Or, if you want to compile all the C files in the current directory, you could do something like SOURCES = $(wildcard *.c)
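
    If you go that route, you can also derive the object list from the source list instead of maintaining both by hand; a minimal sketch using make's substitution references:

    SOURCES = $(wildcard *.c)
    OBJECTS = $(SOURCES:.c=.o)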

    Then further down you'd use a syntax like this to use your variables:

    program: $(OBJECTS)
         $(CC) $(CFLAGS) $(OBJECTS) -o program
    

    The makefile mostly just says what input files you want to compile, and what options you want the compiler to use. If you are targeting a specific distro you can write up an install target to tell it where to put the resulting binaries, etc. The meat and potatoes of optimizing compilation comes down to which CFLAGS you give gcc.
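
    For example, a bare-bones install target might look something like this (PREFIX is a common convention rather than anything make requires; adjust to taste):

    PREFIX ?= /usr/local
    
    .PHONY: install
    install: program
    	install -D -m 755 program $(PREFIX)/bin/program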

    The comprehensive and exhaustive documentation for gcc can be found here. While doing surface-level research for this reply I found a basic makefile tutorial here, another surface-level tutorial here, and a good list of gcc optimizations here.

    10 votes
    1. [4]
      hungariantoast
      Link Parent

      You can also use nproc on Linux to set the number of parallel jobs automatically. For example, in my Arch Linux machine's /etc/makepkg.conf file I have the following:

      CFLAGS="-march=native -mtune=native -O3"
      MAKEFLAGS="-j$(expr `nproc` + 2)"
      

      So any packages I build myself using Arch's makepkg command will automatically compile with -O3 and native optimizations, and will spawn as many compile jobs as there are logical cores, plus two.
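
      The backticks-and-expr spelling is a bit dated, by the way; the same thing can be written with POSIX arithmetic expansion:

      MAKEFLAGS="-j$(($(nproc) + 2))"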

      4 votes
      1. [3]
        pseudolobster
        Link Parent

        I'm curious about the n+2 part. Why do we want to spawn two extra processes? I've heard people say you should use -j with a couple fewer threads than you have, just for system overhead. So, if nproc doesn't account for hyperthreading and only counts physical cores, that'd make sense to me; otherwise, are we trying to overprovision here?

        1 vote
        1. [2]
          hungariantoast
          (edited)
          Link Parent

          nproc counts the number of threads/logical cores. For example, my Ryzen 7 5800X reports 16.

          So nproc + 2 is overprovisioning. However, I'm not sure why I have it set up that way.

          I vaguely remember reading, years ago, some answer on Stack Overflow that showed nproc + 2 was better for large compilation jobs than just nproc or even nproc * 2, but I could also be misremembering. This was done by the hungariantoast of seven years ago, so honestly there's no telling why I have it set that way.

          3 votes
          1. pseudolobster
            Link Parent

            Hmm. I feel like this whole field of research is shrouded in magic rituals, and it probably comes down to your specific codebase, your specific computer, how often you need to recompile, and what else you're doing at the same time.

            The only large project I compile semi-often is the Linux kernel, and I usually set it to 8 threads when my CPU actually has 12. Mind you, I'm using a laptop as my main computer, so I'm likely watching YouTube and/or playing Balatro at the same time, and I'm willing to wait an extra minute if it doesn't impact those things, lol.

            2 votes
  2. zestier
    (edited)
    Link

    Is your goal to better understand Makefiles, compilers, and linkers, or just to know how to make stuff work without needing to tinker with it for a long time?

    Personally, and I'm sure this is blasphemy to some people, I'm not interested in writing Makefiles. I just want my stuff to work, and hand-writing Makefiles is rarely the most efficient path to getting there for me. This is probably biased by the fact that I rarely create projects targeting only one OS, and almost never targeting only one compiler. Because of this I instead favor abstraction tools. I'd rather use something like target_link_libraries(foo PRIVATE bar) from CMake to have "foo" take a dependency on "bar" than manually set the right compiler and linker flags to get header paths and object files imported correctly. Usually, though admittedly not always, these abstractions have far better names than the flags they get transformed into, which can make them easier to understand. It also means that if my project gets big enough that parallel building with an automatically generated dependency graph becomes worth it, I can usually just retarget that tool from make to ninja, as in the sketch below.
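
    For example, assuming CMake 3.13 or newer and a CMakeLists.txt in the current directory, switching the generated build system is just a flag:

    cmake -S . -B build-make -G "Unix Makefiles"   # generate a make-based build tree
    cmake -S . -B build-ninja -G Ninja             # or generate a ninja-based one instead
    cmake --build build-make                       # drive whichever tree you generated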

    If you want to stick with Makefiles or learn them in depth, then what I said is not helpful at all. For that, my advice would be to start a Makefile from scratch. Don't copy anything in at all. From there, go bit by bit, manually putting in just the exact targets you want and the command lines you want executed. It won't be pretty, but you'll understand it. Then, once you understand what each thing is doing, start adding prerequisites, then variables, then branches, then wildcards, and so on. Basically, start by treating it like it's no more than a collection of .sh files (e.g. put what you'd want to put into clean.sh into clean:) and then slowly use more Makefile features only when you're sure why you're doing it. I'd start with something as bland as

    program:
    	gcc main.c -o program
    

    then

    program: main.c
    	gcc main.c -o program
    

    and so on, as sketched further below. Because Makefiles really don't have a ton of features you're likely to need immediately, you will probably have a pretty solid handle on what the Makefiles themselves are doing after a few iterations. Then you need to dive into the documentation of your preferred compiler and linker to figure out the flags and such. If you want a specific resource, https://makefiletutorial.com/ seems fine at describing the concepts, although I wouldn't try to follow it like a tutorial; instead, just read through what it says about each concept (targets, prereqs, variables, etc.) and apply them as needed.
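
    Once prerequisites and variables feel comfortable, a plausible next iteration (file names are placeholders) might look like:

    CC = gcc
    CFLAGS = -Wall -O2
    
    program: main.c util.c util.h
    	$(CC) $(CFLAGS) main.c util.c -o program
    
    .PHONY: clean
    clean:
    	rm -f program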

    6 votes
  3. text_garden
    Link

    The core functionality of make is simple and very general, extending beyond C and software development: given a target file, its prerequisite files and a recipe that generates the target, it runs the command only if any of the prerequisite files are newer than the target. Rules can be dependent: if the target of one rule is a prerequisite of another rule, it'll evaluate the former rule first.
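
    Because the mechanism is just files and timestamps, this works for things other than code too. A contrived sketch with hypothetical files, where a plot is regenerated only when the data or the gnuplot script changes:

    plot.png: data.csv plot.gp
    	gnuplot plot.gp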

    For make, I like the Introduction to Makefiles in the GNU Make manual.

    For the compiler front-end, there are a lot of flags and options for more or less obscure use cases, so it really depends on what you do. The GCC manual provides a helpful summary. Most importantly, I think: you pass -c to tell the front-end to compile the given source files into object files without linking them, and you pass -o with a name to specify the output name. I usually pass -Wall -O3 when compiling to enable all warnings and a high degree of optimization respectively. For linking to system libraries I use pkg-config to generate the appropriate flags for the linker and compiler.

    So for a simple single-file C application, in the shell I might invoke

    cc -Wall -O3 $(pkg-config --cflags libpcap) -c application.c -o application.o # compile into object file
    cc application.o $(pkg-config --libs libpcap) -o application # link into executable
    

    The corresponding Makefile might be something like

    application: application.o
        cc application.o $(shell pkg-config --libs libpcap) -o application
    
    application.o: application.c application.h
        cc -Wall -O3 $(shell pkg-config --cflags libpcap) -c application.c -o application.o
    

    or, because GNU make has pre-defined implicit rules for these things based on a set of variables

    CFLAGS += -Wall -O3 $(shell pkg-config --cflags libpcap)
    LDLIBS += $(shell pkg-config --libs libpcap)
    
    application: application.o
    application.o: application.c application.h
    

    Note that the Makefile only really shines when you are linking multiple independent object files. Then, if you have specified the dependencies correctly, it'll only recompile the objects whose prerequisites have changed, and relink, saving you a whole lot of recompilation of source files that haven't changed.
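
    As a sketch of that payoff (file names are illustrative), here's a three-object project leaning on the same implicit rules:

    CFLAGS += -Wall -O3
    
    program: main.o parser.o util.o
    	$(CC) $(LDFLAGS) main.o parser.o util.o $(LDLIBS) -o program
    
    main.o: main.c parser.h util.h
    parser.o: parser.c parser.h
    util.o: util.c util.h
    

    Touch only parser.c, and make recompiles parser.o and relinks program, leaving the other objects alone.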

    4 votes
  4. hungariantoast
    Link

    In addition to the great info in the comments here, I'd like to share this old topic from 2018:

    How do I hack makefiles?

    There was also another topic about makefiles posted earlier this year:

    Be aware of the Makefile effect

    But I haven't read it yet.

    3 votes