I want to finally understand how to compile in C well, any resource recommendations?

I am a scientist who has semi-frequently written code in C (and other compiled languages like Fortran). When it comes time to compile, I typically tape together a Makefile from past projects and hope for the best, but even then I spend more time than I'd like to admit trying to figure out why my project is not being compiled or linked correctly. I've had a hard time finding resources that aren't either extremely surface level or behind some type of paywall. Can anyone recommend some reading so that I can confidently write Makefiles and compile programs, and actually understand what the different flags and commands are doing? I don't need extreme "under the hood" information, as I don't intend to do things like write my own compiler; I just want to understand the process a little better. Help a scientist out!

  1.
    pseudolobster

    I don't know much about writing makefiles, and I'm sure someone will come by later and give a more comprehensive and sensible answer, but what I'd suggest is: keep it simple, stupid. Start off with a blank makefile, learn the basics, and then focus on gcc flags, which is where the real meat and potatoes is.

    For the most part I'm happy just doing gcc -O2 foo.c, or make -j8 for a bigger project with a Makefile, since my CPU has 8 threads. -O2 compiles with optimization level 2 (faster to execute, slower to compile), and -j8 tells make (not gcc, which has no -j option) to run up to 8 compile jobs in parallel. If I know I'm going to be running the binary on the current CPU I'm using, I'll add -march=native, or if I know it'll be run on a different but still modern CPU, I'll do something like -march=skylake to enable support for things like SSE4 or AVX or whatnot.

    In a makefile, to keep things neat and maintainable you'll want to put these into variables at the top of the file. You'll want to put those compiler options into a CFLAGS variable, your source files into another variable, etc., like so:

    CC = gcc
    CFLAGS = -O2 -march=native
    SOURCES = foo1.c foo2.c 
    OBJECTS = foo1.o foo2.o
    

    Or, if you want to compile all the C files in the current directory, you could use SOURCES = $(wildcard *.c), and derive the object list from it with OBJECTS = $(SOURCES:.c=.o).

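    Then further down you'd use a syntax like this to use your variables (the indented recipe line under a rule must start with a tab character, not spaces, or make will refuse to parse it):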

    program: $(OBJECTS)
         $(CC) $(CFLAGS) $(OBJECTS) -o program
    

    The makefile mostly just says what input files you want to compile, and what options you want the compiler to use. If you are targeting a specific distro you can write up an install target to tell it where to put the resulting binaries, etc. The meat and potatoes of optimizing compilation comes down to which CFLAGS you give gcc.

    The comprehensive and exhaustive documentation for gcc can be found here. While doing surface level research for this reply I found this basic makefile tutorial here, another surface level tutorial here, and a good list of optimizations for gcc here.

    1.
      hungariantoast

      You can also use nproc on Linux to assign some number of processor cores to the compiler automatically. For example, in my Arch Linux machine's /etc/makepkg.conf file I have the following:

      CFLAGS="-march=native -mtune=native -O3"
      MAKEFLAGS="-j$(expr `nproc` + 2)"
      

      So any packages I build myself using Arch's makepkg command will automatically compile with -O3 and native optimizations, and will spawn as many compile jobs as there are logical CPUs reported by nproc, plus two.
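
      A small sketch of the same idea outside makepkg, assuming a POSIX shell: shell arithmetic $(( )) does the job of expr, and getconf _NPROCESSORS_ONLN serves as a fallback where nproc (part of GNU coreutils) is not installed, for example on macOS. The "+2" simply mirrors the config above; it is not a recommendation.

```shell
#!/bin/sh
# Sketch: derive a make job count from the number of logical CPUs.
set -eu

# nproc counts logical CPUs available to this process; fall back to getconf elsewhere.
cpus=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
jobs=$((cpus + 2))   # the "+2" heuristic from the makepkg.conf above
echo "logical CPUs: $cpus, using -j$jobs"

# Either pass it explicitly on each run:
#   make -j"$jobs"
# or export MAKEFLAGS so every make started from this shell picks it up:
MAKEFLAGS="-j$jobs"
export MAKEFLAGS
```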

      1.
        pseudolobster

        I'm curious about the n+2 part. Why do we want to spawn two extra processes? I've heard people say you should use -j with a couple fewer threads than you have, just for system overhead. So, if nproc doesn't account for hyperthreading and only counts physical cores, that'd make sense to me. Otherwise, are we trying to overprovision here?

        1.
          hungariantoast

          nproc counts the number of threads/logical cores. For example, my Ryzen 7 5800X reports 16.

          So nproc + 2 is overprovisioning. However, I'm not sure why I have it set up that way.

          I vaguely remember reading, years ago, some answer on Stack Overflow that showed nproc + 2 was better for large compilation jobs than just nproc or even nproc * 2, but I could also be misremembering. This was done by the hungariantoast of seven years ago, so honestly there's no telling why I have it set that way.

          1. pseudolobster

            Hmm. I feel like this whole field of research is surrounded by magic rituals, and it probably comes down to your specific codebase, your specific computer, how often you need to recompile, and what else you're doing at the same time.

            The only large project I compile semi-often is the Linux kernel, and I usually set it to 8 threads when my CPU actually has 12. Mind you, I'm using a laptop as my main computer, so I'm likely watching YouTube and/or playing Balatro at the same time; I'm willing to wait an extra minute if it doesn't impact those things, lol.

  2. text_garden

    The core functionality of make is simple and very general, extending beyond C and software development: given a target file, its prerequisite files, and a recipe that generates the target, it runs the recipe only if the target doesn't exist yet or any of the prerequisite files are newer than the target. Rules can be chained: if the target of one rule is a prerequisite of another rule, make evaluates the former rule first.
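
    To see that rule in action without any C involved, here is a sketch assuming GNU make and a POSIX shell; in.txt and out.txt are made-up names. make -q runs nothing and only reports, through its exit status, whether the target is up to date (0) or not (1).

```shell
#!/bin/sh
# Sketch: make reruns a recipe only if the target is missing or older than a prerequisite.
set -eu
tmp=$(mktemp -d)
cd "$tmp"

echo "data" > in.txt
printf 'out.txt: in.txt\n\tcp in.txt out.txt\n' > Makefile   # one rule, tab-indented recipe

make                    # out.txt does not exist yet: runs "cp in.txt out.txt"
make                    # nothing changed: reports that out.txt is up to date
make -q && echo "out.txt is current"

sleep 1                 # avoid identical timestamps on coarse-grained filesystems
touch in.txt            # the prerequisite is now newer than the target
make -q || echo "out.txt is stale"
make                    # runs the cp again
```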

    For make, I like the Introduction to Makefiles in the GNU Make manual.

    For the compiler front-end, there are a lot of flags and options for more or less obscure use cases, so it really depends on what you do. The GCC manual provides a helpful summary. Most importantly, I think, you pass -c to tell the front-end to compile the given source files into object files without linking them, and -o with a name to specify the output file name. I usually pass -Wall -O3 when compiling, to enable a broad set of common warnings (despite the name, -Wall does not turn on every warning) and a high degree of optimization, respectively. For linking against system libraries I use pkg-config to generate the appropriate compiler and linker flags.

    So for a simple single-file C application, in the shell I might invoke

    cc -Wall -O3 $(pkg-config --cflags libpcap) -c application.c -o application.o # compile into object file
    cc application.o $(pkg-config --libs libpcap) -o application # link into executable.
    

    The corresponding Makefile might be something like

    application: application.o
        cc application.o $(shell pkg-config --libs libpcap) -o application
    
    application.o: application.c application.h
        cc -Wall -O3 $(shell pkg-config --cflags libpcap) -c application.c -o application.o
    

    or, because GNU make has predefined implicit rules for compiling and linking C that are driven by a set of variables (CC, CFLAGS, LDFLAGS, LDLIBS and so on):

    CFLAGS += -Wall -O3 $(shell pkg-config --cflags libpcap)
    LDLIBS += $(shell pkg-config --libs libpcap)
    
    application: application.o
    application.o: application.c application.h
    

    Note that a Makefile only really shines when you are linking multiple independent object files. Then, if you have specified the dependencies correctly, make will only recompile the objects whose prerequisites have changed and then relink, saving you a lot of recompiling of source files that haven't changed.

  3. hungariantoast

    In addition to the great info in the comments here, I'd like to share this old topic from 2018:

    How do I hack makefiles?

    There was also another topic about makefiles posted earlier this year:

    Be aware of the Makefile effect

    But I haven't read it yet.
