I want to finally understand how to compile in C well, any resource recommendations?
I am a scientist who has semi-frequently written code in C (and other compiled languages like Fortran). When it comes time to compile, I typically tape together a Makefile from past projects and hope for the best, but even then I spend more time than I'd like to admit trying to figure out why my project is not being compiled or linked correctly. I've had a hard time finding any resources that aren't extremely surface level, or else are not behind some type of paywall. Can anyone recommend me some reading so that I can confidently write Makefiles and compile programs and actually understand what the different flags and commands are doing? I don't need extreme "under the hood" information as I don't intend to do things like write my own compiler, I just want to understand the process a little better. Help a scientist out!
I don't know much about writing makefiles, and I'm sure someone will come by later and give a more comprehensive and sensible answer, but what I'd suggest is keep it simple stupid. Start off with a blank makefile, learn the basics, and then focus on gcc flags where the real meat and potatoes is.
For the most part I'm happy just doing
gcc -O2 foo.c
and, when the build goes through make, running it as
make -j8
where my CPU has 8 threads. `-O2` compiles with optimization level 2 (faster to execute, slower to compile). `-j8` is an option for make rather than gcc: it lets make run up to 8 compile jobs in parallel. If I know I'm going to be running the binary on the same CPU I'm compiling on, I'll add `-march=native`, or if I know it'll be run on a different but still modern CPU I'll do something like `-march=skylake` to enable support for things like SSE4 or AVX or whatnot.
In a makefile, to keep things neat and maintainable, you'll want to put these into variables at the top of the file. You'll want to put those compiler options into a CFLAGS variable, and your source files into another variable, etc, like so:
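A minimal sketch of such a variable block, assuming GNU make and gcc. `CC` and `CFLAGS` are the conventional variable names; the file names `foo.c` and `bar.c` are placeholders for illustration only.

```make
# Compiler to use (make's built-in default is "cc")
CC = gcc

# Options passed to the compiler for every C file
CFLAGS = -O2 -march=native

# Hypothetical list of source files for this example
SOURCES = foo.c bar.c
```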
Or if you want to compile all the C files in the current directory you could do like
SOURCES=$(wildcard *.c)
Then further down you'd use a syntax like this to use your variables:
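A minimal sketch of what that could look like, with the variable definitions repeated so the snippet stands alone. `TARGET` and the output name `foo` are assumptions for illustration; `$@` and `$^` are GNU make's automatic variables for the rule's target and for all of its prerequisites. The recipe line must start with a tab character.

```make
CC = gcc
CFLAGS = -O2
SOURCES = $(wildcard *.c)

# Hypothetical name of the program to build
TARGET = foo

# Rebuild the program whenever any source file is newer than it
$(TARGET): $(SOURCES)
	$(CC) $(CFLAGS) -o $@ $^
```

This compiles and links everything in one gcc call, which is fine for small projects but recompiles every file on each change.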
The makefile mostly just says what input files you want to compile, and what options you want the compiler to use. If you are targeting a specific distro you can write up an install target to tell it where to put the resulting binaries, etc. The meat and potatoes of optimizing compilation comes down to which CFLAGS you give gcc.
The comprehensive and exhaustive documentation for gcc can be found here. While doing surface level research for this reply I found this basic makefile tutorial here, another surface level tutorial here, and a good list of optimizations for gcc here.
You can also use `nproc` on Linux to set the number of parallel compile jobs automatically from the number of available processor cores. For example, in my Arch Linux machine's `/etc/makepkg.conf` file I have the flags and job count set so that any packages I build myself using Arch's `makepkg` command will automatically compile with O3 and native optimizations, and will spawn as many compile jobs as there are processor cores, plus two.
I'm curious about the `n+2` part. Why do we want to spawn two extra processes? I've heard people say you should use `-j` with a couple fewer threads than you have, just for system overhead. So, if nproc doesn't account for hyperthreading and only targets physical cores, that'd make sense to me; otherwise, are we trying to overprovision here?
`nproc` counts the number of threads/logical cores. For example, my Ryzen 7 5800X reports `16`. So `nproc + 2` is overprovisioning. However, I'm not sure why I have it set up that way.
I vaguely remember reading, years ago, some answer on Stack Overflow that showed `nproc + 2` was better for large compilation jobs than just `nproc` or even `nproc * 2`, but I could also be misremembering. This was done by the hungariantoast of seven years ago, so honestly there's no telling why I have it set that way.
Hmm. I feel like this whole field of research is surrounded by magic rituals, and it probably comes down to your specific codebase, your specific computer, how often you need to recompile, and what else you're doing at the same time.
The only large project I compile semi-often is the linux kernel, and I usually set it to 8 threads when my CPU actually has 12. Mind you, I'm using a laptop as my main computer, so I'm likely watching youtube and/or playing balatro at the same time so I'm willing to wait an extra minute if it doesn't impact those things, lol.
Is your goal to better understand Makefiles, compiler and linkers, or just know how to make stuff work without needing to tinker with it for a long time?
Personally, and I'm sure this is blasphemy to some people, I'm not interested in writing Makefiles. I just want my stuff to work, and hand-writing Makefiles is rarely the most efficient path to getting there for me. This is probably biased by the fact that I rarely create projects targeted at only one OS, and almost never targeted at only one compiler. Because of this I instead favor abstraction tools. I'd rather use something like `target_link_libraries(foo PRIVATE bar)` from CMake to have "foo" take a dependency on "bar" than need to manually set the right flags on the compiler and linker to get header paths and object files imported correctly. Usually, though admittedly not always, these abstractions provide far better names than the flags they get transformed to, which can make them easier to understand. It also means that if my project gets big enough that parallel building with an automatically generated dependency graph becomes worth it, I can usually just change that tool from targeting make to ninja.
If you want to keep with Makefiles or learn them in depth then what I said is not helpful at all. For that my advice would be to start a Makefile from scratch. Don't copy anything in at all. From there, go bit by bit, manually putting in just the exact targets you want and the command line strings you want to be executed. It won't be pretty, but you'll understand it. Then, once you understand what each thing is doing, start adding prerequisites, then variables, then branches, then wildcards, and so on. Basically, start by treating it like it's no more than a collection of .sh files (ex. put what you'd want to put into clean.sh into `clean:`) and then slowly use more Makefile features only when you're sure why you're doing it. I'd start with something as bland as one hard-coded command per target, then add one feature at a time, and so on.
Because Makefiles really don't have a ton of features you're likely to need immediately, you will likely have a pretty solid handle on what the Makefiles themselves are doing after a few iterations. Then you need to dive into the documentation of your preferred compiler and linker for figuring out the flags and such. If you want a specific resource, https://makefiletutorial.com/ seems fine at describing the concepts, although I wouldn't try to follow it like a tutorial but would instead just read through what it says about each concept (targets, prereqs, variables, etc.) and apply them as needed.
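As a sketch of the "start bland" advice above, a first Makefile could be nothing more than named shell commands. The names `app` and `main.c` are hypothetical, and the recipe lines must start with a tab character.

```make
# Step 1: each target is just a label for a command, like a tiny .sh file.
# No variables and no prerequisites, so "make build" runs gcc every time
# (as long as no file named "build" exists in the directory).
build:
	gcc -Wall -o app main.c

# "make clean" removes the program again
clean:
	rm -f app
```

A natural next step would be to rename `build:` to `app: main.c`, so that make can skip the command when `app` is already newer than `main.c`.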
The core functionality of `make` is simple and very general, extending beyond C and software development: given a target file, its prerequisite files and a recipe that generates the target, it runs the recipe only if the target does not exist yet or any of the prerequisite files is newer than the target. Rules can depend on each other: if the target of one rule is a prerequisite of another rule, it'll evaluate the former rule first.
For make, I like the Introduction to Makefiles in the GNU Make manual.
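To make the target/prerequisite/recipe idea concrete, here is a small sketch that has nothing to do with C. All file names and the `plot.py` script are hypothetical; it assumes `pdflatex` and `python3` are installed. Recipe lines must start with a tab character.

```make
# General shape of a rule:
#   target: prerequisites
#   <TAB>recipe

# The PDF is rebuilt if the LaTeX source or the figure is newer than it
report.pdf: report.tex figure.png
	pdflatex report.tex

# figure.png is a prerequisite of report.pdf, so this rule is checked first;
# the figure is regenerated only if the script or the data changed
figure.png: plot.py data.csv
	python3 plot.py data.csv figure.png
```

Running `make report.pdf` after editing only `report.tex` would rerun `pdflatex` but leave the figure alone.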
For the compiler front-end, there are a lot of flags and options for more or less obscure use cases, so it really depends on what you do. The GCC manual provides a helpful summary. Most importantly I think, you pass `-c` to tell the front-end to compile each given source file into an object file without linking. You pass `-o` with a name to specify an output name. I usually pass `-Wall -O3` when compiling to enable a broad set of common warnings (despite the name, not literally all of them) and a high degree of optimization respectively. For linking to system libraries I use `pkg-config` to generate the appropriate flags to the linker and compiler.
So for a simple single-file C application, in the shell I might invoke
The corresponding Makefile might be something like
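A minimal sketch of such an explicit Makefile. The recipe line is the same command one would otherwise type directly in the shell, except that `$` is doubled so make passes it through to the shell. The names `app` and `app.c` and the pkg-config package `gsl` are assumptions for illustration, not taken from the original post.

```make
# Explicit rule: build "app" from "app.c" whenever app.c is newer.
# $$(...) becomes $(...) in the shell, which runs pkg-config there.
app: app.c
	gcc -Wall -O3 $$(pkg-config --cflags gsl) -o app app.c $$(pkg-config --libs gsl)
```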
or, because GNU make has pre-defined implicit rules for these things based on a set of variables
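A sketch of the implicit-rule version, under the same hypothetical names (`app`, `app.c`, pkg-config package `gsl`). GNU make's built-in rule for turning a single `.c` file into a program uses `CC`, `CFLAGS`, `CPPFLAGS`, `LDFLAGS` and `LDLIBS`, so setting those variables is enough.

```make
CC = gcc
CFLAGS = -Wall -O3 $(shell pkg-config --cflags gsl)
LDLIBS = $(shell pkg-config --libs gsl)

# No recipe given: make falls back to its built-in rule for app from app.c
app: app.c
```

Running `make -n app` prints the command make would execute without running it, which is a handy way to see what the implicit rule expands to.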
Note that the Makefile only really shines when you are linking multiple independent object files. Then, if you have specified the dependencies correctly, it'll only recompile the objects whose dependencies have changed, and relink, saving you a whole lot of compilation of source files that haven't changed.
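A sketch of that multi-object case, assuming a hypothetical project where `main.c` and `solver.c` both include `solver.h` and the program is called `sim`. `$<` is the first prerequisite, `$^` all prerequisites, and `$@` the target.

```make
CC = gcc
CFLAGS = -Wall -O2

# Hypothetical object files making up the program
OBJECTS = main.o solver.o

# Link step: runs when any object file is newer than the program
sim: $(OBJECTS)
	$(CC) $(CFLAGS) -o $@ $^

# Compile steps: -c produces an object file without linking
main.o: main.c solver.h
	$(CC) $(CFLAGS) -c -o $@ $<

solver.o: solver.c solver.h
	$(CC) $(CFLAGS) -c -o $@ $<

.PHONY: clean
clean:
	rm -f sim $(OBJECTS)
```

With this layout, editing `solver.c` recompiles only `solver.o` and relinks `sim`, while editing `solver.h` recompiles both object files because both rules list it as a prerequisite.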
In addition to the great info in the comments here, I'd like to share this old topic from 2018:
How do I hack makefiles?
There was also another topic about makefiles posted earlier this year:
Be aware of the Makefile effect
But I haven't read it yet