Compiling and Running C and C++

September 28, 2024

You’ll be coming across a lot of software written in C and C++, and it’s useful to have a primer on how to build and run it. This is true even if you’re not writing your own software in these languages. There are various build systems for these languages, and we’ll take a quick look at the most common ones.

Invoking `gcc` or `clang` manually

For projects with a single source file, or perhaps a small number of them, it is fine to invoke the compiler manually. If you want to play with this, see the article What You Need To Know About C and create the three files test.h, test.c, and main.c in an empty directory.

Note that there are two widely-used open source C/C++ compilers in the Linux ecosystem.

GCC is a compiler from the Free Software Foundation, originally written by Richard Stallman in the late 1980s, and licensed under the GNU General Public License. It is an acronym, originally standing for GNU C Compiler, but now officially standing for GNU Compiler Collection, since it supports more than C and C++. It is invoked with the command gcc. It is highly portable and prioritizes supporting nearly any microprocessor family with any significant use.
Clang (C Lang) comes from the LLVM Project. It is licensed under the much more permissive Apache License 2.0 (with LLVM exceptions). Its development was largely driven by Apple, which dislikes the copyleft nature of the GPL. It is invoked with the command clang. It supports fewer processor architectures than GCC but still supports all the ones that are largely relevant to Linux application servers, namely x86_64 and ARM64 (often called aarch64), as well as a couple others you’re less likely to need.

Either compiler should work just fine in building any software using established features of either language. They sometimes differ in their support of cutting-edge new language features.

The following examples use GCC, but both compilers accept similar basic options, and replacing gcc with clang will work just fine.

If you run:

gcc test.c main.c
ls -l

you’ll see that you have an additional mysterious file, a.out. You’ll notice also that it’s executable. If you examine it with the file a.out command, you’ll see that it’s

a.out: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=af5f4a20130ef46d2e0d6f8a0448864e8f05cac2, for GNU/Linux 4.4.0, not stripped

if you’re on an x86-64 computer, or

a.out: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, BuildID[sha1]=f406d8d44875f4bc40caf86d9ca6a7f4a6cb6536, for GNU/Linux 3.7.0, not stripped

if you’re on ARM/aarch64. You can go ahead and run it, with ./a.out and you’ll see the output

I'm in test!
I'm in main! Value is 11

It is our multi-file C program!

Note that for a single-file program, of course, you just provide the single filename.

Maybe you don’t like the name a.out. It is a historical leftover: the name is short for “assembler output” and goes back to early Unix, and it was also the name of an executable format that Linux used in the 1990s before the current ELF format. You can give the executable any name you want:

gcc main.c test.c -o testprog

Now we have testprog instead of a.out; otherwise, they’re identical.

What if we only specify one of the C files, instead of both?

gcc main.c

Well that didn’t go as well as we may have hoped!

/usr/bin/ld: /tmp/cc2xpDzo.o: warning: relocation against `shared_val' in read-only section `.text'
/usr/bin/ld: /tmp/cc2xpDzo.o: in function `fun_in_main':
main.c:(.text+0x6): undefined reference to `shared_val'
/usr/bin/ld: /tmp/cc2xpDzo.o: in function `main':
main.c:(.text+0x34): undefined reference to `shared_val'
/usr/bin/ld: main.c:(.text+0x42): undefined reference to `fun_in_test'
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE
collect2: error: ld returned 1 exit status

You’ll note a reference to fun_in_test. That is a function that is defined in our test.c source file, which we did not include in this compilation run. In main.c we include the header test.h, which declares the function. But the linker has no idea what to do with it. It can’t make an executable program without filling in that important gap!

What’s the linker, you ask? That is the program /usr/bin/ld, which you saw in the output above, and it is responsible for taking object files emitted by the compiler, and putting them together into an executable. In many cases, the compiler calls it automatically. But you can run it yourself too. Let’s check out the object files and how we get them:

gcc -c test.c main.c

Now you’ll see two new files in the directory, test.o and main.o. Running file main.o will show that it is ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped. These files contain the compiled binary code from our source files, but not in a form that can be directly executed. They also contain references to things that are defined in other files. To create an executable, it is necessary to provide all the required object files to the linker. We can give it a try:

ld main.o test.o

Uh oh:

ld: warning: cannot find entry symbol _start; defaulting to 0000000000400120
ld: main.o: in function `fun_in_main':
main.c:(.text+0x20): undefined reference to `printf'
ld: test.o: in function `fun_in_test':
test.c:(.text+0x10): undefined reference to `puts'

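What is this? You’ll recognize printf as a function we called in the program. puts is also a function provided by the C library. We never called it ourselves, but the compiler is allowed to replace a printf call that only prints a fixed string ending in a newline with the simpler puts, which is why it shows up in test.o. So ld is not seeing our C library! But that’s easy to fix: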

ld main.o test.o -lc

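The -l option tells the linker to find a library with the following name. In this case, we just want libc (library file names always start with lib) so it’s just -lc. Now we have an a.out file again. The warning about the missing _start entry symbol remains, though: the compiler normally also hands the linker the C startup files and the path of the right dynamic linker, so an executable linked by hand like this will generally not run correctly. It does show what the linker’s job is.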

The compiler runs the linker automatically unless the -c option (meaning compile to object file only) is specified on the command line; the -o option only names the output file. It also supplies -lc to the linker automatically. However, if we use another library, we may need to specify it ourselves. The compiler itself accepts the -l parameter, and just passes it on to the linker.

Other GCC/Clang options to know about:

-g: Adds debugging symbols. If you want to run your compiled program through a debugger such as gdb, you want it to contain debugging information. This lets you see variable and function names in their appropriate places, instead of just raw pointer values. It increases the size of the executable by a little bit.

-static: Puts everything required by the executable in the executable file itself. Normally, Linux executables use dynamic linking to libraries. This is usually great; it allows the system to share library code on disk and in RAM instead of duplicating it for every executable. Say we run gcc test.c main.c -o prog to compile our program. Its size is about 15k on my system. We can use ldd prog to see what it’s linking against:

	linux-vdso.so.1 (0x00007ed140b4e000)
	libc.so.6 => /usr/lib/libc.so.6 (0x00007ed1408fd000)
	/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007ed140b50000)

The first, linux-vdso, is a virtual library provided by the kernel and mapped into all processes. The second, libc, is the main thing we’re actually linking with. It contains all the main functions provided by the C library. The last, ld-linux, is the dynamic linker itself, which loads the shared libraries when the program starts.

Now, what if we run gcc test.c main.c -o prog -static instead? Our program is now over 750k on disk, and the ldd prog output simply shows not a dynamic executable. libc is actually included in the executable! This might be handy, for example, if you want an executable that can be run on absolutely any sort-of-recent Linux system without having to worry about which library versions are installed.

-O: Optimization control. -O0 turns off optimization and is recommended with -g. -O2 enables many optimizations; -O3 enables even more (though some of those may increase executable size in a tradeoff to get faster runtime execution). -Os optimizes for the smallest possible executable size. There are others; you can see the man pages for gcc or clang.

-march=: Tells the compiler which processor architecture to target. For example, -march=broadwell would generate code that may contain processor instructions that exist only in Intel Broadwell (or later) CPUs; the program may fail to execute at all on earlier CPUs.

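-Wall: Turns on a broad set of commonly useful compiler warnings (despite the name, not every warning the compiler has).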

Makefiles and the `make` command

An old but simple build system that automates much of the monotony of running the compiler directly is the Makefile. It basically lists out targets and what to do if they are invoked. It will often contain a hierarchy of commands telling it what to do with certain kinds of files. For example, for .c files, it will run gcc with a set of options defined in a variable. Then, with .o files, it will invoke the linker to produce an executable.

make checks file modification times to only recompile or do other operations if needed. For example, if test.c has a later modification time than test.o, then it will recompile it; if not, it will use the existing object file. This saves a lot of time in large projects where only certain files were modified.

It also allows parallel builds with the -j option, specifying the number of jobs to run at the same time. This is a huge timesaver on builds with many source files. It is often recommended to use the number of CPU cores you have, plus two. So if you’re on a six-core system you might use -j8. This is often ideal because there can be some CPU idle time in individual processes.

Here is a very simple Makefile for our example program with two C files:

CC = clang  # or gcc if you want
CFLAGS = -O2
LDFLAGS =
OBJFILES = main.o test.o
TARGET = prog

# all and clean are command names, not files to build
.PHONY: all clean

all: $(TARGET)

$(TARGET): $(OBJFILES)
	$(CC) $(LDFLAGS) -o $(TARGET) $(OBJFILES)

%.o: %.c
	$(CC) $(CFLAGS) -c $< -o $@

clean:
	rm -f *.o $(TARGET)

Note that the indented command lines in a Makefile must start with a tab, not spaces. Simply place this in a file named Makefile in the same directory as the other files. Running make will automatically find the Makefile and show what it is doing:

clang   -O2 -c main.c -o main.o
clang   -O2 -c test.c -o test.o
clang    -o prog main.o test.o

And we can see that the object files and the executable are now in the directory. If we run make clean, it does:

rm -f *.o prog

and all the generated files are gone!

Note the $< and $@ in the Makefile. These are automatic variables that make sets for each rule it runs. In a pattern rule that builds a target from a source (which is what %.o: %.c does), $< expands to the first prerequisite, here the .c source file, and $@ expands to the target, here the .o file.

You can add much more sophistication to Makefiles, but we are endeavoring here to show only basic usage. There is far more information, of course, in make’s documentation. That will be the same with the other tools we describe below. This article is intended to be a summary of what you need to know to deal with common situations you will run into.

GNU Autotools

Many open source programs for Linux use GNU Autotools to build and install. The existence of a configure script in the project’s root directory is a giveaway. One can usually build and install these with the ubiquitous steps:

./configure
make
sudo make install

configure is a script that is automatically generated by Autotools based on other files that describe what it needs to do and the program’s dependencies. Its job is to check that the system meets the requirements of the program, find the location of required dependencies, and flag whether optional components have the required dependencies. If the software cannot be built on the current system, configure will exit with an error. If it is successful, it will generate a Makefile, hence the second step of running make.

You can specify command line flags to configure, and ./configure --help will give a complete list of them. Flags may enable or disable optional dependencies. An important flag is --prefix, which allows you to specify installation in an alternative location. By default, it is /usr/local, which will put the binaries in /usr/local/bin, libraries in /usr/local/lib, config in /usr/local/etc, etc. But perhaps you do not want to install the software as root. You can specify --prefix=/home/myuser/mystuff, and then the binaries would be in /home/myuser/mystuff/bin, etc.

The Makefile also contains an install step. By default, it will usually install in /usr/local, which requires root privileges, which is why you need to run it with sudo (unless you are already in a root shell). But if you specified an alternative prefix that your normal user owns, you can run make install without sudo.

I will not describe how to use Autotools in your own software; there are now better systems. But you want to know it when you see it.

CMake

Meson
