Compiling and Running C and C++
You’ll be coming across a lot of software written in C and C++, and it’s useful to have a primer on how to build and run it, even if you’re not writing your own software in these languages. There are various build systems for these languages, and we’ll take a quick look at the most common ones.
Invoking gcc or clang manually
For projects with a single source file, or perhaps a small number of them, it is fine to invoke the compiler manually.
If you want to play with this, see the article What You Need To Know About C, and create the three files test.h, test.c, and main.c in an empty directory.
Note that there are two widely used open source C/C++ compilers in the Linux ecosystem:
- GCC is a compiler from the Free Software Foundation, originally written by Richard Stallman in the late 1980s, and licensed under the GNU General Public License. The name is an acronym, originally standing for GNU C Compiler, but now officially standing for GNU Compiler Collection, since it supports more languages than C and C++. It is invoked with the command gcc. It is highly portable and prioritizes supporting nearly any microprocessor family with any significant use.
- Clang (the name comes from “C language”) comes from the LLVM Project. It is licensed under the much more permissive Apache license. Its development was largely driven by Apple, which dislikes the copyleft nature of the GPL. It is invoked with the command clang. It supports fewer processor architectures than GCC, but still supports all the ones that are largely relevant to Linux application servers, namely x86_64 and ARM64 (often called aarch64), as well as a couple of others you’re less likely to need.

Either compiler should work just fine for building any software that uses established features of either language. They sometimes differ in their support of cutting-edge new language features.
The following examples use GCC, but both compilers accept similar basic options, and replacing gcc with clang will work just fine.
If you run:
gcc test.c main.c
ls -l
you’ll see that you have an additional mysterious file, a.out. You’ll notice also that it’s executable. If you examine it with the file a.out command, you’ll see that it’s
a.out: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=af5f4a20130ef46d2e0d6f8a0448864e8f05cac2, for GNU/Linux 4.4.0, not stripped
if you’re on an x86-64 computer, or
a.out: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, BuildID[sha1]=f406d8d44875f4bc40caf86d9ca6a7f4a6cb6536, for GNU/Linux 3.7.0, not stripped
if you’re on ARM/aarch64. You can go ahead and run it with ./a.out, and you’ll see the output:
I'm in test!
I'm in main! Value is 11
It is our multi-file C program!
Note that for a single-file program, of course, you just provide the single filename.
Maybe you don’t like the name a.out, which comes from a very old executable format called “a.out” that was used by Linux in the 1990s and predates the current ELF format. You can give the executable any name you want:
gcc main.c test.c -o testprog
Now we have testprog instead of a.out; otherwise, they’re identical.
What if we only specify one of the C files, instead of both?
gcc main.c
Well, that didn’t go as well as we may have hoped!
/usr/bin/ld: /tmp/cc2xpDzo.o: warning: relocation against `shared_val' in read-only section `.text'
/usr/bin/ld: /tmp/cc2xpDzo.o: in function `fun_in_main':
main.c:(.text+0x6): undefined reference to `shared_val'
/usr/bin/ld: /tmp/cc2xpDzo.o: in function `main':
main.c:(.text+0x34): undefined reference to `shared_val'
/usr/bin/ld: main.c:(.text+0x42): undefined reference to `fun_in_test'
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE
collect2: error: ld returned 1 exit status
You’ll note a reference to fun_in_test. That is a function defined in our test.c source file, which we did not include in this compilation run. In main.c we include the header test.h, which declares the function. But the linker has no idea what to do with it; it can’t make an executable program without filling in that important gap!
What’s the linker, you ask? That is the program /usr/bin/ld, which you saw in the output above, and it is responsible for taking the object files emitted by the compiler and putting them together into an executable. In many cases, the compiler calls it automatically, but you can run it yourself too. Let’s check out the object files and how we get them:
gcc -c test.c main.c
Now you’ll see two new files in the directory, test.o and main.o. Running file main.o will show that it is ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped. These files contain the compiled binary code from our source files, but not in a way that can be directly executed. And they contain references to things that are defined in other files. To create an executable, it is necessary to provide all the required object files to the linker. We can give it a try:
ld main.o test.o
Uh oh:
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400120
ld: main.o: in function `fun_in_main':
main.c:(.text+0x20): undefined reference to `printf'
ld: test.o: in function `fun_in_test':
test.c:(.text+0x10): undefined reference to `puts'
What is this? You’ll recognize printf as a function we called in the program. It turns out that puts is also a function provided by the C library; the compiler quietly substitutes puts for simple printf calls that have no format arguments. So ld is not seeing our C library! But that’s easy to fix:
ld main.o test.o -lc
The -l option tells the linker to find a library with the given name. In this case, we just want libc (library filenames always start with lib), so it’s just -lc. Now we have our a.out file again!
The compiler runs the linker automatically unless the -c option (meaning compile to an object file only) is given on the command line, and it supplies -lc to the linker automatically. However, if we use another library, we may need to specify it ourselves. The compiler itself accepts the -l parameter and simply passes it on to the linker.
Other GCC/Clang options to know about:
-g: Adds debugging symbols. If you want to run your compiled program through a debugger such as gdb, you want it to contain debugging information. This lets you see variable and function names in their appropriate places, instead of just raw pointer values. It increases the size of the executable by a little bit.
-static: Puts everything required by the executable into the executable file itself. Normally, Linux executables use dynamic linking to libraries. This is usually great; it allows the system to share library code on disk and in RAM instead of duplicating it for every executable. Say we run gcc test.c main.c -o prog to compile our program. Its size is about 15K on my system. We can use ldd prog to see what it’s linking against:
linux-vdso.so.1 (0x00007ed140b4e000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007ed1408fd000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007ed140b50000)
The first, linux-vdso, is a virtual library provided by the kernel and mapped into all processes. The second, libc, is the main thing we’re actually linking with; it contains all the main functions provided by the C library. The last, ld-linux, is the dynamic linker itself, which loads the other libraries at program startup.
Now, what if we run gcc test.c main.c -o prog -static instead? Our program is now over 750K on disk, and the ldd prog output simply shows not a dynamic executable. libc is actually included in the executable! This might be handy, for example, if you want an executable that can be run on absolutely any sort-of-recent Linux system without having to worry about which library versions are installed.
-O: Optimization control. -O0 turns off optimization and is recommended with -g. -O2 enables many optimizations; -O3 enables even more (though some of those may increase executable size as a tradeoff for faster runtime execution). -Os optimizes for the smallest possible executable size. There are others; see the man pages for gcc or clang.
-march=: Tells the compiler which processor architecture to target. For example, -march=broadwell would generate code that may contain processor instructions that exist only in Intel Broadwell (or later) CPUs; the program may fail to execute at all on earlier CPUs.
-Wall: Turns on a broad set of useful compiler warnings. Despite the name, it does not enable every possible warning (-Wextra adds more), but it is a sensible default for catching common mistakes.
Makefiles and the make command
An old but simple build system that automates much of the monotony of running the compiler directly is the Makefile. It basically lists out targets and what to do when they are invoked. It will often contain a hierarchy of rules telling it what to do with certain kinds of files. For example, for .c files, it will run gcc with a set of options defined in a variable; then, with the resulting .o files, it will invoke the linker to produce an executable.
make checks file modification times so it only recompiles (or performs other operations) when needed. For example, if test.c has a later modification time than test.o, it will recompile it; if not, it will use the existing object file. This saves a lot of time in large projects where only certain files were modified.
It also allows parallel builds with the -j
option, specifying the number of jobs to run at the same time. This is a huge
timesaver on builds with many source files. It is often recommended to use the number of CPU cores you have, plus two.
So if you’re on a six-core system you might use -j8
. This is often ideal because there can be some CPU idle time in
individual processes.
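One convenient way to follow that rule of thumb automatically, assuming the nproc utility from GNU coreutils is installed:

```shell
# Compute "CPU cores plus two" and hand it to make's -j option.
JOBS=$(($(nproc) + 2))
echo "building with $JOBS jobs"
# make -j"$JOBS"    # uncomment in a directory containing a Makefile
```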
Here is a very simple Makefile for our example program with two C files:
CC = clang # or gcc if you want
CFLAGS = -O2
LDFLAGS =
OBJFILES = main.o test.o
TARGET = prog
all: $(TARGET)
$(TARGET): $(OBJFILES)
	$(CC) $(LDFLAGS) -o $(TARGET) $(OBJFILES)
%.o: %.c
	$(CC) $(CFLAGS) -c $< -o $@
clean:
	rm -f *.o $(TARGET)
Note that all the indentations in a Makefile are tabs, not spaces. Simply place this in the same directory as the other files. Running make will automatically find the Makefile and show what it is doing:
clang -O2 -c main.c -o main.o
clang -O2 -c test.c -o test.o
clang -o prog main.o test.o
And we can see that the object files and the executable are now in the directory. If we run make clean, it does:
rm -f *.o prog
and all the generated files are gone!
Note the $< and $@ in the Makefile. In rules that translate a source to a target (which is what %.o: %.c does), the former is a variable that expands to the source filename (the rule’s first prerequisite), and the latter expands to the target filename.
You can add much more sophistication to Makefiles, but we are endeavoring here to show only basic usage. There is far more information, of course, in make’s documentation. The same goes for the other tools we describe below; this article is intended to be a summary of what you need to know to deal with common situations you will run into.
GNU Autotools
Many open source programs for Linux use GNU Autotools to build and install. The existence of a configure script in the project’s root directory is a giveaway. One can usually build and install these with the ubiquitous steps:
./configure
make
sudo make install
configure is a script that is automatically generated by Autotools based on other files that describe what it needs to do and the program’s dependencies. Its job is to check that the system meets the requirements of the program, find the locations of required dependencies, and flag whether optional components have the required dependencies. If it is unable to build the software on the current system, configure will exit with an error. If it is successful, it will generate a Makefile, hence the second step of running make.
You can specify command line flags to configure, and ./configure --help will give a complete list of them. Flags may enable or disable optional dependencies. An important flag is --prefix, which allows you to specify installation in an alternative location. By default, the prefix is /usr/local, which will put the binaries in /usr/local/bin, libraries in /usr/local/lib, configuration in /usr/local/etc, etc. But perhaps you do not want to install the software as root. You can specify --prefix=/home/myuser/mystuff, and then the binaries would be in /home/myuser/mystuff/bin, etc.
The Makefile also contains an install step. By default, it will usually install in /usr/local, which requires root privileges; that is why you need to run it with sudo (unless you are already in a root shell). But if you specified an alternative prefix that your normal user owns, you can run make install without sudo.
I will not describe how to use Autotools in your own software; there are now better systems. But you want to know it when you see it.