systemd Services: An Introduction
Nearly every modern Linux distribution uses systemd, which … pretty much manages the system. It has an enormous wealth of features, many of which we will describe, but we will begin with its service manager.
A systemd service is basically a program that systemd can control independently and that has its own resources. That can include things that run in the background on the system, such as NFS drive shares, printing services, or D-Bus. It can also include the things you actually want the system to do: your web server, database server, and your own application server.
Prior to systemd, services were started and stopped by init scripts, which were typically written in the sh shell language. The services to start at boot, or at a given runlevel, were chosen by symbolically linking the script into a certain directory. The system would then fairly blindly run the scripts in the appropriate directory for the runlevel it was entering. It worked, and a lot of people liked it that way, but systemd is so much better!
systemd services are not written in a shell language but in a simple configuration language based on the .ini file format. The file is grouped into sections, such as [Unit], [Service], and [Install]. Inside each section are configuration parameters, each written as a name and a value separated by an equals sign.
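To make the format concrete, here is a minimal Python sketch that reads unit-file text into nested dictionaries. It is only an approximation written for this article (parse_unit is a hypothetical helper, not part of systemd): it keeps every value of a repeated key, skips comment lines starting with # or ;, and ignores finer points of the real syntax such as backslash line continuations and empty assignments that reset a list.

```python
def parse_unit(text):
    """Parse unit-file text into {section: {key: [values...]}}.

    Values are kept in lists because systemd allows some keys (for example
    ExecStart= in a Type=oneshot service) to appear more than once.
    Simplified sketch: no line continuations, no list resets.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            # New section header such as [Unit] or [Service].
            current = sections.setdefault(line[1:-1], {})
            continue
        if current is None:
            raise ValueError(f"setting outside of any section: {line!r}")
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected Name=Value, got {line!r}")
        current.setdefault(key.strip(), []).append(value.strip())
    return sections
```

Applied to the sleep.service unit shown below, it returns {"Unit": {"Description": ["A Test Sleep Service"]}, "Service": {"ExecStart": ["/usr/bin/sleep 15"]}}.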
These files are called unit files, and services are only one type of them. There are also timers (like the old Unix crontab, but more powerful), sockets for socket activation, targets with which you can group other units, and more. We’ll cover each of these, starting with services here.
When systemd starts up, it reads all the unit files, calculates their dependency graph, and figures out exactly what needs to be started and in which order. Because it has an internal representation of what is needed to start the system, it can offer some very useful querying tools.
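The “in which order” part is essentially a topological sort of that graph. The following Python sketch, using the standard library’s graphlib, illustrates the idea with made-up unit names. It is a simplified model: the real systemd starts independent units in parallel and treats requirement dependencies and ordering dependencies as separate things.

```python
from graphlib import TopologicalSorter

# Hypothetical units: each one maps to the units that must start before it.
units = {
    "app.service": {"db.service", "network-online.target"},
    "db.service": {"network-online.target"},
    "network-online.target": set(),
    "cache.service": set(),
}

def start_order(graph):
    """Return one valid start order, with dependencies first.

    Raises graphlib.CycleError if units depend on each other in a loop,
    the kind of problem systemd reports as an ordering cycle.
    """
    return list(TopologicalSorter(graph).static_order())

print(start_order(units))
```

Any order printed will list network-online.target before db.service, and db.service before app.service; cache.service has no dependencies, so it can land anywhere.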
Let’s have a look at perhaps the simplest possible service:
[Unit]
Description=A Test Sleep Service
[Service]
ExecStart=/usr/bin/sleep 15
If you want to try it out (and you should!), go ahead and create this file as /etc/systemd/system/sleep.service.
Keep in mind that you will need to do this, and everything else in this article, as the root user. It is essential to remember that after you add or edit a systemd unit file, you must tell systemd to reload its configuration before the change takes effect. Use this command:
systemctl daemon-reload
The systemctl command provides the tools we need to interact with the service manager.
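If you prefer to script these steps, a small Python helper might look like this. install_unit is a hypothetical name; it writes the file and then runs the same systemctl daemon-reload command shown above, so it needs root for the real /etc/systemd/system directory. The unit_dir and reload parameters exist only so the function can be tried out in a scratch directory.

```python
import subprocess
from pathlib import Path

SLEEP_UNIT = """\
[Unit]
Description=A Test Sleep Service
[Service]
ExecStart=/usr/bin/sleep 15
"""

def install_unit(name, text, unit_dir="/etc/systemd/system", reload=True):
    """Write a unit file, then (optionally) make systemd re-read unit files.

    Writing to /etc/systemd/system and running daemon-reload both need root.
    """
    path = Path(unit_dir) / name
    path.write_text(text)
    if reload:
        # Equivalent to typing `systemctl daemon-reload` by hand.
        subprocess.run(["systemctl", "daemon-reload"], check=True)
    return path
```

Run as root, install_unit("sleep.service", SLEEP_UNIT) leaves you in the same place as the manual steps.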
The Life Cycle of a Service
Before doing anything else, let’s query the status of the service:
systemctl status sleep.service
You should see something like this:
○ sleep.service - A Test Sleep Service
Loaded: loaded (/etc/systemd/system/sleep.service; static)
Active: inactive (dead)
This tells us the unit file was loaded, but the service is not active. How about starting it?
systemctl start sleep.service
Run the status command again, and you should see that it’s running:
● sleep.service - A Test Sleep Service
Loaded: loaded (/etc/systemd/system/sleep.service; static)
Active: active (running) since Mon 2024-08-19 21:15:28 MDT; 10s ago
Main PID: 432509 (sleep)
Tasks: 1 (limit: 9247)
CPU: 1ms
CGroup: /system.slice/sleep.service
└─432509 /usr/bin/sleep 15
This tells you a number of things. The service is active, and the output shows when it was started, even noting that it was 10 seconds ago. Next are the main process ID and the process name. Then there’s Tasks, the number of processes and threads belonging to the service, along with the limit on that number. On the next line is the amount of CPU time the processes have consumed. Since the service was started 10 seconds ago, this obviously isn’t wall-clock time. The CPU is good at going idle or doing something else when a process doesn’t need it, and a process that is just waiting, which is exactly what the sleep command does, uses almost no CPU time. Then we have the CGroup. A control group (cgroup) is a named group of related processes, and resource limits can be set for the whole group. This line gives the name of the cgroup, followed by the tree of processes in it. In this case, sleep is the only process.
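The status output is meant for people. For scripts, systemctl show prints a unit’s properties as Key=Value lines: ActiveState and SubState correspond to the “active (running)” or “inactive (dead)” shown by status, and MainPID to the main process ID. Here is a minimal Python sketch (parse_show and unit_state are hypothetical helper names) that collects a few of those properties into a dictionary.

```python
import subprocess

def parse_show(output):
    """Parse `systemctl show` output (one Key=Value per line) into a dict."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip anything that is not a Key=Value line
            props[key] = value
    return props

def unit_state(unit):
    """Query a few properties of a unit, e.g. unit_state("sleep.service")."""
    result = subprocess.run(
        ["systemctl", "show", unit, "--property=ActiveState,SubState,MainPID"],
        capture_output=True, text=True, check=True,
    )
    return parse_show(result.stdout)
```

Calling unit_state("sleep.service") while the sleep is still running should report ActiveState as active and SubState as running.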
If you check the status again after the 15 seconds have elapsed, it should look similar to how it looked before you started the service in the first place. When the process exits, systemd marks the service inactive.
You will also see the last few journal lines for this service. In this case it will tell you when it started and when it stopped.
Aug 19 21:15:28 micahpi5 systemd[1]: Started sleep.service - A Test Sleep Service.
Aug 19 21:15:43 micahpi5 systemd[1]: sleep.service: Deactivated successfully.
They’re 15 seconds apart, as they should be!
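If you want to check that arithmetic in code, the timestamps can be parsed with Python’s datetime module. This sketch assumes the default short journal format, which leaves out the year, so the year (2024, from the status output above) is passed in; journalctl -o short-iso prints full timestamps if you would rather avoid that. journal_time is a hypothetical helper name.

```python
from datetime import datetime

def journal_time(line, year):
    """Parse the 'Mon DD HH:MM:SS' prefix of a short-format journal line.

    That format omits the year, so the caller supplies it.
    """
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")

started = "Aug 19 21:15:28 micahpi5 systemd[1]: Started sleep.service - A Test Sleep Service."
stopped = "Aug 19 21:15:43 micahpi5 systemd[1]: sleep.service: Deactivated successfully."
elapsed = journal_time(stopped, 2024) - journal_time(started, 2024)
print(elapsed.total_seconds())  # 15.0
```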
Besides active and inactive, systemd can also mark a service as failed. Go ahead and create this as /etc/systemd/system/failure.service:
[Unit]
Description=An Absolute Failure of a Service :'(
[Service]
ExecStart=/usr/bin/false
Reload the daemon and start the service. Then check its status:
× failure.service - An Absolute Failure of a Service :'(
Loaded: loaded (/etc/systemd/system/failure.service; static)
Active: failed (Result: exit-code) since Mon 2024-08-19 21:54:11 MDT; 3s ago
Duration: 3ms
Process: 435264 ExecStart=/usr/bin/false (code=exited, status=1/FAILURE)
Main PID: 435264 (code=exited, status=1/FAILURE)
CPU: 1ms
Aug 19 21:54:11 micahpi5 systemd[1]: Started failure.service - An Absolute Failure of a Service :'(.
Aug 19 21:54:11 micahpi5 systemd[1]: failure.service: Main process exited, code=exited, status=1/FAILURE
Aug 19 21:54:11 micahpi5 systemd[1]: failure.service: Failed with result 'exit-code'.
A result of exit-code means the process, /usr/bin/false, exited with a non-zero status. Recall that Linux considers an exit status of zero a success and any other number a failure. The false command’s entire reason for existence is just to return 1! (There is also a true command that returns 0.) It may seem silly, but it can be pretty handy at times, including for testing things like this!
There are other results as well, but we’ll leave it here for now.
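The same convention is easy to observe from Python: subprocess.run returns the child’s exit status, and zero means success. This is just an illustration using the standard library (exit_status is a hypothetical helper), not anything systemd-specific.

```python
import subprocess

def exit_status(argv):
    """Run a command and return its exit status (0 means success)."""
    return subprocess.run(argv).returncode

for cmd in (["true"], ["false"]):
    status = exit_status(cmd)
    print(cmd[0], status, "success" if status == 0 else "failure")
```

On a typical Linux system this prints “true 0 success” followed by “false 1 failure”.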
Service Types
An important parameter in the [Service] section is Type=, which we haven’t used yet. The default is simple.
The type largely determines at what stage systemd reports the service as active. One reason this matters is that systemd supports unit dependencies, with the ability to start a unit only after everything it depends on has successfully started. For that to work well, it matters when a unit is marked as started. You don’t want that to happen too soon, before any necessary checks and initialization are complete.
The simple type marks the service active immediately, even before the new program is executed. Of course, as we’ve seen, it can still be marked as failed later. But it’s better not to go active in the first place, so simple is best avoided for services that other units depend on.
The exec type waits until the new program has actually been executed. Let’s take a look at the difference in the log. Here is the unit file, saved as /etc/systemd/system/test1.service; note that the executable doesn’t exist:
[Unit]
Description=Just A Test
[Service]
Type=simple
ExecStart=/usr/bin/blahblah
We can view service logs with the journalctl command, like this:
journalctl -u test1.service
When it’s started, we get this:
Aug 21 21:26:33 micahpi5 (blahblah)[465987]: test1.service: Failed to locate executable /usr/bin/blahblah: No such file or directory
Aug 21 21:26:33 micahpi5 (blahblah)[465987]: test1.service: Failed at step EXEC spawning /usr/bin/blahblah: No such file or directory
Aug 21 21:26:33 micahpi5 systemd[1]: Started test1.service - Just A Test.
Aug 21 21:26:33 micahpi5 systemd[1]: test1.service: Main process exited, code=exited, status=203/EXEC
Aug 21 21:26:33 micahpi5 systemd[1]: test1.service: Failed with result 'exit-code'.
When the type is changed to exec, the daemon reloaded, and the service started again, we get this:
Aug 21 21:27:14 micahpi5 (blahblah)[466024]: test1.service: Failed to locate executable /usr/bin/blahblah: No such file or directory
Aug 21 21:27:14 micahpi5 (blahblah)[466024]: test1.service: Failed at step EXEC spawning /usr/bin/blahblah: No such file or directory
Aug 21 21:27:14 micahpi5 systemd[1]: Starting test1.service - Just A Test...
Aug 21 21:27:14 micahpi5 systemd[1]: test1.service: Main process exited, code=exited, status=203/EXEC
Aug 21 21:27:14 micahpi5 systemd[1]: test1.service: Failed with result 'exit-code'.
Aug 21 21:27:14 micahpi5 systemd[1]: Failed to start test1.service - Just A Test.
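See the difference? With simple, the log says the service was “Started” even though the program never ran. With exec, it was only ever “Starting”, followed by “Failed to start”: systemd never considered the service started.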
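As a loose analogy (this is not how systemd is implemented), Python’s subprocess.Popen behaves like the exec type: it returns only once the new program has actually been executed, and if the exec fails, the error is reported back to the parent as an exception. The helper name start_like_exec is made up for this sketch.

```python
import subprocess

def start_like_exec(argv):
    """Report "Started" only after the program has actually been executed.

    If exec fails (for example, the file does not exist), Popen raises an
    OSError such as FileNotFoundError in the parent instead of returning.
    """
    try:
        proc = subprocess.Popen(argv)
    except OSError as err:
        return f"Failed to start: {err.strerror}"
    proc.wait()  # tidy up here; a real service manager would keep supervising
    return "Started"

print(start_like_exec(["/usr/bin/blahblah"]))
print(start_like_exec(["true"]))
```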
Now we get to the forking type. Consider this C program:
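#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    printf("Do initialization and checks ...\n");
    /* Flush first: when stdout goes to the journal rather than a terminal it is
       fully buffered, and unflushed text would be copied into the child. */
    fflush(stdout);
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");  /* fork failed; a non-zero exit marks the service failed */
        return 1;
    }
    if (pid > 0) {
        printf("This is the main process started by systemd. A child process is running. We'll just exit.\n");
        return 0;
    }
    printf("Now we're in the child process, ready to do real work.\n");
    return 0;
}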
Recall that fork() is a system call that creates a new process with most of the same properties as the current process, executing from the exact same point. In the original (parent) process, fork() returns the process ID of the new child, a positive number, so the parent takes the first branch, prints a short message, and exits.
Meanwhile, the child process receives 0 from fork(), so that branch is skipped there and execution continues with the code below the block. That is where we would start processing requests, or whatever else we really want the service to do.
When the type is forking, systemd simply marks the service as active when the parent process exits with a successful status, because it expects the child process to live on and do the work.
This worked well in the old days, when services were started by shell init scripts, and you may need to use it for legacy servers that are written to operate only this way. However, systemd has far better ways of being notified that a service is ready, so forking should not be used for new services.
That brings us to the notify type. systemd provides a C library, libsystemd, with many functions, one of which is sd_notify(). A service calls it to tell the service manager that it has successfully started and is now ready to begin processing.
There are a number of messages that sd_notify() can send to the service manager. The string READY=1 tells the service manager to mark the service as active. Suppose we have this Python script, which uses the python-systemd bindings, at /usr/local/bin/notify-test.py, with its execute permission bit set:
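#!/usr/bin/python
import time
from systemd import daemon  # python-systemd bindings

time.sleep(5)               # pretend to spend five seconds initializing
daemon.notify("READY=1")    # tell systemd the service is ready
print(daemon.booted())      # True if the system was booted with systemd
time.sleep(10)              # keep "working" for a while, then exit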
And this systemd unit at /etc/systemd/system/notify-test.service:
[Unit]
Description=Notify Test
[Service]
ExecStart=/usr/local/bin/notify-test.py
Type=notify
For the first five seconds after starting it, you’ll see it in the activating state if you check from a second terminal:
● notify-test.service - Notify Test
Loaded: loaded (/etc/systemd/system/notify-test.service; static)
Active: activating (start) since Wed 2024-09-25 21:15:22 MDT; 1s ago
Main PID: 592982 (notify-test.py)
Tasks: 1 (limit: 9247)
CPU: 22ms
CGroup: /system.slice/notify-test.service
└─592982 /usr/bin/python /usr/local/bin/notify-test.py
The systemctl start command waits until the notification is sent at the five-second mark, then it exits, and the service status changes to active:
● notify-test.service - Notify Test
Loaded: loaded (/etc/systemd/system/notify-test.service; static)
Active: active (running) since Wed 2024-09-25 21:15:27 MDT; 1s ago
Main PID: 592982 (notify-test.py)
Tasks: 1 (limit: 9247)
CPU: 22ms
CGroup: /system.slice/notify-test.service
└─592982 /usr/bin/python /usr/local/bin/notify-test.py
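Under the hood, the protocol is simple: for a Type=notify unit, the service manager sets the NOTIFY_SOCKET environment variable to the address of a Unix datagram socket, and sd_notify() sends the state string (such as READY=1) to it. The Python sketch below implements only that much, as an illustration; the real sd_notify() also supports more address types and passing file descriptors, so in practice you would use libsystemd or the python-systemd bindings as in the script above.

```python
import os
import socket

def sd_notify(state):
    """Minimal sketch of sd_notify(): send `state` (e.g. "READY=1").

    Returns False if NOTIFY_SOCKET is unset, meaning no service manager
    is listening for notifications from this process.
    """
    address = os.environ.get("NOTIFY_SOCKET")
    if not address:
        return False
    if address.startswith("@"):
        # "@" marks a Linux abstract-namespace socket, whose real
        # address starts with a NUL byte instead.
        address = "\0" + address[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), address)
    return True
```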
Next is the dbus type, which is somewhat similar to notify, except that the service is considered ready once it acquires the name on D-Bus given by the BusName= setting. We will discuss D-Bus in a separate article.
Sometimes you want a service unit that simply runs a program and exits, but you still want the service manager to note that it is active so that units depending on it can be started. That is the purpose of the oneshot type.
Say we have this service:
[Unit]
Description=Oneshot Test
[Service]
ExecStart=/usr/local/bin/oneshot-test.py
Type=oneshot
RemainAfterExit=no
And this Python script at /usr/local/bin/oneshot-test.py, again with its execute bit set:
#!/usr/bin/python
import time
print("Hello from Oneshot Service!")
time.sleep(10)
If we start it up, it will be in the “activating” state for the 10-second delay, then it will exit and become inactive. But suppose we change the last line of the unit to RemainAfterExit=yes. Then it will still be “activating” for those 10 seconds, but after the program exits, it will be active!
● oneshot-test.service - Oneshot Test
Loaded: loaded (/etc/systemd/system/oneshot-test.service; static)
Active: active (exited) since Wed 2024-09-25 22:31:26 MDT; 1s ago
Process: 595423 ExecStart=/usr/local/bin/oneshot-test.py (code=exited, status=0/SUCCESS)
Main PID: 595423 (code=exited, status=0/SUCCESS)
CPU: 21ms
This allows us to have the service marked as active, even though the program isn’t actually running anymore! This is very useful when you just need some program or action to run once, successfully, before some other action is triggered. A typical Linux distribution includes several such services, for example systemd-tmpfiles-setup.service and systemd-sysctl.service; you can view one with systemctl cat systemd-sysctl.service.
Targets and Dependencies
systemd services can depend on other services, and targets can help group them together.
Let’s say we have an application server that requires two different microservices and some setup work.
[work in progress]