Troubleshooters.Com® and djbdns Intro Present:

Daemontools Intro

Career Skills nobody else teaches



Introduction

With the impending Linux switch to systemd, daemontools by djb (Daniel J. Bernstein) assumes a new relevance. Daemontools can do a lot of what systemd can do, in one tightly encapsulated, easy to understand, ingeniously simple package. daemontools will never replace systemd, or any other init software, because it doesn't natively handle dependencies like "don't start Apache until the DNS server is started". At least you can't do it without some serious kludging. However, if you're anything like me, many of your daemons are rather independent.

Every daemon you move out of systemd and into daemontools makes systemd a little less complex. Every daemon you move out of systemd and into daemontools is another daemon whose log files you can read in text mode, natively. Every daemon you run out of daemontools is a daemon you can control with the svc command and examine with the svstat and svok commands, whether on systemd-mandatory Red Hat, systemd-verboten Funtoo, or any BSD. Except for starting daemontools itself, daemontools works the same in any Unix-like environment.

As a matter of fact, even though the first sentence of this chapter mentioned systemd as a motive, it's also true for all the init systems. Most init systems are like a 21st century car, so crammed full of smog equipment, computers, and other geegaws, that you need to have specialized tools and training to do the most mundane maintenance on them. daemontools is more like a 1959 Plymouth with a three on the tree and a flathead six: everything's visible, everything's easy to get to with only an editor, and it just works!

Of course, not everyone wants a 1959 Plymouth, and not everyone wants daemontools. The person wanting to use Linux as a more secure, no-cost Windows replacement wouldn't want to monkey around with daemontools. As a matter of fact, this person probably would have no objection to systemd. Or any other init system (they're all bad, I think systemd is worse, but that's just my opinion.) If you don't care about the ability to take the metaphorical wrench to your computer, this web page has nothing for you, and you'll lose nothing by stopping at this point.

So, if you've read this far, I'll assume you're interested in maintaining your own computer system...

With daemontools, if you want to write your own daemon, you can write it as a normal program, and then just incorporate it into daemontools to daemonize it, with logs and restarts and control and everything. You can daemonize pretty much any executable, even one never made to be a daemon. You can daemonize a shellscript. Once again, this isn't just running it in the background: It's controlling it with the svc command, examining it with svstat and svok, and having every line it writes to stdout (and stderr, if the run script redirects it) sent to a log file whose format you control.

One other point worth looking at. With the impending systemd, and the proclamations of those evangelizing it, there's been an uptick in discussions of the Linux Philosophy. Personally, I'm a big fan of the Linux Philosophy, and it's my opinion that djb's software takes the Linux Philosophy where it's never been before. Djb uses the Unix filesystem as a natural source of hierarchy, for configuration, grouping, and lots of stuff. Djb's use of a file's name and contents to represent a key->value pair is classic. Eric Raymond says we should put software's complexity in the data, not the algorithm. Looking at the filesystems djb's software uses is like reading a document on the software. You look at the filesystem, it makes sense, and you can even guess to a large degree what you're going to find in his code. Your mileage may differ, but I consider daemontools not only a ruggedly simple monument to practicality, but also a work of art.

Enough introduction. In the words of Mark Antony, I came to explain daemontools, not to praise it. So let's continue...

Steve Litt is the author of the Universal Troubleshooting Process Courseware. Steve can be reached at his email address.

A Few Points on Installation

For the most part, djb's installation instructions are excellent, always assuming you're not installing it from a package manager (and I recommend you don't). I have just a few additions.

Danger: Watch Out!

You can untar daemontools-0.76.tar.gz into any directory you want. But do not, I repeat, do not rename that directory after you've done your first package/install. If you were to rename it after that first package/install command, daemontools would silently and mysteriously fail to work. The build actually queries for the current directory and writes it into various compile files.

First, daemontools won't compile, as is, on most Linux systems. The problem is that djb declares errno himself, as extern int errno, but modern C libraries expect programs to get errno by including errno.h, and on these systems djb's code fails to build. If daemontools is the only djb software you're going to be using, just do this from within the whatever/admin/daemontools-0.76 directory:

wget http://djbware.csi.hu/patches/daemontools-0.76.errno.patch
patch -p1 < daemontools-0.76.errno.patch
package/install
mv daemontools-0.76.errno.patch /safe/place/4storage/

There's an alternate way to get it to compile by modifying the compile/conf-cc file, but I'm not going to discuss that.

If you use a lot of djb software, you probably know this, but if you've forgotten, read this page about patches from the djb way patch page.


About Directory Locations

Read djb's daemontools installation instructions and it's obvious he has strong opinions on where everything should be located. And yet, as far as I can see, the only three locations hard coded into djb's code are /service, /command, and servicedir/log. He says you should create a /package directory, into which to untar daemontools-0.76.tar.gz, but the fact is, you can put that anywhere, as long as the directory is owned by root and chmoded 1755.

Then there's the question about where your service directories will reside. I don't mean /service, I mean the directories you symlink to from within /service. You can put them anywhere. For security's sake, they should be owned by root, and there should be a way of backing them up as data so you can walk them over to a new computer and drop them onto it.

Most distros hate putting new directories, like /service and /command, right under the root. I agree with them. However, I don't hate it enough to do what the distros do and change the locations for /service and /command by changing the code (and I think the only change might be to svscanboot, but still, no).

Bottom line, as long as you compile as in djb's daemontools installation instructions, leave /service and /command where they are, untar daemontools-0.76.tar.gz into a directory owned by root and don't move that directory after doing your first package/install, and remember the errno patch, it should be pretty easy to install a working daemontools.


daemontools Service Hello World

This chapter walks you through getting a tiny service running. It's a proof of concept daemontools service implementation, with minimal features that, while learning, would be distractions. What you'll do in this chapter is make a shellscript that, once every second, writes the time, in yyyymmdd_hh:mm:ss format, both to a file in the /tmp directory, and to stdout. This shellscript runs in the foreground and has no daemon-like properties. Then you'll call this shellscript, from a daemontools implementation, to make it into a daemon, controllable by the svc command, and observable with the svstat and svok commands. You'll watch the file in the /tmp directory as you start and stop the daemon with the svc command.

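In this Hello World, assume that the username is slitt, the hostname is mydesq2, and all service directory trees are built under the /scratch/service directory and symlinked from /service.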

Therefore, when you perform this Hello World, change every instance of "slitt" to your username, and every instance of "mydesq2" to your hostname.

Make and test the shellscript

Make a directory under the /home/slitt directory called daemhello. This arbitrarily named and placed directory is owned by slitt. Create the following shellscript, called print_timestamps.sh, in that newly created directory:

#!/bin/sh
log=/tmp/junklog.log
sleeptime=1
while /bin/true; do
  # %Y is the four digit year, so no separate century (%C) is needed
  mydate=`date +%Y%m%d_%H:%M:%S`
  echo $mydate >> $log
  echo $mydate
  sleep $sleeptime
done

Set the file readable and executable by all, run it, and you'll see it count time second by second. On a different terminal, perform a tail -fn0 /tmp/junklog.log, and you'll see that it's writing to that file also. Before trying to daemonize this shellscript, make sure it works right as a foreground program.

Build the daemontools Service

This section walks you through building a daemontools service for the shellscript you just made. Do all this section's work logged in as root. Later we'll discuss daemontools running a daemon as a normal user, but the setup work should be done as user root, regardless of the user daemontools uses to run the program.

The name we give the service is arbitrary. In this case, we'll call it "hello". As you remember from earlier, all service directory trees are built under the /scratch/service directory.
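As a minimal sketch of what building the service involves, assuming the /scratch/service location used throughout this document: the service directory needs an executable run script that execs your program, and a symlink from /service is what makes svscan notice it. The helper name make_service is hypothetical, not part of daemontools.

```shell
#!/bin/sh
# make_service BASE NAME PROGRAM   (hypothetical helper, not a daemontools command)
# Creates BASE/NAME/run, a run script that execs PROGRAM.
make_service() {
  base=$1
  name=$2
  program=$3
  mkdir -p "$base/$name" || return 1
  # Write run.new first and rename it, so a half-written run script is never executed.
  cat > "$base/$name/run.new" <<EOF
#!/bin/sh
echo Starting $name
exec $program
EOF
  chmod 755 "$base/$name/run.new" &&
  mv "$base/$name/run.new" "$base/$name/run"
}

# For this Hello World, as root:
#   make_service /scratch/service hello /home/slitt/daemhello/print_timestamps.sh
#   ln -s /scratch/service/hello /service/hello
```

Within about five seconds of the symlink appearing, svscan should start the service, and /tmp/junklog.log should start growing again.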

Manipulate and Monitor Your New Service

You manipulate your service with the svc command. You monitor your service with commands svstat and svok. Let's dispense with svok right now: It outputs nothing, and simply returns 0 if supervise is running for the service, or non-zero otherwise. Using it is beyond the scope of this document, and once you've finished this document, you can read djb's brief daemontools documentation to learn exactly how to do it.

The svc command is used to manipulate your daemon, by sending signals to it. This command is performed as user root, from within the /service directory. The following is an example:

svc -d hello

The preceding command downs (stops) the hello daemon. The following is a table of svc arguments, their meanings, and their signals:

Arg   Action               Signal
-u    Start (up)           -
-d    Stop (down)          TERM, then CONT
-t    Restart if running   TERM
The preceding are the commands I use all the time. Many, many more arguments to the command are explained at http://cr.yp.to/daemontools/svc.html.

With one terminal running a tail -f /tmp/junklog.log, use the three svc commands previously listed and note their effect on the output. You can turn your daemon on and off at will.


Daemontools Mental Model

Here's the graphical Mental Model for daemontools:

Diagram of Mental Model for daemontools

Please keep in mind that "Connector A: Service" is really this service's run script, or whatever binary executable, shellscript, or Python/Perl/Ruby/Lua script the run script assigned to its PID via its exec command.

Mental Model Narrative

Please keep referring to the Mental Model diagram as you read this narrative.

So here's how the system works. By whatever means is necessary, depending on the init system your computer is using, the system's boot runs /command/svscanboot. /command/svscanboot, in turn, runs both /command/readproctitle and svscan /service. /command/readproctitle is simply a debugging aid that gets run, and will be discussed in other chapters in this document.

svscan /service is the top-level mechanism of daemontools. Every five seconds (approximately), it scans the /service directory (assuming /service was the command line arg passed to svscan) looking for symlinked directories, and for each symlinked directory, it runs that directory's run command if it's not already running.

Note:

The preceding paragraph is an oversimplification, and its overlooked details are discussed later in this chapter.

The svscan program loops forever, doing two things each time:

  1. Scan /service finding all symlinked directories
  2. For each symlinked directory found, run the supervise program on the directory, if the directory doesn't already have a supervise process running. The most likely reason for it not having a supervise process is that it wasn't symlinked the last time.

The supervise program runs and stops the run shellscript, based on the contents of the service directory's supervise directory tree.

One supervise program per service directory and log

Daemontools gives each service its own supervise process. If the service has a daemontools-hosted log system (highly recommended), then it has an additional supervise process to control that log.

The exact mode of the supervise program's communication between the run script and the supervise directory is too complex for me to figure out without spending a lot of time reading djb's source code, and as far as I can see it's not documented anywhere. But based on the way it works and the higher level documentation that exists, I can tell you this: the supervise directory contains info telling whether the service should or should not be running, and if it should, then the supervise program runs the run script, even if the run script terminated. Also, the supervise program allows the root user to control the run script or the program it execs using the svc command, as follows:

Arg   Name   Effect
-u    Up     Start nonrunning daemon
-d    Down   Stop running daemon
-t    Term   Restart running service, do nothing to a stopped one
-x    Exit   supervise terminates as soon as the run pgm or its exec'ed successor does. Usually used in conjunction with -d.
-o    Once   Run a stopped service exactly once, don't restart if it terminates.

Note:

The preceding args to the svc program comprise a partial list of the ones I consider useful on a regular basis. To see the entire list, see http://cr.yp.to/daemontools/svc.html.

Getting back to the Mental Model, the supervise program runs or stops the service directory's run script, or whatever that script exec'ed in its place, as appropriate and/or dictated by a svc command issued by the root user.

As stated earlier in this document, the Mental Model diagram's "Connector A: Service" block is really this service's run script, or whatever binary executable, shellscript, or Python/Perl/Ruby/Lua script the run script assigned to its PID via its exec command.

That's it. Reread this chapter a few times, while referring back to the Mental Model diagram, and you should have a good enough idea of the inner workings of daemontools to troubleshoot it.


Elementary Troubleshooting

If you haven't already, get familiar with daemontools' Mental Model before proceeding. Once that's done, your first step in troubleshooting a daemontools problem is to find out what's running and what's not. Referring to the Mental Model, that narrows things down a lot. For instance, if svscan isn't running, you probably have something wrong with your basic daemontools installation; perhaps your system's not starting it on boot. If svscan is running but the supervise process for the service under investigation (perhaps called myserviced) isn't, then, depending on whether svc -u myserviced succeeds or fails to run its supervise process, it was either just a temporary problem to be watched for in the future, or a basic failure to launch that service.

If your service's supervise process runs, but the program it exec's isn't running, your exec probably failed. Try running the exec'ed program at a normal command prompt. If it doesn't run, fix it. If it does run, then there's something about the environment daemontools bestows on it that's causing problems: A problem that can be investigated with the xtermd service.

Here are some typical ps commands to see what's running and what's not:
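A minimal sketch of such a check, assuming a ps that accepts the ax options used elsewhere in this document. The function name show_daemontools_procs is hypothetical. It lists whatever svscanboot, svscan, supervise, and readproctitle processes exist; the bracketed first letter in each pattern keeps grep from matching its own command line.

```shell
#!/bin/sh
# show_daemontools_procs   (hypothetical helper)
# Lists running svscanboot/svscan, supervise, and readproctitle processes.
show_daemontools_procs() {
  ps ax | grep -E '[s]vscan|[s]upervise|[r]eadproctitle'
}

# Usage:
#   show_daemontools_procs
# To look at one service only, filter further:
#   ps ax | grep '[h]ello'
```

No output at all means not even svscan is running. A svscan line with no supervise line for your service points at the symlink in /service.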

If you find yourself with no running svscanboot, you can run it as root from a terminal like this:

csh -cf '/command/svscanboot &'

Note:

There are probably a million other ways to run this at a command prompt, but I chose it because it's listed on djb's page at http://cr.yp.to/daemontools/start.html. This same page tells you how to have daemontools start, upon boot, for many different init systems, although systemd isn't one of them.

It should also be noted that, if svscanboot isn't being started at boot, you can put the preceding command in /etc/rc.local on non-systemd machines.

So now you've seen what's running and what isn't. Excellent! Looking at the Mental Model, it's obvious that you've considerably narrowed the area where the root cause resides. The next thing to do is use svstat and svc commands to investigate further. Assuming your service is called myserviced, perform the following command a few times, as user root, from the /service directory:

svstat myserviced

The preceding command tells you up to five pieces of information:

  1. The service dir name (I guess in case you forgot what you typed)
  2. The current state (either up or down)
  3. The PID (skipped if the service is down)
  4. The number of seconds it's been in its current state
  5. Its normal state (skipped unless its current state is different from its normal state)

Check the results of svstat against your info gleaned from ps, and if necessary see what happens when you try to toggle its state with svc -u or svc -d.
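One way to run svstat "a few times" without retyping it is a small loop. This is only a sketch: the function name watch_service and the default of five samples are my own choices, not anything daemontools provides.

```shell
#!/bin/sh
# watch_service DIR [COUNT]   (hypothetical helper)
# Runs svstat on DIR COUNT times (default 5), one second apart,
# so you can compare the PID and the seconds-in-state between samples.
watch_service() {
  dir=$1
  count=${2:-5}
  i=0
  while [ "$i" -lt "$count" ]; do
    svstat "$dir"
    i=$((i + 1))
    if [ "$i" -lt "$count" ]; then
      sleep 1
    fi
  done
}

# Usage, as root:
#   cd /service && watch_service myserviced
```

A healthy service shows the same PID on every line with the seconds counting upward.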

One more thing. If successive runs of svstat show the service being up, but at a different PID every time, and when this happens the seconds up is usually very short, like a single digit, your service is either failing to exec the necessary command, or the necessary command isn't running or is crashing right away. Make sure both the run script and the program to be exec'ed are permissioned executable. If so, one easy way to narrow it further is to back up the program to be exec'ed, and then, replace that file, using its exact name, with the following permissioned executable shellscript that does nothing except write to a file in the /tmp directory every second, and then perform a svc -u.

#!/bin/sh
while /bin/true; do
  mydate=`date +%Y%m%d_%H:%M:%S`
  echo $mydate >> /tmp/junklog.log
  sleep 1
done

Warning:

Before running the preceding shellscript from an exec from a daemontools run script, erase any existing /tmp/junklog.log file, then run the preceding shellscript from a regular command prompt to make sure it works. Perform a tail -f on /tmp/junklog.log to make sure it's being written to. Before running it from daemontools, make sure to once again erase /tmp/junklog.log.

If the constant rerunning symptom goes away, then the problem was in your original exec'ed program or in the environment that program was passed. If it keeps happening, then it's something in the run script itself. Be sure the run script is permissioned executable, and that its shebang line (#!/bin/sh or whatever) matches the configuration of your system.

If the problem appears to be in the exec'ed program or the environment it's passed, first debug it at a normal command prompt, and once it runs correctly there, debug it at an xterm bestowed by the xtermd service, exporting any necessary environment variables, and when you find one of those necessary environment variables, also add it within the service's ./env directory. Once you match the service's environment variables in ./env with those you exported in the xtermd terminal, it should run the same way from the xtermd terminal and from daemontools itself. Or at least, that's the way things should go.

If you've gotten this far and the problem still isn't solved, review the Daemontools Mental Model, and familiarize yourself with the Landmines and Gotchas chapter, because at this point it's probably covered in that chapter. Problems with state are especially difficult to troubleshoot, and if you're not aware of problems with Python 3's stderr implementation, you can chase your tail for hours trying to find it.


Using Daemontools Logging

Our hello service writes its own log to /tmp/junklog.log, but sometimes you're daemonizing someone else's code and don't want to modify it to write a log. No sweat, daemontools can write a timestamped line to a log file every time a daemonized program writes to stdout or stderr. This chapter adds logging to the hello service. Do the following, logged in as user root:
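As a hedged sketch of the usual arrangement, not necessarily the exact steps this chapter had in mind: the service directory gets a log subdirectory with its own run script that execs multilog, which writes timestamped lines into a directory (here ./main, whose active file is named current). The helper name add_logging is hypothetical.

```shell
#!/bin/sh
# add_logging SERVICEDIR   (hypothetical helper)
# Creates SERVICEDIR/log/run, which hands the service's output to multilog.
add_logging() {
  svcdir=$1
  mkdir -p "$svcdir/log/main" || return 1
  # Write run.new first and rename it, as with any run script.
  cat > "$svcdir/log/run.new" <<'EOF'
#!/bin/sh
# t = prepend a TAI64N timestamp to each line; ./main = directory holding the log files
exec multilog t ./main
EOF
  chmod 755 "$svcdir/log/run.new" &&
  mv "$svcdir/log/run.new" "$svcdir/log/run"
}

# For the hello service, as root:
#   add_logging /scratch/service/hello
# To read the log with human-readable timestamps:
#   tail -f /scratch/service/hello/log/main/current | tai64nlocal
```

Two assumptions worth checking on your system: svscan connects a service to its log when it first starts the pair of supervise processes, so an already-running service probably has to be unlinked, shut down with svc -dx, and re-linked before the log takes effect; and only stdout travels down that pipe unless the service's own run script adds exec 2>&1 before its exec line.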


Setting the Service's User

Lots of daemons are dangerous or insecure if run as root, so daemontools includes a method to run the daemon as any user you want. Before discussing this, let's prove that so far, it's running as root.

First things first: At the beginning of this document, you ran the /home/slitt/daemhello/print_timestamps.sh as user slitt, and if you did so, /tmp/junklog.log is owned by slitt. So, with the daemon running as described in the Using daemontools Logging chapter, erase /tmp/junklog.log and take another look:

rm -f /tmp/junklog.log
sleep 2
ls -l /tmp/junklog.log

You should see that it's now owned by root. Now do the following, as user root:

cd /service
svc -d hello
cd /scratch/service/hello
cp -p run run.new

Warning!

Be sure the file you edit is run.new! If you were to edit the original, and the daemon restarted in the middle of the edit, the half-edited file would almost certainly produce errors: Possibly data-harming errors. The correct procedure is to edit run.new, and when you're all done editing it, rename it to the original run filename.

Now change the run.new file by adding setuidgid commands to run the command as the user named in the first argument. In other words, the first argument is the user you want the program to run as, and the rest of the arguments are the command. Be careful though, sometimes you need to enclose the command in quotes if the command contains punctuation like tildes or pipe symbols or single or double anglebrackets. The following is the new code:

#!/bin/sh
setuidgid slitt echo Starting hello
setuidgid slitt sh -c "echo Starting hello at `date` >> /tmp/junklog.log"
exec setuidgid slitt /home/slitt/daemhello/print_timestamps.sh

In the preceding, note the sh -c and the doublequotes around the command that writes the /tmp/junklog.log file. These are needed because of the redirection: setuidgid runs a program directly rather than through a shell, so the redirection has to be handed, as one quoted string, to a shell running as slitt. If the redirection were left outside the quotes, the run script's own shell would perform it as root, which means that if /tmp/junklog.log didn't exist at the time, it would be created by root, in which case the print_timestamps.sh program, operating as slitt, couldn't write the /tmp/junklog.log file, and there would be a readproctitle error saying the write failed.

Now finish it up:

mv run.new run
cd /service
svc -t hello
ls -l /tmp/junklog.log

If /tmp/junklog.log is still owned by root, remove it, wait two seconds, and the newly created file should be owned by slitt. If not, troubleshoot.

A couple facts to consider: This procedure did not write the log files as user slitt. That requires similar modifications to /scratch/service/hello/log/run, and to change the ownership of the main directory and the current file. The exact steps are left as an exercise for the reader.

Also, the setuidgid program runs the program as the user and the user's primary group, but it drops the user's auxiliary groups. If the program gets its mojo by membership in several groups, this won't work. There are workarounds, but they are beyond the scope of this chapter.
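To judge whether the dropped groups matter for a given user, you can list them. The following sketch uses the standard id command; the function name lost_groups is hypothetical.

```shell
#!/bin/sh
# lost_groups USER   (hypothetical helper)
# Prints every group USER belongs to other than the primary group, that is,
# the memberships a program started through setuidgid will not have.
lost_groups() {
  primary=$(id -gn "$1") || return 1
  for g in $(id -Gn "$1"); do
    [ "$g" = "$primary" ] || echo "$g"
  done
}

# Usage:
#   lost_groups slitt
```

If the output is empty, or names only groups the daemon never relies on, setuidgid's behavior costs you nothing.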

That's it. You ran the daemon as a different user.


Setting Environment Variables

I can think of several ways in which a daemon can acquire information:

Most of those are addressed from the command itself, or from hard-code or defaults in the program itself. The exceptions are signals and environment variables. Signals are trivial if you know the PID. That leaves environment variables.

I suppose your run script could assign and export environment variables before exec'ing the actual daemon, but djb thought of a better way. He lets you create a directory containing key->value pairs consisting of environment var names and values. For each environment var in this directory, use the filename as the environment var name, and its contents as its value. So, for instance, you could make directory env in your hello directory, and within it put file MYPREFIX, containing the string mypfx. You're half way there.
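A minimal sketch of that first half, with a hypothetical helper name (set_env_var). It does nothing more than create the directory and write the value into a file named after the variable.

```shell
#!/bin/sh
# set_env_var ENVDIR NAME VALUE   (hypothetical helper)
# Stores VALUE in the file ENVDIR/NAME, the layout that envdir reads.
set_env_var() {
  mkdir -p "$1" || return 1
  printf '%s\n' "$3" > "$1/$2"
}

# For the hello service, as root:
#   set_env_var /scratch/service/hello/env MYPREFIX mypfx
# You can check the result with:
#   cat /scratch/service/hello/env/MYPREFIX
```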

Before walking you through the other half, to showcase this environment variable, it needs to be in the shellscript that you've daemonized, /home/slitt/daemhello/print_timestamps.sh. The following shows the modified shellscript; the additions are the two references to $MYPREFIX:

#!/bin/sh
log=/tmp/junklog.log
sleeptime=1
while /bin/true; do
  mydate=`date +%Y%m%d_%H:%M:%S`
  echo $MYPREFIX $mydate >> $log
  echo $MYPREFIX $mydate
  sleep $sleeptime
done

The only change is that every line printed to /tmp/junklog.log and stdout is preceded by the contents of $MYPREFIX. Once the service is running with $MYPREFIX set, this should show up in both the daemontools log and in /tmp/junklog.log.

Now the shellscript to be daemonized actually showcases $MYPREFIX. The final job is to make a small change to the run script, adding envdir to the line that invokes the shellscript. Remember, cp -p the script to run.new, do the edits there, and then when complete, mv run.new run. In the following listing, the addition is envdir ./env on the exec line:

#!/bin/sh
setuidgid slitt echo Starting hello
setuidgid slitt sh -c "echo Starting hello at `date` >> /tmp/junklog.log"
exec setuidgid slitt envdir ./env /home/slitt/daemhello/print_timestamps.sh

Warning

Sometimes the mixture of changing the user and using environment variables fails. Typically, when that happens, the output of the following command gripes about the env directory or the supervise directory under it:

ps ax | grep readproctitle

In that case, use the envuidgid command, like the following:

exec setuidgid slitt envuidgid slitt envdir ./env /home/slitt/daemhello/print_timestamps.sh

After performing the preceding steps, you need to shut down the service, unlink it from /service, delete the env/supervise tree, and re-link the service.

You've done it, but you'll probably see no change from your tail -f. To see the change, as root, perform the following steps:

cd /service
svc -t hello

If the steps detailed in this chapter don't prepend the word "mypfx" to every line in /tmp/junklog.log, troubleshoot.


Landmines and Gotchas

By its very nature, a daemon controller, and that includes daemontools, is a black box. This creates some landmines you need to avoid, or else repair the damage. These landmines and gotchas can cost you hours if you're not aware of them.

Python 3 stdout/stderr Buffering

In Python 2, stderr was unbuffered, as it should be, so that whether you're looking at a screen or at an error file, you can instantly see errors. In Python 3, stderr is line buffered to the terminal, but block buffered if redirected to a file (which apparently is how daemontools logs work). Not only that, but Python's -u flag, which in Python 2 set both stdout and stderr to unbuffered, in Python 3 appears to do nothing. I have found absolutely no way to make stderr either unbuffered or line buffered when redirected to a file. What this means is that if you're depending on stderr out of Python, you're out of luck. Messages from stderr might come in a huge bunch ten minutes after the occurrences that spawned them, or they might end up left in the undrained buffer and never transferred to the daemontools logs. Later 3.x versions have fixed this problem: the sys module documentation says that, as of Python 3.9, non-interactive stderr is line buffered instead of fully buffered. But you need to write for all 3.x versions, so this is locking the barn after the horses escape.

What I've done is eliminated all writes to stderr in Python, and to write strings, I've created a function something like this:

import sys

def errprint(msg):
    # Write to stdout and flush right away so the line reaches the log promptly.
    print(msg)
    sys.stdout.flush()

Unfortunately, sys.stderr.flush() doesn't perform the corresponding function for writes to stderr.

I've read a lot of contradictory and anecdotal information that there's a new way to do I/O in Python3, but until there's a consensus, I'm not messing with it.

$XAUTHORITY and $DISPLAY

This might seem obvious, but it's easy to forget: If you intend daemontools to control any program that might spawn a GUI program or dialog box, you need to have $XAUTHORITY and $DISPLAY set correctly. If your directly or indirectly daemontools-spawned GUI programs fail to run, make sure these environment variables are set. Instructions for doing so are provided in the Pro Troubleshooting Move: The xtermd Service chapter later in this document.

Unknown Environment

When daemontools runs a daemon, it passes that daemon sparse, if any, environment variables. $PATH is a very necessary environment variable, as is $HOME. We already discussed environment variables $XAUTHORITY and $DISPLAY. Many daemon failures can be tracked to wrong or missing environment variables. To minimize your chance of problems, at a minimum, set $PATH, $HOME, and $USER before spawning.

In the preceding, it's obvious that $USER and $HOME must match the user set, within the run script, by setuidgid.

And if there's any chance the daemon might spawn anything GUI, including little graphical warning/error boxes, set $XAUTHORITY and $DISPLAY also.
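One way to set a batch of these at once is to populate the service's env directory from NAME=VALUE pairs. This is a sketch with a hypothetical helper name (write_env_dir). The values in the usage comment are examples only; your PATH, home directory, display, and Xauthority location will differ.

```shell
#!/bin/sh
# write_env_dir ENVDIR NAME=VALUE...   (hypothetical helper)
# Writes one file per variable into ENVDIR, named NAME and containing VALUE.
write_env_dir() {
  envdir_path=$1
  shift
  mkdir -p "$envdir_path" || return 1
  for pair in "$@"; do
    name=${pair%%=*}    # everything before the first =
    value=${pair#*=}    # everything after the first =
    printf '%s\n' "$value" > "$envdir_path/$name"
  done
}

# Example values only, as root:
#   write_env_dir /scratch/service/hello/env \
#     PATH=/command:/usr/local/bin:/usr/bin:/bin \
#     HOME=/home/slitt USER=slitt \
#     DISPLAY=:0 XAUTHORITY=/home/slitt/.Xauthority
```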

envdir and envuidgid

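This will nail you if you forget it. If your run script's exec line changes the user to a non-root user before envdir runs, that user might not be able to read the contents of the env directory. No problem: put envdir ahead of setuidgid on the exec line, so the env directory is read before privileges are dropped, and accompany your envdir command with envuidgid, which sets $UID and $GID for the named user. The following is an example: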

exec envuidgid slitt envdir ./env setuidgid slitt /d/at/python/littcron/littcron.py /d/at/python/littcron/crontab

Yes, as a matter of fact, I did make a crude cron replacement out of Python and daemontools.

log Is the Only Hard Coded Directory

log is one of the very few "reserved words" in daemontools. Daemontools is hard coded to treat a subdirectory called log under the daemon's directory as used for logs. The same is not true for env or any other directory: You need to specify the environment directory with the envdir command, and it can be put anywhere. ./env is just a custom.

You Need to Manually Start Logging

I figured that because log is hard coded into daemontools, logging for a service would be started as soon as you start the service. But I was wrong. The following is how you start both the service and its log:

svc -t myserviced myserviced/log

If you forget the part about the log, the log won't start. This is true of all svc commands.

Eliminating Your Service the Wrong Way

If you try to eliminate (not just stop, but decommission) a service the wrong way, you risk getting that service into a state such that it won't come back up again without a lot of state deletion. So this section describes the right way to eliminate a service.

There's a very specific way to de-install a service, to turn it off and kill any daemons or subprocesses it was running. In English, you first unlink it from /service, and then you shut down the running programs. Here it is:

root@mydesq2:/service# cd hello
root@mydesq2:/service/hello# rm -f /service/hello
root@mydesq2:/service/hello# svc -dx . log
root@mydesq2:/scratch/service# cd /service
root@mydesq2:/service# ps ax | grep hello | grep -v readproctitle
18478 pts/4    S+     0:00 grep hello
26334 ?        Ss     0:00 gvim hello/run
root@mydesq2:/service#

In the preceding, notice that you first delete the /service/hello symlink, so that when you turn everything off, daemontools doesn't restart them. Then you turn off the daemons, safe in the knowledge that daemontools will leave them off.

The stuck-bad supervise directory

This gotcha has caused me more frustration and lost time than any other. What needs to be understood is that daemontools stores a lot of state that survives svc commands and even /service directory unlinkings. I think it might even survive reboots.

So if you do something wrong with your daemontools, perhaps manually performing a svscan on a directory, or manually killing a process with a kill command, or lots of other seemingly harmless stuff, your service gets itself in a state where you can't do anything with it, no matter what all you try. And because daemon-runners are all rather opaque, you go on, getting more and more frustrated, and possibly never suspecting that the root cause is state that you can and should erase.

The state is contained in the various supervise directories: the one in the daemon directory, and the ones in its log and env directories. A dead giveaway is if the output of the following command contains gripes about "couldn't access supervise":

ps ax | grep readproctitle

If you see errors like that, first be sure your actual program runs correctly in the foreground of a regular terminal, as the intended user. Then, run the same command in your xtermd spawned xterm, and transfer any needed config to the service that's not working. When you're finally pretty sure that dropping all state would probably cause the service to run, do the following steps:

  1. cd /service/mydaemond
  2. rm -f /service/mydaemond
  3. svc -dx . log
  4. cd /scratch/service/mydaemond
  5. rm -rf log/supervise
  6. rm -rf env/supervise
  7. rm -rf ./supervise
  8. cd /service
  9. ln -s /scratch/service/mydaemond /service/mydaemond
  10. sleep 5; svstat mydaemond mydaemond/log

The preceding procedure should clear all state, and start the service cleanly. If the service is OK and all that was wrong was state, your daemon should start running properly, unless, of course, file mydaemond/down exists, in which case you'd need to issue a svc -u or svc -o command to get it going.
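If you find yourself clearing state often, the ten steps above can be sketched as one function. Treat this as a starting point, not gospel: reset_service, SERVICE_ROOT, REAL_ROOT and SVC_CMD are all names I made up, and the defaults simply follow the /service and /scratch/service layout used on this page.

```shell
#!/bin/sh
# reset_service NAME
# Hypothetical helper, not part of daemontools. The three variables
# below are assumptions matching this document's directory layout.
SERVICE_ROOT=${SERVICE_ROOT:-/service}       # directory that svscan watches
REAL_ROOT=${REAL_ROOT:-/scratch/service}     # where the real service directories live
SVC_CMD=${SVC_CMD:-svc}                      # set to "echo svc" for a dry run

reset_service() {
    name=$1
    real=$REAL_ROOT/$name
    [ -d "$real" ] || { echo "no such directory: $real" >&2; return 1; }

    # Step 2: unlink first, so nothing gets restarted behind your back.
    rm -f "$SERVICE_ROOT/$name"

    # Step 3: bring down the service and its log, and make supervise exit.
    $SVC_CMD -dx "$real" "$real/log"

    # Steps 5-7: throw away the stored state.
    rm -rf "$real/log/supervise" "$real/env/supervise" "$real/supervise"

    # Step 9: relink, so the service is picked up again.
    ln -s "$real" "$SERVICE_ROOT/$name"
}
```

After reset_service mydaemond, give it five seconds and check with svstat mydaemond mydaemond/log as in step 10. When you do the steps by hand there's a natural pause between the svc command and the deletions. A script has none, so you might want a short sleep after the svc line.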

Messing Up Your Original Program

As you spend more and more time and energy making diagnostic changes, it gets more probable that one of those diagnostic changes renders your original program dysfunctional. If you're not regularly running your original program from a regular terminal, as opposed to from daemontools, this kind of problem can look like any other, causing hours of troubleshooting. Never assume, without test, your original program, the one that gets exec'ed by the run command, still works. Test that assumption every few minutes. In the words of Ronald Reagan, "Trust, but verify."

Steve Litt is the originator of the Universal Troubleshooting Process. Steve can be reached at his email address.

Pro Troubleshooting Move: The xtermd Service

When a program runs successfully at the command prompt, but fails when run from daemontools, you need a way to peer inside and see what's going on. One excellent way to do this is the xtermd service described in this chapter. It runs an instance of xterm from daemontools, with the environment bestowed upon it by daemontools, so that you can test a command in a daemontools provided environment. If the command doesn't run there, you can find out why. Often the cause is missing environment variables, and when you find them, you can put them in the /scratch/service/xtermd/env directory. Once you get the command working there, you can copy the environment variables to the malfunctioning service's env directory, and it will probably fix the problem. Here's how you build it, as user root:

  1. cd /scratch/service
  2. mkdir xtermd
  3. cd xtermd
  4. mkdir log
  5. mkdir env
  6. Create the following run command:
    #!/bin/sh
    echo Starting xterm service
    exec envuidgid slitt envdir ./env setuidgid slitt  /usr/bin/xterm
    
  7. Create the following log/run
    #!/bin/sh
    exec 2>&1
    exec multilog t ./main
    
  8. cd env
  9. Create the following files in env
    • DISPLAY
    • XAUTHORITY
  10. In the preceding step, simply perform an env command in a normal terminal for the non-root user, and copy the values of the environment variables to the contents of each of the environment files in env.
  11. cd /service
  12. ln -s /scratch/service/xtermd /service/xtermd

If everything went right, within five seconds, an xterm window should appear, with your non-root user, available for you to use.
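Copying the values by hand in step 10 is error prone, so here's a minimal sketch that writes the env files for you. The function name save_env_vars is hypothetical. It relies on the envdir convention that each file is named after a variable and its first line holds the value.

```shell
#!/bin/sh
# save_env_vars ENVDIR VAR...
# Hypothetical helper, not part of daemontools. Run it as the non-root
# user in a normal terminal. It writes one file per variable into
# ENVDIR, each holding that variable's current value.
save_env_vars() {
    envdir=$1
    shift
    mkdir -p "$envdir" || return 1
    for var in "$@"; do
        # Look up the current value of the variable named in $var.
        eval "value=\${$var-}"
        printf '%s\n' "$value" > "$envdir/$var"
    done
}
```

As the non-root user, something like save_env_vars /tmp/xtermd-env DISPLAY XAUTHORITY collects the values, after which root can copy the files into /scratch/service/xtermd/env. The /tmp/xtermd-env path is just an example.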

Once you have the terminal, you can try running various commands on it, and see what else needs to be done for the commands to run correctly. It might take an added or changed environment variable, or a slight change to the command invoking the program. Naturally, before running a command from your xtermd service, you should have verified that the program runs properly on a regular terminal in the foreground.

If a program fails from within your xtermd spawned terminal, but runs from a regular terminal for that user, experiment with exporting various environment variables, to see which of those might be necessary to run correctly. The following is a list of likely culprits:

If the problem is fixed by changing $PATH, you might wish to use the changed path, or to use the full path of executables in the command. The former makes it easier, makes commands shorter, and tends to be more portable across machines having similar paths. The latter makes the environment simpler and exposes fewer failure modes.

Anyway, once you find out what environment variables it takes, or what command changes it takes, to run the command when spawned by xtermd, you can implement those environment variables in the real service's ./env directory, or implement the command changes in the real service's run command.
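One quick way to narrow down the candidates is to dump the environment in both terminals and list what the xtermd one lacks. The following is a rough sketch: missing_vars is a made-up name, the /tmp file names are only examples, and it assumes no variable has a value spanning several lines.

```shell
#!/bin/sh
# In the regular terminal:        env > /tmp/env.normal
# In the xtermd spawned terminal: env > /tmp/env.xtermd
#
# missing_vars NORMAL_FILE XTERMD_FILE
# Hypothetical helper. Prints the names of variables that appear in
# NORMAL_FILE but not in XTERMD_FILE.
missing_vars() {
    # First pass (XTERMD_FILE): remember every variable name.
    # Second pass (NORMAL_FILE): print names that were never seen.
    awk -F= 'NR == FNR { seen[$1] = 1; next }
             !($1 in seen) { print $1 }' "$2" "$1"
}
```

Then missing_vars /tmp/env.normal /tmp/env.xtermd gives you a short list of variables to try exporting one at a time in the xtermd terminal.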

Note:

The $DISPLAY and $XAUTHORITY environment variables are necessary for a daemon to launch an X program, such as xterm. The rest of the environment variables may or may not be necessary to run a specific command. Certainly it's a good idea to have $HOME set correctly for the user.

Limiting Your xtermd Rerunning

By the very nature of a daemon launcher, within seconds of your closing your daemontools-launched xterm window, another one appears. You can avoid this as follows:

  1. cd /service
  2. touch xtermd/down
    • The preceding command changes this service from "normally up" to "normally down".
  3. svc -d xtermd

After the preceding commands, your xtermd service is down unless you specifically run it, and it does not start on boot, because of the existence of the down file. To get a daemontools-provided xterm window without it constantly restarting, use the svc command's "run once" option:

svc -o xtermd

Your daemontools-provided xterm window provides you a great tool to peer inside what's really happening to your program, inside the daemontools enclosure. As mentioned before, make sure your program works properly from a normal terminal before doing this, or you'll be chasing your tail all day.

Transitioning a Daemon From Your Current Init

Most of the verbiage in a typical init shellscript is already handled by daemontools' svc command. The only thing left to do is handle any dependencies. Dependencies can be handled two ways:

  1. Loop until what the app depends on is up and running
  2. Start what the app depends upon from the app's service itself

#1 is more traditional, and in a lot of ways easier. It can get hairy if there are several dependencies, and can stall if one of the dependencies has not been, and won't be, run.
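Here's a minimal sketch of approach #1 with a timeout, so it can't stall forever. The function name wait_for_service and its optional third argument are my own assumptions. By default it uses svok, which only tells you that supervise is running for that directory, so substitute a stronger check if you need one.

```shell
#!/bin/sh
# wait_for_service DIR TIMEOUT [CHECK]
# Hypothetical helper, not part of daemontools. Polls once a second
# until CHECK succeeds on DIR, or gives up after TIMEOUT seconds.
# CHECK defaults to svok.
wait_for_service() {
    dir=$1
    timeout=$2
    check=${3:-svok}
    waited=0
    until $check "$dir"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1    # gave up: the dependency never came up
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In a run script you might write wait_for_service /service/depsvcd 30 || exit 1 just before the exec line. If the wait times out, the run script exits and supervise simply runs it again, so the daemon keeps retrying without hanging.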

#2 still requires detection of running dependencies so they're not run twice, unless you hack daemontools to keep track of what's running. Or, if both the service and its dependencies are controlled by daemontools, the daemontools provided svok and svstat commands make detection very easy. For instance, consider the following shellscript command (call it is_svc_up.sh):

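#!/bin/sh
# Exit codes: 0 = up, 1 = down, 2 = unknown state, 111 = not supervised.
cd /service || exit 111

# svok succeeds only if supervise is running for this service.
if ! svok "$1" > /dev/null; then
  exit 111
fi

STRINGA=$(/usr/local/bin/svstat "$1")

if echo "$STRINGA" | grep -qs "$1:[[:space:]]*up[[:space:]]"; then
  exit 0
fi

if echo "$STRINGA" | grep -qs "$1:[[:space:]]*down[[:space:]]"; then
  exit 1
fi

exit 2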

You could use the preceding is_svc_up, within a run script that actually runs its dependencies, like this:

if ! ./is_svc_up.sh depsvcd; then
  svc -u /service/depsvcd
fi

Of course, ultimately you'd still need to test for all dependencies. You could make a shellscript called run_dependency_if_not_running.sh that would return success or failure, but then you'd block on that particular run. Perhaps better to run them all and then check. There are a million ways you could do it.

And of course, the less you make one daemon dependent on another, the more robust your system will be.

So basically, to move a daemon from your current init system to daemontools, you:

  1. Read the existing script or unit file or whatever defines your daemon's startup
  2. Translate any dependencies to simple shellscript checks
  3. Make your daemontools service directory
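Step 3 above can be sketched as a small function that lays down the service directory with a run script and a log/run script. Everything here is an assumption for illustration: the name make_service_dir, the choice to send stderr to the log from the run script, and the multilog line, which mirrors the log/run used for xtermd.

```shell
#!/bin/sh
# make_service_dir REAL_DIR COMMAND
# Hypothetical helper, not part of daemontools. Creates REAL_DIR with an
# executable run script that execs COMMAND, and a log/run that execs multilog.
make_service_dir() {
    real=$1
    command_line=$2
    mkdir -p "$real/log" || return 1

    # The service's run script: stderr joins stdout so both reach the log.
    cat > "$real/run" <<EOF
#!/bin/sh
exec 2>&1
exec $command_line
EOF

    # The log's run script: timestamped lines into ./main.
    cat > "$real/log/run" <<'EOF'
#!/bin/sh
exec multilog t ./main
EOF

    chmod +x "$real/run" "$real/log/run"
}
```

You'd use it something like make_service_dir /scratch/service/mydaemond "setuidgid slitt /usr/local/bin/mydaemon", where the daemon path is made up for the example, and then ln -s /scratch/service/mydaemond /service/mydaemond to put it under daemontools' control. Any dependency checks from step 2 would go in the run script before the final exec.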

Steve Litt is the author of Rapid Learning for the 21st Century. Steve can be reached at his email address.

Daemontools is Not Alone

Daemontools is only one of many daemon runners available, some of which are built expressly to be run as PID 1. Here's a very partial list:

Looking at the preceding list, several things pop out. First of all, djb inspired all but one of them. Read about their goals, their construction, and the way they work, and you'll see that the djb inspired ones are far more similar to each other than, for instance, sysvinit, upstart and systemd are to each other.

Their priorities seem to be work simply, work reliably, work securely, and then get the hell out of the way. Not one of them has anything to say about how authentication takes place: Other software does that. Not one of them has hooks for GUI desktop environments: Other software can do that. If they use libraries or tools at all, those libraries and tools are very simple and specific. Several of these init systems can pretty much replace each other, without bringing the whole operating system to its knees. None of them tempts any application programmer to write to their specifics, nor to require them specifically as a dependency. Compare these priorities to the seeming priorities of other init alternatives.

Listen, I'm not telling you to try to use any of these as your PID 1. The Red Hat marketed systemd juggernaut has already pretty much closed that door for you on most distros not requiring compilation for every install. All I'm saying is look how daemontools works, and maybe try something like s6, runit, perp or nosh on an experimental machine, as PID 1, just for fun.

Steve Litt oversees content at Troubleshooters.Com. Steve can be reached at his email address.

djb's Gift to Us

I don't know djb. I've never spoken to him. My knowledge is confined to using his programs: daemontools and djbdns. Look at his code. Oh yeah, he uses quite a bit of abstraction, and he uses more global variables, macros and callback functions than I do, he uses very short variable names, and he's obviously a much better programmer than I. But the code is clean, and once you understand his abstraction, it's readable. Most of his functions are less than 30 lines, a lot of them less than 15 lines, and functions and structs are named well enough to understand what they're doing. Daemontools' biggest source file, besides the Makefile, is multilog.c, weighing in at 13898 bytes. Not lines, bytes. wc tells me it's 617 lines of code. Makefile is 510 lines, according to wc.

But it's a lot more than his code -- that's just style, security and efficiency. It's about what he does with his code. While other programmers look around for yet another library to stop them from "reinventing the wheel", djb finds himself in a Unix environment, with a hierarchical filesystem, fifos, pipes, signals, the whole shebang, and uses what Unix gives him. That makes his stuff easy to understand, even without reading his code, makes it likely to compile most places, and makes his stuff just work, consistently.

Everyone who uses Unix, Linux or *BSD should look at daemontools. Understand what it's doing, and why it does things that way. Extrapolate: What other seemingly difficult tasks could be done that way? Is there some Unixism, just lying around waiting for you to pick up, that you could use to save yourself a ton of work, and maybe avoid linking in YAL (Yet Another Library) that could crash, leak memory, create version conflicts, or have a problem with other software?

Read what Laurent Bercot wrote about djb. That just about says it all, doesn't it?

Lately, in the Free Software world, I often hear the phrase "we don't have a choice." Look at djb's software, starting with daemontools. Examine the source code. Read the docs. Look at the way it operates, including its use of everything Unix provides.

You'll find yourself with a lot more choices.

Steve Litt is the author of several human performance books. Steve can be reached at his email address.

Wrapup

Most Linux distributions' mandatory switch from init to systemd has led to a substantial backlash, some of which is based on principle (lack of modularity, going over-scope, rejection of Unix Philosophy), and some based on results. At this point it's likely many systemd complaints based on results aren't caused by systemd. There are additional objections about the way the systemd rollout has been handled: from a Linux standpoint, and from a distribution standpoint.

This has led many to seek fallback plans in case systemd turns out to be a huge mistake, and one fallback representing a very minor change is to move most daemon control from systemd to a daemontools started by systemd. Beyond the systemd connections, many who have used daemontools believe it to be exemplary software that does one thing and does it well. This document walks you through the setting up of a daemon demonstrating the most common daemontools tactics:

The preceding is pretty much everything you need to know. Obviously, removing a daemon from the control of your current init system and placing it in control of daemontools is more complex, but this intro gives you the basics you need in order to do the job.

For those wanting to completely replace their systemd, there are several alternate init systems:

I can't promise that application developers won't gratuitously use parts of systemd, thereby making their software unavailable to someone not using systemd, but one does his best.

Steve Litt is the author of Troubleshooting: Just the Facts. Steve can be reached at his email address.