Troubleshooters.Com®, Linux Library, and DIY Linux Present:
How Linux Boots
Copyright © 2015 by Steve Litt
Linux booting, which covers everything from Kernel instantiation up through running a Desktop Environment, has gotten much more complex during the past 15 years. Linux is the DIY OS, and to Do It Yourself with Linux sometimes requires getting involved with the boot process. So you must understand the boot process in order to boot your computer to the state you desire -- a state your Linux distribution might never have foreseen.
This document attempts to be neither minutely thorough nor absolutely accurate. Instead, this document serves as a medium level overview of the boot process, sufficient so you can intervene in the boot to accomplish what you need.
This document doesn't cover the very beginning of the boot: Power On Self Test (POST), MBR, boot loader, or UEFI boot system. UEFI is much too new for me to cover accurately. Suffice it to say that the Linux part of the boot begins when the bootloader or the UEFI boot system or something else runs the Linux kernel. For the purpose of this document, I'll call whatever runs the kernel "the bootloader".
This document focuses on the boot sequence of most major Linux distributions running on full sized hardware. Embedded systems will probably vary considerably from this document. Even so, this document should give you certain boot foundational principles that will help you work with embedded systems and other systems not using this exact boot system.
The Linux part of the boot starts when the boot loader (grub, lilo, etc) or the UEFI boot system runs the Linux kernel, and passes it the following information:
The following diagram is a high-level simplification of the Linux boot process:
Please refer to the preceding diagram while reading the boot process narration in the rest of this section.
After the bootloader runs the kernel with the specified information, the kernel does lots of initialization, runs various kernel processes, and then, if whatever ran it specified an initramfs file, the kernel uses that initramfs file's in-memory image as a ram-disk root directory and runs that root directory's init program (usually /init for initramfs-created roots). If you have a choice as to what to put in initramfs, try to keep it as simple as possible, and have it run as few processes as possible. A ram disk root filesystem, changeable only through a reboot, decoding, editing, encoding, and a second reboot, is extremely time consuming to troubleshoot.
Initramfs and its predecessor, initrd, were seldom necessary in the old days, because early boot ram disks were seldom necessary. In those days you could count on having boot-necessary programs in /sbin or /bin, and you could count on those directories not being symlinks or mount points. Same with /etc, which was never a symlink or mount point. In the old days, once you knew both the root partition and the partition containing the kernel (if /boot was a mount point), all necessary boot commands and information were locatable and accessible.
These days, /sbin and /bin are often symlinks to /usr/bin. In such cases, if /usr is a mountpoint, then until /usr is correctly mounted, the boot-necessary commands in /sbin and /bin are unavailable. This includes the commands necessary to mount /usr. This is a buried shovel, a Catch 22, most easily resolved by booting to a RAM disk that has its own copy of the boot-necessary commands.
Even /etc is a mountpoint in certain edge cases. So the kernel needs quite a bit of help finding things. That's the main purpose of initramfs. The initramfs' init program might also run some processes that must be run before the on-disk init program.
After the kernel program executes the initramfs init program, the initramfs init program passes control to the on-disk init program, which becomes PID 1. Because init programs vary so much from each other, from this point forward, the rest of the boot can take a few forms. Typically, PID 1 spawns a few programs. One of those programs might be a daemon manager.
The preceding was an overview. The rest of this document fills in some details. While reading this doc, always feel free to refer back to the diagram of the process near the top of this section.
Note:
I know little about bootloaders, less about Grub, and still less about Grub2. So most of this section is guesswork not sufficient for troubleshooting or design, but sufficient to demonstrate that by the time it launches the kernel, Grub2 has all the information the kernel needs.
There are many, many bootloaders. This document uses Grub/Grub2 for examples only because they're the most common.
The bootloader must have enough information to:
#1 consists of:
It's instructive to look at some lines from my Debian Wheezy /boot/grub/grub.cfg:
insmod part_msdos
insmod ext2
set root='(hd2,msdos1)'
search --no-floppy --fs-uuid --set=root ee74d4ba-ae3a-4c2c-9752-1c82a4c63656
echo 'Loading Linux 3.2.0-4-amd64 ...'
linux /vmlinuz-3.2.0-4-amd64 root=UUID=2598ea36-258d-480f-b1a7-eae244962526 ro noquiet nosplash
echo 'Loading initial ramdisk ...'
initrd /initrd.img-3.2.0-4-amd64
Let's take a moment to examine specific lines of the preceding:
set root='(hd2,msdos1)'
I can only guess what this is. My guess is that the preceding identifies hd2, which is the third disk device (/dev/sdc in other words), as the device containing the boot MBR or GUID.
search --no-floppy --fs-uuid --set=root ee74d4ba-ae3a-4c2c-9752-1c82a4c63656
All I can tell you about the preceding is that "ee74d4ba-ae3a-4c2c-9752-1c82a4c63656" is the UUID of the partition that will be mounted to /boot.
linux /vmlinuz-3.2.0-4-amd64 root=UUID=2598ea36-258d-480f-b1a7-eae244962526 ro noquiet nosplash
The preceding is called the "kernel line" because it identifies the kernel. In the kernel command, obviously the kernel's filename is vmlinuz-3.2.0-4-amd64. The only vmlinuz-3.2.0-4-amd64 on my computer is in the /boot directory, which, from the previous "search" command, is UUID "ee74d4ba-ae3a-4c2c-9752-1c82a4c63656". So I surmise that it uses the information from the previous "search" command to find the kernel.
What makes this confusing is the "root" part of the kernel command identifies UUID "2598ea36-258d-480f-b1a7-eae244962526", which, it turns out, is the actual root filesystem of the soon to be booted OS.
If you find these three different uses of the keyword "root" confusing, you're not alone. If I were going to design a bootloader, I'd use more semantic keywords.
One more thing: This kernel line doesn't specify a name and location for the on-disk init program. The kernel searches down a predefined list of names/locations for the init program, and one of the first searched is /sbin/init. If you had wanted to specify a different init program, you could have added "init=/mydir/myinit" to the line. By the time this information is interpreted, the file hierarchy is mounted, so this location is taken literally by the OS.
initrd /initrd.img-3.2.0-4-amd64
The preceding identifies the initramfs file as /initrd.img-3.2.0-4-amd64. The only initrd.img-3.2.0-4-amd64 on my computer is in the /boot directory, a mountpoint for the "ee74d4ba-ae3a-4c2c-9752-1c82a4c63656" partition that the earlier "search" command showed would be mounted to /boot, so apparently the path /initrd.img-3.2.0-4-amd64 is relative to partition UUID "ee74d4ba-ae3a-4c2c-9752-1c82a4c63656".
Without commenting on the three different uses of the "root" keyword, and the presumed partition in the "initrd" line, it's pretty obvious that the preceding lines gave us the location and filename of the kernel itself, and of the initramfs (which here is called "initrd" for historical reasons). The kernel doesn't need a name/location for the on-disk init program, as long as that init program exists at one of a short list of possible locations. In other words, at this point, the bootloader has all the information necessary to run the kernel and then pass it all necessary information.
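To make the kernel line's contents concrete, here is a minimal Python sketch that splits such a line into key=value options and bare flags. The function name parse_kernel_cmdline is made up for this illustration, and using shlex for the splitting is an assumption that only approximates the kernel's own quoting rules. On a running system the same kind of string can be read from /proc/cmdline.

```python
import shlex

def parse_kernel_cmdline(cmdline):
    """Split a kernel command line into key=value options and bare flags.

    Hypothetical helper for illustration; the kernel's real parser differs.
    """
    options = {}
    flags = []
    for token in shlex.split(cmdline):
        # Split on the first "=" only, so root=UUID=... keeps "UUID=..." whole.
        key, sep, value = token.partition("=")
        if sep:
            options[key] = value
        else:
            flags.append(token)
    return options, flags

# The parameters from the kernel line shown earlier in this section.
line = "root=UUID=2598ea36-258d-480f-b1a7-eae244962526 ro noquiet nosplash"
options, flags = parse_kernel_cmdline(line)
print(options["root"])                    # UUID=2598ea36-258d-480f-b1a7-eae244962526
print(options.get("init", "/sbin/init"))  # /sbin/init, because no init= was given
print(flags)                              # ['ro', 'noquiet', 'nosplash']
```

Adding init=/mydir/myinit to the line would make options["init"] return that path instead of the fallback used here.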
The Linux kernel is pretty much an interface to all things hardware and all things low level. It runs any time your operating system is running, and if the kernel were to stop running, your computer would cease to function until rebooted. Beyond this, the kernel plays a special part in the boot process, because it's what your bootloader runs.
When the bootloader first runs the kernel, no drives have yet been mounted. This means that the computer can't find the initramfs or the computer's init program, because it doesn't know how to find /sbin/init. This is because the kernel doesn't know whether /sbin is just another directory off the root, or whether it is a mountpoint, or, as is the new custom, it's a symlink to /usr/bin. The same is true of the initramfs file. Not being sure whether /etc is a normal directory, a mountpoint, or a symlink, it might not even be able to read /etc/fstab. The bootloader must pass enough information to the kernel so it can get everything organized.
The kernel's capable of searching some sane defaults, but at a very minimum, your kernel needs to know:
One of the things the kernel does is mount the initramfs ram disk as the root directory (/). Then the kernel goes down a list of probable locations for the initramfs init program. One of those locations is /init, which is traditionally where it's located. So the first initramfs init program found is run.
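The idea of walking a list of probable locations and running the first init found can be sketched in a few lines of Python. This is only an illustration, not the kernel's code: the function name find_init is hypothetical, and the candidate list is my understanding of the kernel's fallback order for the on-disk init, so treat it as an assumption. For an initramfs root you would pass ["/init"] instead.

```python
import os

# Assumed fallback order for the on-disk init when no init= is given.
DEFAULT_INIT_CANDIDATES = ["/sbin/init", "/etc/init", "/bin/init", "/bin/sh"]

def find_init(root, candidates=DEFAULT_INIT_CANDIDATES):
    """Return the first candidate that is an executable regular file under root.

    Hypothetical helper: checks permission bits only, it runs nothing.
    """
    for path in candidates:
        full = os.path.join(root, path.lstrip("/"))
        if os.path.isfile(full) and os.stat(full).st_mode & 0o111:
            return path
    return None

# Which init would be picked on the currently running system?
print(find_init("/"))
# Which init would be picked inside an unpacked initramfs tree?
# (./initramfs-root is a hypothetical directory name.)
print(find_init("./initramfs-root", candidates=["/init"]))
```

Pointing root at an unpacked initramfs directory or a mounted root partition lets you check, without rebooting, whether an init program is where the boot expects it.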
I'm not sure what the kernel does if there's no initramfs: I haven't seen such a case in years. The two initramfs inits I've examined in detail, that of Debian Wheezy and Manjaro OpenRC edition, pass control directly to the on-disk init via the switch_root command.
The kernel keeps running to serve as an interface between various programs, hardware, and very low level capabilities. But once the kernel has run the initramfs init program and/or the on-disk init program, the kernel's role in the boot process is finished.
Except for one other thing: Modern Linux kernels start bootup processes in parallel, and it's possible or even likely that the kernel keeps running bootup processes after it's passed control to one of the init programs. It's this fact that is used as the justification for the new breed of "event driven" or "socket activation" init systems. Nevertheless, for the most part, once it hands off to an init, the kernel's role in boot is greatly diminished: The show has moved on.
This document is an approximation, meant for conceptual use. It's designed to be used as a sort of block diagram or overview simplification, so there are probably errors on low level details. If you go deep enough, no reference other than the source code will suffice.
But at all the higher levels, I have a feeling you'll find this document very helpful.
To reiterate what's been said before, an initramfs file is a gzipped, newc-format cpio archive of a ram disk destined to be your machine's root directory, before any hard disk partitions are mounted. It enables you to boot to a ramdisk-hosted OS, from which you can load any drivers necessary to mount your root partition, mount any other necessary partitions, on your way to a hard-disk hosted OS.
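As a rough check of the "gzipped newc cpio" description, the following Python sketch looks for the gzip magic bytes and then for the ASCII magic "070701" that begins a newc cpio header. The function name looks_like_gzipped_newc is hypothetical. The sketch assumes a plain gzip-compressed image; some distributions compress with other tools or prepend an uncompressed archive, and this check would report False for those.

```python
import gzip
import io
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream
NEWC_MAGIC = b"070701"    # ASCII magic that starts a newc cpio header

def looks_like_gzipped_newc(data):
    """Return True if data is gzip-compressed and decompresses to a newc cpio."""
    if not data.startswith(GZIP_MAGIC):
        return False
    try:
        # Decompress only the first six bytes instead of the whole image.
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as gz:
            head = gz.read(len(NEWC_MAGIC))
    except (OSError, EOFError, zlib.error):
        return False
    return head == NEWC_MAGIC

# A fake payload standing in for a real image, so the example runs anywhere.
fake_image = gzip.compress(NEWC_MAGIC + b"0" * 104)
print(looks_like_gzipped_newc(fake_image))  # True
```

To try it on a real file, read the initramfs in binary mode and pass the bytes in, for example the /boot/initrd.img-3.2.0-4-amd64 file discussed earlier.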
There was a time when most systems didn't need an initramfs (or its predecessor, initrd). Back then, there were no encrypted root drives, no LVM disk systems, and few people used RAID. Any necessary drivers to mount drives were compiled into the kernel. And last but not least, you could always count on /sbin, /bin, and /etc to be real directories off the root: Not mountpoints or symlinks. Back then, the minute you'd mounted your root partition, you had access to all the bootup programs you needed in /sbin, /bin, and you had guaranteed access to /etc/fstab in order to mount everything else.
We live in better times. These days, volume encryption is a must in many situations. LVM means you can grow any "partition" by dropping in more drives. With customers demanding formerly unattainable uptimes, RAID is a necessity to keep running when individual drives go bad. Life has improved.
But not all changes are improvements. More and more distros feature /bin and /sbin as symlinks to /usr/bin, and of course if the /usr directory is a mountpoint, that mount has not taken place at the beginning of the boot, so your commands aren't available. Personally, I'm a big fan of being able to boot at least to a functional virtual terminal after mounting only the root partition, with boot-basic commands including ls, sh, mount, umount available. If these programs are in /usr/bin, and /usr is a mountpoint, then the only way to have these programs available before mounting /usr is to have them available in the pre-mount ramdisk.
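To find out whether a given machine is in the situation just described, you can inspect the directories directly. The following Python sketch reports, for each directory of interest, whether it is a symlink and whether it is a mountpoint. The function name early_boot_report and the choice of directories are assumptions made for this illustration.

```python
import os

def early_boot_report(root="/"):
    """Report symlink and mountpoint status of the boot-critical directories.

    Hypothetical helper; pass a different root to inspect a mounted partition.
    """
    report = {}
    for name in ("bin", "sbin", "usr", "etc"):
        path = os.path.join(root, name)
        report["/" + name] = {
            # Target of the symlink, or None if the path is not a symlink.
            "symlink_to": os.readlink(path) if os.path.islink(path) else None,
            # True only if a separate filesystem is mounted at this path.
            "mountpoint": os.path.ismount(path),
        }
    return report

for path, info in early_boot_report().items():
    print(path, info)
```

If /bin or /sbin shows a symlink into /usr and /usr shows as a mountpoint, the commands needed early in the boot live on a filesystem that isn't mounted yet, which is the case an initramfs has to cover.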
Note:
To me, the symlinking of /bin and /sbin to /usr/bin is a change with few benefits and the significant cost of requiring initramfs ramdisks in every use case. From a diagnostic and DIY standpoint, these symlinks are a bad idea. We all like modern, but in my opinion, the guys who created Unix knew what they were doing, and we should at least pause to think before undoing what they did. To me, change for change's sake is just fashion.
I've had friends whose /etc directory was actually a mountpoint to an NFS mount. How do they handle the chicken and egg in which you have no NFS configuration until you've started the NFS service to access /etc? Once again, start it up on the ram disk by running the ram disk's /init program.
From a DIY perspective, I believe initramfs ram disks, and the /init program they contain, should be as simple as possible. A few design constraints, such as keeping /etc, /bin, and /sbin as ordinary directories, would go a long way toward eliminating the need for an initramfs, or at least a complex one.
Possibly Useful Kludge
You could copy the programs from the initramfs' /bin and /sbin to their counterparts on the root partition. That way you'd have the necessary programs before mounting /usr, so if you're not using any special drivers to mount the root partition, you could theoretically boot without the initramfs.
Anyway, in typical situations, the final thing an initramfs' init program does is hand off control to the switch_root or run-init command.
As mentioned, typically the final thing an initramfs' init program does is hand off control to the switch_root or run-init command. This section discusses only switch_root, because that's what I'm familiar with. I'd imagine run-init could be made to accomplish the same things.
Look at the man page for switch_root, and you'll see that at a high level it does the following:
The on-disk init system will do at least the following:
The preceding list of commonalities notwithstanding, init programs come in all shapes and sizes, to the point where some bear almost no resemblance to others. Here's a partial list of init systems available on Linux:
If you were under the impression that sysvinit, Upstart and systemd were the only three init systems, you've been listening to too many systemd arguments. Anyway, runit, S6, and nosh are all init systems that also handle full daemon management (with respawning), using methods similar to daemontools. They can all be said to be "daemontools inspired."
Epoch is a simple, consecutive daemon manager capable of both management and single shot runs. It's like the old VW Bugs: Laughably unfeatured, but it runs every time, and a mere mortal can maintain it, without special tools or a lot of effort.
Suckless Init is a pure init with absolutely no daemon management. All it does is run the bin/rc.init that you write, and then hang around to catch signals for poweroff and reboot, each of which does its thing via a shellscript you write. Its source file is 89 lines of C, and the way you'd almost certainly use it is to have bin/rc.init do a little light setup and then call daemontools, daemontools-encore, or maybe OpenRC to handle processes.
I've neither tried nor examined busybox-init, but I've heard great things about it in terms of simplicity.
OpenRC doesn't do PID 1 activities, requiring something like sysvinit to perform the PID1 tasks. OpenRC is a concurrent launching single-shot daemon runner with outstanding documentation and init scripts rivalling the difficulty of sysvinit.
sysvinit is decades old, and uses /etc/inittab as its config file, together with all sorts of complex init scripts. I'm pretty sure it runs daemons and processes consecutively rather than concurrently, which could create long boots on systems running several tens of daemons and processes.
Upstart is a concurrent booting, event driven PID1 plus process manager, made almost a decade ago to address the deficits of sysvinit.
Systemd is a concurrent booting, event driven init system plus much more, tightly binding all sorts of init-irrelevant functionalities to itself, making DIY difficult. In fact, on most distributions it's impossible to run Gnome without systemd, and the Devuan project is making a program called vdev to replace the udev program that now works only with systemd.
Daemontools (and daemontools-encore) are not init systems and don't run as PID1, but they do process management very well, such that you could use even the simplest init and then pass the baton to daemontools (-encore), which would manage all your processes extremely well. Daemontools (-encore) is well documented, easy to understand, easy to DIY, and easy to bolt to a truly minimal PID1. The DIY Linux user should have an active knowledge of daemontools and daemontools-encore.
As mentioned earlier, init systems either perform process management, or offload process management to a different program. Before discussing process management and process managers, a few definitions are required in order that everyone is on the same page and the subject can be discussed intelligently...
Be very, very careful here. A lot of white hot flamefests, with the parties telling each other they don't know squat, stem exclusively from the fact that the two parties are using different definitions of these words. On 4/24/2015, Wikipedia defines "daemon" as "a computer program that runs as a background process, rather than being under the direct control of an interactive user."
There are numerous ways a computer program (process) can be running in the background. A user can run it, as a command typed on a terminal, and end it with the ampersand sign (&). This puts the command in the background. It can be run from the Cron system, which always puts programs it runs in the background. Some programs put themselves in the background using a technique called "double fork". And some process managers, namely, the daemontools and the other inits and process managers it inspired, take any program meant to run in the foreground, and run it in the background.
The preceding paragraph scratches the surface of the terminology problem. Some people call any process running in the background "a daemon", while others insist that only programs that put themselves in the background are daemons. Some folks say that when daemontools runs a normally-foreground program in the background, it has "daemonized" the normally-foreground program. Others contend that a program can be "daemonized" only by itself.
So the argument starts when two people don't realize they have different definitions of these words, and then transforms to an argument about the correct definition of the words. Ugh! Now you know why I try very hard to stay far away from the words "daemon" and "daemonize" in discussions. Everyone understands the phrase "process running in the background." More words, but less opportunity for misunderstanding and argument.
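For readers who haven't seen it, the "double fork" technique mentioned above can be sketched roughly as follows in Python. This is a minimal illustration under stated assumptions, not how any particular program does it: the function name spawn_double_forked and the pipe used to report the grandchild's PID are inventions for this example, and a real self-backgrounding program would normally have the original process exit rather than return, and would also redirect its standard input and output.

```python
import os

def spawn_double_forked(work):
    """Run work() in a grandchild detached from the caller via two forks.

    Hypothetical helper; returns the grandchild's PID to the caller.
    """
    read_fd, write_fd = os.pipe()
    first = os.fork()
    if first == 0:
        # First child: start a new session, detaching from the terminal.
        os.close(read_fd)
        os.setsid()
        second = os.fork()
        if second == 0:
            # Grandchild: not a session leader, so it can't reacquire a tty.
            os.close(write_fd)
            try:
                work()
            finally:
                os._exit(0)
        # First child: report the grandchild's PID, then exit at once so
        # the grandchild is orphaned and re-parented (typically to PID 1).
        os.write(write_fd, str(second).encode())
        os._exit(0)
    # Original process: reap the first child, then read the reported PID.
    os.close(write_fd)
    os.waitpid(first, 0)
    with os.fdopen(read_fd) as pipe:
        return int(pipe.read())
```

Notice that after the call returns, the caller is no longer the parent of the process doing the work. That loss of the parent-child relationship is what the arguments about "daemonizing" tend to revolve around.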
Daemontools has inspired process manager daemontools-encore, as well as init systems runit, s6, and nosh. All of these programs have this in common: They take an ordinary program and run it in the background. Let me give you an example...
I wrote a Cron replacement in Python, partially for fun and partially because I wanted a cron that was personalized for my way of doing things. My cron replacement is a very simple program that loops around, gets the time, consults its program launch schedule, and runs programs when it's time to do so. My cron replacement doesn't put itself in the background: If you run it on a terminal, you'll see it spinning around writing messages to stderr, and it works just fine that way. But of course I don't want to remember to rerun this program every time I reboot, and I don't want screen real estate consumed by the terminal that ran it. So I have daemontools run it, and daemontools puts it in the background. Daemontools intercepts all those writes to stderr and logs them with timestamps. Daemontools offers me very simple commands with which I can start and stop my cron replacement.
In fact, daemontools and those it inspired draw their control power from the fact that they run the process in the foreground, so the process is their child, so they have control over it. That's why there are simple tools with which I can start and stop the process. Daemontools and those it inspired have a special kludge with which they can often be used to manage programs that insist on putting themselves in the background, but this kludge decreases daemontools' control, so daemontools recommends that if the program has an option to run in the foreground, such as ntpd -d or sshd -D, you use that option in the daemontools run script.
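The parent-child relationship described above is easy to demonstrate. The following Python sketch is a toy supervisor, not daemontools' actual implementation: the function name supervise and the max_runs limit are assumptions made so the example terminates, whereas a real supervisor would keep restarting indefinitely and also handle logging and control commands.

```python
import subprocess
import sys

def supervise(argv, max_runs=3):
    """Run argv as our own child, restarting it each time it exits.

    Hypothetical toy supervisor; returns the list of observed exit codes.
    """
    exit_codes = []
    for _ in range(max_runs):
        # The program stays in the foreground of this process, as its child.
        child = subprocess.Popen(argv)
        # Being the parent, we learn exactly when and how the child exits.
        exit_codes.append(child.wait())
    return exit_codes

# A stand-in "service" that exits immediately with status 7.
codes = supervise([sys.executable, "-c", "import sys; sys.exit(7)"])
print(codes)  # [7, 7, 7]
```

Because the supervisor holds the child's process handle, it never needs a PID file to find the process it manages, and it could just as easily send the child a signal to stop it.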
To the best of my knowledge, process managers not inspired by daemontools insist on managing a process already in the background. For programs that automatically put themselves in the background, you just specify the program. For programs that can't put themselves in the background, you can often use an ampersand on the end of the program name. Generally speaking, these process managers preserve the process' PID on disk (usually on /var/run/programname.pid), so that they can go back and send signals to it.
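The PID-file approach can be sketched as follows. This Python illustration assumes the PID file holds nothing but the decimal PID; the function name signal_via_pidfile is hypothetical, and real process managers add checks that this sketch omits.

```python
import os
import signal

def signal_via_pidfile(pidfile, sig=signal.SIGTERM):
    """Read a PID from pidfile and send that process a signal.

    Hypothetical helper: it trusts the file's contents. If the file is
    stale, the signal may go to an unrelated process reusing that PID.
    """
    with open(pidfile) as f:
        pid = int(f.read().strip())
    os.kill(pid, sig)
    return pid
```

A call such as signal_via_pidfile("/var/run/programname.pid") would ask that program to terminate. Since the manager is not the process' parent, it only knows what the file tells it, which is the tradeoff compared with running the process as a child.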
As mentioned earlier, an init program must be PID 1 for the life of the Linux session, it must remain alive to catch uncaught signals and interrupts, it must be able to catch poweroff and reboot signals, and last but not least, it must run other processes. The running of other processes is the Process Management part of the init.
An init might do its own process management, or it might offload it onto a process manager, or in some cases a little of both. Here are a few use cases:
Almost every general purpose computer must perform certain actions in order to function as a general purpose computer. These include:
Every one of the preceding could be done in the on-disk init's process management, but usually 1 and 2, and sometimes 3 and/or 4 are done in the initramfs init. From 5 on are almost always done as part of the on-disk init's process management. Here I'm defining process management as running processes, whether run-once or respawn.
If 7 is not done, the computer boots to a virtual terminal, from which you can run your GUI desktop with the startx command.
Most init programs are very handy in the hands of a DIY person expert with them. One popular thing to do is to install a second, alternative init, in addition to the one that came with the distro. So you can choose which init to boot with just by changing the init= phrase in the kernel line of your bootloader config file. This is handy for troubleshooting problems suspected of being caused by the default init system, and it's also popular as a way to bust back into a machine that hangs or crashes during normal init.
Another plus of adding an alternative init system is as a replacement for systemd. As mentioned, systemd is much more than an init system, and tends to act like a machine that's welded together instead of bolted together. As a DIY person, you probably prefer bolted together, due to your preference for easy interchange of parts.
By modifying your init system's configuration, you can make your machine boot right to GUI, or boot to CLI requiring startx. You can determine which and how many virtual terminals to make available. Virtual terminals are the CLI terminals you get when you press Ctrl+Alt+F2, Ctrl+Alt+F3, etc. By following the logic of your init system, you can determine how to start, stop, activate and deactivate processes. By augmenting your current init with daemontools or daemontools-encore, you can move much of the process management where it's easy, while keeping your init doing mostly PID 1 type things. This makes DIY and troubleshooting much easier.
The following diagram summarizes a Linux boot from a very high level:
This is important to the DIY person because you can change the entire complexion of your machine by changing elements of the boot. For simplicity, you could get rid of the initramfs file after changing some file locations around. Or, you could add very hard to implement boot features by adding functionality to the initramfs. You could even, especially with kiosk type use cases, run all initialization from initramfs' init, and end up with the root directory as a ram disk.
By changing your init system or tuning the init system you have, you can make an incredibly simple machine with no udev, no NetworkManager or WICD, and a no frills window manager. Oppositely, you could throw in everything and the kitchen sink, creating a computer that uses every last bit of its hardware resources.
This stuff is important now. For two decades we've taken boot and init for granted, and recent events have caught us unable to troubleshoot problems in the New World of Linux. Whether you want a full featured machine without obnoxious bugs, or a trimmed down OS that respects configuration by editor, either way, now you need to know how the boot process works. After all, the opposite of DIY is HIDTY.