Newbie's Overview of Docker
What, Why and How
Thanks go out to Chris Jones, who put on a highly educational Docker presentation at Greater Orlando Linux User Group (GoLUG) and who took the time to answer many of my Docker questions before and during the presentation. Thanks also to the Docker developers and documenters for this fine resource.
Docker is a facility for creating encapsulated computer environments, each with its own running copy of Linux. Each encapsulated computer environment is called a container. Each container shares the host computer's copy of the kernel. This means there's no hypervisor, and no extended bootup. Starting up a Docker container is lightning fast.
The fact that the kernel must be a Linux kernel imposes the following restriction on Docker:

If you need to simultaneously run different operating systems (like Windows, OS X or BSD), or run programs compiled for other operating systems, Docker can't help you: you need a full virtual machine implementation like KVM, VirtualBox or VMware.
This document you're now reading is just an overview, perhaps with a few general ideas of how to accomplish certain things. It's not a document to teach yourself Docker. For that, you need the full Docker Documentation, the root of which is located at http://www.docker.io. Useful specific docs include:
The preceding documents come from the makers of Docker; they're excellent, and there's no substitute for reading them. The document you're currently reading is great for an overview before jumping in, but when actually performing Docker activities, you should use the official Docker documentation. If you want to have a good life in Dockerland, take the time to read the Docker docs.
There's a Docker IRC channel, called #docker, on the Freenode IRC server.
A Docker container shares a kernel with the host OS and with other containers. Other than that, the container is completely isolated from everything else, at least by default. Therefore, Docker is a great sandbox for development, testing, experimentation. But Docker's abilities as a sandbox barely scratch the surface of its capabilities and usefulness.
Docker's greater use is as a thing that can be plopped down on a computer to run an application, without any worry about dependencies. This is huge, but this requires deliberate holes in the container's encapsulation. Here are just a few of the easy ways you can poke controlled holes in a Docker container's encapsulation:
Before getting to Docker-specific definitions, the word sandbox refers to a computing environment in which what happens in the sandbox stays in the sandbox. If you were to perform an rm -rf within the sandbox, the contents of the sandbox get erased, but the containing computer suffers no damage. If you were to create a security breach within the sandbox, enabling a badguy to get in, theoretically the badguy could harm only the sandbox, not the containing computer. Of course, the practicality of the preceding sentence depends on the nature of the security breach and the knowledge and ability of the badguy, but all things being equal, a security problem in a sandbox is hugely preferable to the same security breach on the containing computer.
Now let's talk about Docker specific terminology. Life is much easier if you understand Docker terminology from the very start. The easiest way to get an initial grasp of Docker terminology is via the following (greatly simplified) diagram:
Looking at the diagram, you'll see that a container is a runtime implementation of an image. Many containers can be started from one image. As an analogy, an image is kind of like an architectural drawing of a house, and a container is a house built based on that drawing. As anyone who has driven through a housing community knows, several houses can be constructed from one drawing. Another analogy might be in Object Oriented Programming, with the image being a class and the container being an object. The thing that does all the computing is the container: An image alone does nothing except serve as a template for containers.
Looking again at the diagram, you'll see there are at least three methods of creating/modifying an image:
A repository is a network/internet connected service that contains images. Images can be pulled from a repository or pushed to it. There are Docker repositories all over the place, and it's beyond the scope of this document to tell you how to specify a repository or build your own. Suffice it to say that the docker program's commands default to the top-level repositories controlled by the Docker team, as does the Dockerfile FROM statement.
When you roll your own image with a Dockerfile and its associated files, all files destined to end up inside the image go in the directory containing the Dockerfile. The Dockerfile specifies which image, from the repository, to base your image on, and then you create the image with the docker build command. The Dockerfile (its name really is Dockerfile) is just a text file that can be edited with any text editor.
If you need a sandbox in which to develop or test, the value of Docker is intuitively obvious to the most casual observer. Other benefits need some explanation...
On Linux (and on Windows, but that's not relevant here), every piece of software has all sorts of dependencies. The complexity of all these interlinking dependencies is so overwhelming that every Linux distribution needs a package manager to install software. Oh, of course you could ./configure;make;make install, but on software of even moderate complexity you'll spend an hour or so working out all the dependencies. This dependency complexity becomes a problem for three classes of people:
For those three classes of people, Docker solves the dependency complexity problem. The exact way Docker solves the problem can be read on the Docker explanation pages, but the bottom line is that Docker makes deployment and installation much easier for software with non-trivial dependencies.
As a user or sysadmin, you might want a software version that your Linux distribution's package manager doesn't offer. Docker to the rescue. You can run a Ubuntu version of Gnumeric on Debian, or vice versa. You can run the Debian version of Apache on Fedora, and vice versa.
Users! Can't live with them, can't live without them. They'll hack, if you let them. Using Docker containers disguised as simple applications, you can limit the damage they can do. You can let them modify or see only files in a certain directory tree. You can deploy a version of the software with specific plugins and configurations.
Encapsulation simplifies the building, use and troubleshooting of any system. By wrapping your software in Docker containers, you can limit interactions between the containers, and thereby between the pieces of software, to only those interactions that are necessary by your overall system design. The containers can give you great troubleshooting test points.
During the past four months (1/2014-4/2014), Docker's gone from a phrase heard only in the Geekiest of circles to the next big thing, praised to the high heavens by all the trade mags. If you're a sysadmin, you boost your market value by knowing and using Docker.
One more Docker benefit: Although it's challenging, it's a heck of a lot of fun.
The purpose of this section is not to help you set up Docker, nor to serve as a Docker tutorial. Use the official Docker documentation for those purposes.
Instead, this section is designed to reassure the person considering learning or using Docker that Docker can do certain things. Therefore, the instructions are general, not specific; look elsewhere for exactly how to do what's discussed here.
Note that, unless you've made your username a member of group docker, which is a huge security risk in production, you'll need to use sudo to run your docker commands. I've heard that in later versions of Docker you can designate a Docker-privileged group without the security problems, but that's beyond the scope of this document. Anyway, although this section lists the commands without sudo, you might need to prepend sudo to all of them.
The following command lists all Docker commands:
docker 2>&1 | less
The following command lists all available images:

docker images
The preceding command yields a list of available images, with the following five fields: REPOSITORY, TAG, IMAGE ID, CREATED and VIRTUAL SIZE.
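The IMAGE ID column is the one you'll use most often in later commands. Here's a sketch of pulling out just that column with awk; the sample text below is made-up output in the shape of a docker images listing, not literal Docker output:

```shell
# Made-up sample in the shape of `docker images` output.
sample='REPOSITORY          TAG       IMAGE ID       CREATED        VIRTUAL SIZE
ubuntu              13.10     9f676bd305a4   2 weeks ago    178 MB
debian              wheezy    e565fbbc6033   3 weeks ago    115 MB'

# Skip the header line (NR > 1), then print the third
# whitespace-separated field, which is the image ID.
echo "$sample" | awk 'NR > 1 { print $3 }'
```

In real life you'd pipe docker images itself into the awk command instead of echoing a sample.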
The following command shows the system-wide Docker information:

docker info
The following command shows all the running containers:

docker ps
The preceding command is how you get a container ID with which to perform other actions. One of those actions is to get low level information on a running container, as follows:
docker inspect <container_id>
In the preceding, note that you need only as many leftmost characters of the container ID as make it unique among all Docker IDs.
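Here's a quick sketch of that uniqueness rule, using made-up IDs rather than real Docker output:

```shell
# Three made-up container IDs, one per line.
ids='4f2d9e1a77c0
4a81cbe81f22
9c03e81d2b14'

# A prefix is usable if exactly one ID starts with it.
echo "$ids" | grep -c '^4f'    # one match: "4f" is unique
echo "$ids" | grep -c '^4'     # two matches: "4" is ambiguous
```

With the IDs above, docker inspect 4f would work, but docker inspect 4 would be rejected as ambiguous.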
The Docker documentation says that Docker was developed in Ubuntu, so it's easiest to install Docker on a modern Ubuntu system. Believe it! Docker is not an easy install, things go wrong, and when you're a Docker newbie, you have no frame of reference for fixing those things that go wrong. Personally, I laid down a brand new Ubuntu 13.10 64 bit on a spare laptop with 2GB of RAM. By installing Docker on top of a fresh OS, I limited the variables, reducing troubleshooting and installation hassles.
Docker depends on having a 64 bit kernel with the kernel features for handling containers. Docker is, in essence, a shell around those kernel features.
Here are the best Docker instructions I've found so far:
First, you must find an image to explore. So, to see your choice of images, perform the following command:

docker images
Once you know which image you want to run as a container, use the docker run command with the image ID corresponding to the image you chose from the list of images. The command will look something like this:

docker run -i -t <image_id> bash
And, of course, to obtain the Image ID, you'd use the docker images command, as discussed earlier.
You use the docker ps command to list containers, both running and non-running. To show the running containers, do this:

docker ps
Sometimes you want to see all containers, including those that ran in the past but now are not running. To do that, you perform the following command:
docker ps -a
In the preceding, notice you can grep or grep -v on phrases like "Exited" to do a pretty good job of showing only non-running containers.
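The grep trick above can be sketched like this; the sample text is made-up output in the shape of a docker ps -a listing, not literal Docker output:

```shell
# Made-up sample in the shape of `docker ps -a` output.
ps_output='CONTAINER ID   IMAGE          COMMAND   STATUS
3aef1c2d9b10   ubuntu:13.10   bash      Up 2 hours
77bc03d1e4a9   ubuntu:13.10   bash      Exited (0) 3 hours ago'

# Only containers that have exited:
echo "$ps_output" | grep 'Exited'

# Only containers still running (dropping the header line too):
echo "$ps_output" | grep -v 'Exited' | grep -v 'CONTAINER ID'
```

In real life you'd pipe docker ps -a itself into the grep commands.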
Often, you want to see info on the last container you ran. This is very handy in development. To do this, use the -l flag like the following:
docker ps -l
Or like the following, if you want to include exited containers for consideration:
docker ps -la
Sometimes, you might want to see only the container ids. This is useful when you want to use a container id in another command. To see only the container ids, and the full (long) container ids, use the following command:
docker ps -aq --no-trunc
If you want the full container id for one specific container, pipe the preceding into the proper grep command. If you want the full container id for the newest running container, use the following command:
docker ps -lq --no-trunc
Containers you start and change are remembered even after they exit: you can see them with the -a argument to docker ps. But these are sort of like temporary files: hard to find and easy to delete. So if you want to save a changed container to an image, use the docker commit command, like this:
docker commit <container_id> Repository:Tag
And of course, to find the container ID, you use the docker ps command discussed earlier in this document.
Each unused container takes up disk space in the layered filesystem that Docker uses. Obviously, you don't want containers you'll never use again cluttering up the disk. The following command deletes all containers not currently running:
docker rm `docker ps -a -q --no-trunc`
In the preceding, the command inside the backticks produces a list of the full container ids of every container, running or otherwise, which becomes the argument for the docker rm command. Please remember, if you're not taking the security risk of putting yourself in group docker, you'll need sudo commands before the docker commands both inside and outside the backticks.
The beauty of the preceding command is that docker rm errors out on running containers, so they're left untouched.
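The backtick mechanism itself is ordinary shell command substitution; here's the same pattern sketched with throwaway files instead of containers (nothing Docker-specific here):

```shell
# Make a scratch directory holding three files.
tmpdir=$(mktemp -d)
touch "$tmpdir/a" "$tmpdir/b" "$tmpdir/c"

# The command inside $() produces the list of names; rm consumes it,
# just as `docker ps -a -q --no-trunc` feeds docker rm.
rm $(ls "$tmpdir"/*)

ls "$tmpdir" | wc -l    # prints 0: the directory is now empty
```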
Another thing to remember is that, if you want to save the state of any of the non-running containers, you should perform a docker commit on the non-running containers you want to save, saving them as images. The way I view things, uncommitted containers are kind of like temporary files: I don't expect to have them, unless I save them to an image.
To be useful as more than an experimentation and testing sandbox, Docker containers must be able to communicate with the outside world. Sure, you limit that communication, but you need some. One such communication channel is bind mounting, which enables the container to read and/or write to the host computer's disk. Here's the command to run bash where /stex inside the container refers to /tmp/steve outside the container:
docker run -v /tmp/steve:/stex -it ubuntu bash
docker run -v <host dir>:<container dir> -it ubuntu bash
I think putting the host directory before the container directory is unintuitive and a little surprising, but your mileage may vary. What's important is that you know which is which, because if you get them reversed, experimentation produces a lot of surprises without error messages.
In an otherwise empty directory ~/dockerhello, create a file called Dockerfile (note the capitalization), containing the following three lines:
FROM dockerfile/ubuntu
RUN apt-get install -y elinks
CMD elinks
A few words of explanation about the preceding Dockerfile:
Now it's time to build the image with the docker build command, giving the new image a tag while we're at it (with the -t option):
cd ~/dockerhello
sudo docker build -t hello .
A little explanation: the -t hello option tags the new image with the name "hello", and the dot at the end means "this directory", telling docker build to look for a file named Dockerfile here. This directory will become more important later, as files that should go in the container are placed in trees within this directory.
Now that you've built the image named hello, running it is this simple:
docker run -it hello
You can see in the preceding there's no command to run, just the image name. This is because the CMD in the Dockerfile specifies running elinks if no command is given on the command line.
Actually, it's a little more complicated. The CMD really specifies to run the command sh -c "elinks". To simply run elinks, the CMD would have needed to be CMD ["elinks"]. In this case, it really doesn't matter.
One way to take advantage of Docker's innate encapsulation is to have the container automatically run a specific command. You can do this two ways, from the Dockerfile:
The difference is that ENTRYPOINT does not allow overriding, whereas CMD does. Also, with ENTRYPOINT you can transparently pass arguments to it. As a matter of fact, what would have been considered a command in the docker run command translates to command line arguments for the ENTRYPOINT command.
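As an illustrative sketch of that argument-passing behavior (the image name myping is made up, and this isn't from the original document), a Dockerfile like the following would make the container behave like the ping command:

```dockerfile
FROM dockerfile/ubuntu
# Arguments given on the `docker run <image> ...` command line
# become arguments to ping, appended after -c 4.
ENTRYPOINT ["ping", "-c", "4"]
# Default argument used when docker run supplies none.
CMD ["localhost"]
```

After building this as, say, myping, running docker run myping example.com would ping example.com, while docker run myping with no arguments would ping localhost. Had CMD been used alone instead of ENTRYPOINT, a trailing argument would have replaced the whole ping command rather than being passed to it.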
Many times you communicate with a container via ssh. This subsection explains how to make an insecure, proof of concept container that you can ssh into. Its security problems include putting the root password in Dockerfile. Never do this in production or in an environment giving either physical or network access to other people. Never distribute an image made this way. The right way is to make a non-root user with an ssh key. This is just a proof of concept.
Start by making the following Dockerfile in its own directory:
FROM dockerfile/ubuntu
RUN apt-get install -y openssh-server
RUN mkdir -p /var/run/sshd
RUN echo 'root:flamingo' | chpasswd
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
The preceding starts with dockerfile/ubuntu, installs openssh-server, makes the necessary directory /var/run/sshd, sets the root password to "flamingo", makes port 22 available to the outside, and tells the image to run sshd if there are no command line arguments.
Now that your Dockerfile is complete, get into the directory containing that Dockerfile, and do the following:

sudo docker build -t myssh .
sudo docker run -d -P myssh
sudo docker ps

The -P option tells Docker to map the container's exposed port 22 to an available host port of Docker's choosing, and the docker ps output shows you which host port was chosen, so you know where to ssh.
First, let's repeat that the preceding is an academic introduction that would be a security risk anywhere someone other than you can get their hands on it, by physical or network access. Do it at home behind a very good firewall.
You might wonder why I didn't, within the container itself, map the container port 22 to a specific port on the host, which would eliminate the need to perform a docker ps command to find the mapped port. The reason is, you might want to run several such containers. I could have also mapped the container's port 22 to a specific port on the host in the docker run command. But then I would have had to figure out what host port wasn't already used. I let Docker do the dirty work.
You might wonder why I used CMD instead of ENTRYPOINT in the Dockerfile. The answer is, until I got it completely debugged, I'd want to be able to run various commands on it. Once it's ready for production (which it will never be due to security issues), I'd change it to ENTRYPOINT.
If you really want to automate things, you can forgo the lookup for the port number, with the following one-liner:
ssh -p `sudo docker ps | \
grep myssh:latest | \
sed -e "s/.*0\.0\.0\.0://" | \
sed -e "s/->22.*//"` \
firstname.lastname@example.org
Note that in the preceding, I used sudo to run the docker commands. This is because I personally don't make myself a member of group docker, for security reasons.
Whether you use sudo or not, you'd probably implement the preceding one-liner as a shellscript or a bash command alias.
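To see what that grep/sed chain actually extracts, here's the same pipeline run against a single made-up line in the shape of docker ps output (the IDs and port number are illustrative, not real):

```shell
# A made-up line shaped like `docker ps` output for the myssh container.
line='f00dfeed1234   myssh:latest   "/usr/sbin/sshd -D"   0.0.0.0:49153->22/tcp   myssh_run'

# Strip everything up to "0.0.0.0:", then everything from "->22" on,
# leaving only the host port mapped to the container's port 22.
port=$(echo "$line" | \
  grep myssh:latest | \
  sed -e "s/.*0\.0\.0\.0://" | \
  sed -e "s/->22.*//")

echo "$port"    # prints 49153, the mapped host port
```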
Never set a password within a Dockerfile unless it's for your own personal experimentation where nobody else can get at it, physically or over the network.
Due to time constraints, coverage of the following topics must be postponed until later: