Troubleshooters.Com and eBook Tech Present

De-mystifying ePub

Career skills nobody else teaches



Introduction

By Steve Litt

ePub is a format for flowing-text eBooks. "Flowing-text" means that the reader's reading device word wraps the eBook according to the reader's preferences in font, etc. Compare this to PDF-type eBooks, where line breaks are compiled into the document and can't be changed by the device.

ePub creation today (2013/2014) is where web site creation was in 1995: An arcane art only a few understand, and those few charge three figures for their cheapest work. An arcane art with only a few primitive tools to help you, and those tools often don't do what you need.

This will change. Consider that website creation today is cheap and available to all but the densest technophobes. But until ePub creation catches up to website creation, it behooves you to gain a little understanding of how ePubs are made and what's inside of them. That's what this document does for you.

This document is primarily a tutorial, meaning you actually need to perform its steps in order to gain the most knowledge from it. It's not difficult, and a lot of it can be done with copy and paste. The subject matter might be somewhat technical, but the process is pretty much a recipe, and the final product is simple enough that you can understand all of it.

Don't Believe It

Whole industries have developed based on convincing people that they're technologically inadequate. Bill Gates and Steve Jobs started it, leapfrogging each other to prove that you're non-technical, and that their products require absolutely no technology knowledge at all.

Don't believe a word of it. Anyone with an IQ sufficient to graduate high school can work with computer commands, if they put aside Bill and Steve's highly financed propaganda that the populace is technophobic. And don't believe Bill and Steve's assertions that you need absolutely no techno-knowledge to use a computer: True technophobes spend half their lives waiting for the computer shop to charge them an arm and a leg to fix trivial problems.

And then there's this: the principles behind ePubs aren't complicated, but they're not as simple as browsing the web, so you need some knowledge.

If you're tempted to say something like "technology and I don't get along", or "I don't have time for that stuff", or "that's not my core competency", consider the alternative. Those without basic knowledge of how ePubs are built are at the mercy of formatting mills, and those formatting mills vary from very good to terrible. A look at eBooks for any reading device will quickly convince you that ePub construction isn't trivial, even for so-called professionals. There's a lot of junk formatting out there. Junk not just from an aesthetic standpoint: Too junky to read, even for the most forgiving reader.

Beyond the need to evaluate formatting vendors, there's this: Without an understanding of the basics under the hood, you won't even know how to write your MS Word or LibreOffice or LyX documents in a way that's compatible with ePub construction. Basically, the person who refuses to take a few hours to learn how ePubs work is asking to be ripped off.

So if you're looking for a simple explanation of ePub authoring, and if you're willing to temporarily set aside any mistaken beliefs you might have about technological inadequacy, kick back and enjoy this tutorial.

Steve Litt's plans for 2014 include producing new ePub books every two weeks. That can't be done without knowledge of ePubs, and a good conversion program, which Steve is creating. Steve can be reached at his email address.

ePub vs PDF eBooks: Compare and Contrast

By Steve Litt

Doesn't it seem like the new hot discussion is "eBooks"? And doesn't it seem like, as with all hot new discussions, half-baked fakery seems the order of the day? I've heard folks say that my PDF books aren't eBooks because they're not optimized for a Kindle, iPad, Nook or other reader. To those people I ask three questions:

  1. Does eBook stand for "Electronic Book"?
  2. Is my 100K word Key to Everyday Excellence a book?
  3. Is my PDF format Key to Everyday Excellence delivered electronically?

Hopefully, what they meant to say is that my PDF, email-delivered electronic books:

The PDF format was designed from the bottom up for portability: To look the same on any computer screen, printer, or any other device. Usually, it works great that way. But viewing PDFs on tiny-screen readers requires one of the following:

  1. Shrinking the image to unreadable tininess.
  2. Formatting the PDF for tiny-screen readers, thus wasting a lot of paper when printing.
  3. Requiring the reader to scroll horizontally on every line: An annoyance that renders it unreadable.

Note:

If you have a larger reader, like a 9" diagonal screen, and if you read the PDF in landscape mode, and if your visual acuity is pretty good (20/20), then you can easily and enjoyably read a PDF file on that reading device. But trying it on a 5" diagonal screen is walking the avenue of annoyances.

To accommodate the widely varying visual acuities and preferences of human readers, reading devices are made from the ground up to render flowing-text formatted files, like Mobi, Kindle files, and the most ubiquitous, ePub. Flowing-text formats determine line breaks at read time, not compile time. If you change the magnification on your reader, the book's lines break at different places. Basically, a device reading a flowing-text formatted file like an ePub word wraps it, just like your favorite word processor. This makes for an excellent reading experience on a small device.

As mentioned in the preceding paragraph, ePub is a ubiquitous flowing text eBook format. It can be read on almost any device (including your desktop or laptop computer, with suitable reader software). Using calibre or other conversion software, it can be converted to most other flowing-text formats.

Steve Litt is the author of the Universal Troubleshooting Process Courseware. Steve can be reached at his email address.

Note to Windows and Mac Users

By Steve Litt

This document is written from a Linux point of view, but most of it is OS independent. ePub directory layout remains constant regardless of OS. So do the contents of the files comprising ePub. A few of the commands in this document, such as the commands to zip up the directory tree and the command to create the mimetype file, are written for Linux, but equivalent commands exist in all operating systems, including OS X and Windows.

A small part of this document's terminology is Linux-centric, such as the word "shellscript", which would be either "PowerShell script" or "batch file" in Windows.

If you're a Windows or Mac user, then using your operating system's equivalents of the tools discussed in this document, you should be able to reproduce everything in this document, and learn ePub architecture from this document, with very little trouble.

Steve Litt is the author of both flowing text eBooks (Kindle in this case) and PDF eBooks. Here's an example of his flowing text eBooks: http://www.amazon.com/dp/B006QTBLA2. Steve can be reached at his email address.

Anatomy of an ePub

by Steve Litt

ePub isn't magic or rocket science. It's a zipped directory tree consisting mainly of Xhtml (the content) with CSS to determine the appearance of the Xhtml, image files for the book's graphics, and several files to serve as record keeping. The devil is in the details, and when you work on a complex ePub, it might seem dauntingly complex. But keep in mind, in theory it's fairly simple.

For convenience and so we're all on the same page, let's refer to the top directory of the zipped directory tree as the Book Root Directory.

Note:

For the purposes of this tutorial, the Book Root Directory is ~/epublearn/hello

The Book Root Directory must contain these two things:

  1. The mimetype file.
  2. The META-INF directory.

The mimetype file must contain the following text, without a newline:

application/epub+zip

Also, the mimetype file must be the first file in the zipped archive, and when zipping up the directory structure, you must not compress the mimetype file, even if the rest of the directory structure is compressed. The Hello World article later in this magazine tells you how to do that.

The META-INF directory must contain a file called container.xml. The main purpose of this file is to point to the OEBPS package file (often referred to as the OPF), and you needn't know what "OEBPS package" or "OPF" even means. All you need to know is that the OPF file describes the structure of the book and points to the files needed to define the book. So the OPF file lists all Xhtml files and all image files, defines the order in which the Xhtml files are read, and points to the file that defines the table of contents entries. By custom the OPF file is named content.opf, and if you want an easy life in ePub-land, you'll put this file in the Book Root Directory.
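To see that pointer at work, here's a minimal shell sketch of the lookup a reading program performs when it opens an ePub: read META-INF/container.xml and pull out the full-path of the OPF. The find_opf function is hypothetical, not part of Calibre or any other tool, and it uses sed rather than a real XML parser, so it assumes the full-path attribute sits on a single line, as it does in the container.xml used in this tutorial.

```shell
#!/bin/sh
# find_opf: hypothetical helper, not part of any ePub tool.
# Prints the OPF path named by the full-path attribute in a
# container.xml. This is a sed sketch, not an XML parser: it
# assumes full-path="..." appears on a single line.
find_opf() {
  container=$1
  # Keep only the text between the quotes of full-path="...",
  # and report just the first rootfile if there are several.
  sed -n 's/.*full-path="\([^"]*\)".*/\1/p' "$container" | head -n 1
}
```

Run from the Book Root Directory, find_opf META-INF/container.xml should print content.opf.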

Other Things That Might Go In the Book Root Directory

Generally speaking, the more things you put in the Book Root Directory, the easier it is to get all your references right, and the easier it is to write a program to create an ePub from an Xhtml file and a little extra information. However, there's a tradeoff: the more stuff that goes in the Book Root Directory, the more cluttered and confusing that directory becomes.

I think life is much easier when you put these two files in the Book Root Directory: content.opf and toc.ncx.

In addition, you probably have only one or two CSS files (sometimes called style files), and if you have that few, life will be easier putting them directly into the Book Root Directory. If, for some reason, you have a lot of style files (CSS), consider putting them in a styles directory below the Book Root Directory.

In my opinion, Xhtml files do not belong in the Book Root Directory because there are way too many Xhtml files. If you want every chapter to start at the top of the reader, the only reliable way to do that on all readers is to have each chapter be a separate Xhtml file. Therefore, there could be bunches of them. So make an xhtml directory below the Book Root Directory, and put the Xhtml files there.

Books typically have many images, so you don't want those in the Book Root Directory. I suggest having an images directory right below the Book Root Directory, to contain all image files.

Note:

Although I recommend separate directories for filetypes occurring in large numbers, like Xhtml files and image files, most of this tutorial puts them in the Book Root Directory, just to make the tutorial simpler. Once you've used this tutorial to thoroughly understand how ePubs are built, you'll have no problem putting Xhtml files in their own directory, and image files in their own directory.

By custom, many authors and ePub creation programs create an OEBPS directory under the Book Root Directory, and put directories for Xhtml, images and styles in that OEBPS directory. Some even put their content.opf and toc.ncx files in the OEBPS directory. If your top priority is following custom, by all means do that. But if your priority is keeping things simple, with no degradation of the product the human reader encounters, then do what the examples in this magazine issue do: Put anything that others would put in OEBPS directly in the Book Root Directory.

The following is the directory structure that will be used throughout the tutorial portion of this document:

Book Root Directory
|-- META-INF directory
|   `-- container.xml
|-- mimetype
|-- content.opf
|-- toc.ncx
|-- mybook.css
|-- cover.xhtml
|-- toc.xhtml
|-- chap01.xhtml
|-- chap02.xhtml
|-- epubrocks.svg
`-- cover.svg

In this tutorial, the files in the preceding structure will be put in one at a time, and will be modified from time to time as the tutorial progresses.
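If you'd like to track your progress as the files go in, here's a hypothetical shell helper (missing_files is a name made up for this sketch) that lists which files from the preceding structure don't exist yet under a given Book Root Directory. The file list is copied straight from that structure, so adjust it if your book differs.

```shell
#!/bin/sh
# missing_files: hypothetical progress checker for this tutorial.
# Prints each file from the tutorial's directory structure that does
# not yet exist under the Book Root Directory given as $1.
missing_files() {
  root=$1
  for f in META-INF/container.xml mimetype content.opf toc.ncx \
           mybook.css cover.xhtml toc.xhtml chap01.xhtml chap02.xhtml \
           epubrocks.svg cover.svg; do
    # -e is true for any existing file or directory.
    [ -e "$root/$f" ] || echo "$f"
  done
}
```

For example, missing_files ~/epublearn/hello. Early in the tutorial most of the list will still be missing, which is expected.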

Steve Litt is the acting president of the Greater Orlando Linux User Group. Steve can be reached at his email address.

Theory vs. Authoring Tools

by Steve Litt

Ever since Gates and Jobs convinced the world that humans are ignorant technophobes, it's been fashionable to point and click everything, get websites to do the work for you, etc. So why not approach ePub construction that way?

The best way to answer this question is to download and read several Kindle books, including best sellers from known publishers, and notice the prevalence of horrid quality. I don't just mean aesthetics, I mean downright lack of clarity. For instance, the dialog in my Kindle edition of Eliyahu Goldratt's "The Goal" has no separation between speakers, so you can't tell who is saying what, except by going back over and over the dialog to try to deduce who is speaking, by context and speaking style. This of course completely negates the great writing style of the book. I find this kind of flaw, to a greater or lesser extent, on over half the Kindle eBooks I've downloaded. I'd say that on 15% to 20% of the books I've downloaded, such formatting errors ruined an otherwise well written book.

Note:

The preceding paragraph singles out Kindle books for one and only one reason: I have no iPad or Nook on which to read, so I can't point to bad eBook construction on iPad and Nook. However, I seriously doubt that the quality control on those is any better than on Kindle books or on the few independent ePubs I've read. The fact is, in 2013, most publishers and even independent authors and self-publishers are creating their flowing-text eBooks on the cheap.

One of the goals of this document is to change that.

Publishers seem to think there's a magic software machine that can take their backlist and convert former print books to flowing text eBooks suitable for reading. The resulting eBooks they create show the fallacy of their thought: It's an eBook all right, but bad fonts and confusing formatting get in the way of the reader almost continuously. Some publishers go so far as to try to convert PDF files to flowing text eBooks. What could possibly go wrong? "Gee whiz, why do I keep getting these artifacts just before what was the bottom of each PDF page?"

Generally speaking, my opinion is that self-publishers do a better job of eBook formatting than their big-publisher brethren, but they too screw up plenty. They typically try to shoot their MSWord manuscript through calibre and hope for the best. Or they try to write their book in Sigil, an excellent interface to the ePub directory tree and files, but pretty confusing. Sigil works best when you know the theory.

Which brings up this fact: ePub authoring tools work best when you know something about ePub directory and file structure. Such theory helps you author the input to the process, and helps you intervene in the conversion so as to produce a good product. So you need the theory, the theory isn't all that hard, and this document teaches you the theory.

Steve Litt is the author of Twenty Eight Tales of Troubleshooting. Steve can be reached at his email address.

ePub Hello World

by Steve Litt

In this article you'll make an ePub from scratch, and test it with the ebook-viewer program that comes with Calibre. Perform the following steps:

  1. Create the directories with mkdir -p ~/epublearn/hello/META-INF, and notice that this makes epublearn/hello below your home directory the book's Book Root Directory.
  2. cd ~/epublearn/hello
  3. echo -n application/epub+zip > mimetype to create the mimetype file.
  4. Put the following into ~/epublearn/hello/META-INF/container.xml:
    <?xml version="1.0"?>
    <container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
    <rootfile full-path="content.opf" media-type="application/oebps-package+xml"/>
    </rootfiles>
    </container>
    
  5. Put the following into ~/epublearn/hello/content.opf:
    <?xml version="1.0" encoding="utf-8" standalone="yes"?><package xmlns="http://www.idpf.org/2007/opf" unique-identifier="BookID" version="2.0">
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:title>My First Epub</dc:title>
    <dc:identifier id="BookID" opf:scheme="CustomID">HelloWorld</dc:identifier>
    <dc:language>en</dc:language>
    </metadata>
    
    <manifest>
    <item href="chap01.xhtml" id="chap01_page" media-type="application/xhtml+xml"/>
    </manifest>
    
    <spine toc="ncx">
    <itemref idref="chap01_page" linear="yes"/>
    </spine>
    
    <guide>
    </guide>
    
    </package>
    
  6. Put the following into ~/epublearn/hello/chap01.xhtml:
    
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <head>
    <title>Lame, Moot Title, Chap1</title>
    <meta http-equiv="content-type" content="application/xhtml+xml; charset=UTF-8"/>
    <meta http-equiv="content-style-type" content="text/css"/>
    <meta http-equiv="expires" content="0"/>
    </head>
    
    <body>
    <h1 id="chaptertitle">Chapter One: Overview of ePub</h1>
    <p>ePub is wonderful when done right. When done wrong, it's a crime against nature. A look at a few Kindle books shows that a lot of people do it wrong. A good ePub enables the reader to tell where paragraphs start and end at a glance, and where dialog paragraphs start and end so the reader knows who's saying what.</p>
    
    <p>Of course, Kindle books aren't ePubs, but they're usually made from ePubs.</p>
    
    <p>The trick to making a good ePub is to understand that ePubs are flowing text documents, meaning that their line breaks don't happen until read time, and there's no such thing as set pages. Line spacing and margins that work well on paper and in PDF files fail miserably in ePub files.</p>
    
    <p>Here are just a few tips that scratch the surface of making a good and readable ePub book:</p>
    <ul>
    	<li>Set all appearances with CSS styles: Never change appearance within the body of the source Xhtml file. This means you need a style for every kind of paragraph or word sequence that should have its own appearance.</li>
    	<li>Don't override the default appearance settings the human reader put into the reading device. Assume nothing about the reader's taste or visual acuity. Your book isn't an aesthetic masterpiece, but if you're lucky it might be a literary one. But only if the reader can easily read your text, which might not happen if you override his or her default font.</li>
    	<li>Graphics and diagrams that work well on paper and PDF can fail miserably in an ePub read on a small-screen device. All diagrams should be simple enough to be legible when viewed full size, without horizontal scrolling, on a small-screen device. Such graphics aren't always easy to create, and they might not look good in a print book or PDF, but they're the only way to go for ePubs expected to be viewed on small-screen devices.</li>
    	<li>If you're writing a book destined for both PDF (or print) and ePub, build it from the ground up to be both, and understand you'll need to hand-tweak styles on the ePub to make them look good.</li>
    	<li>If you're converting an existing paper or PDF book, don't just put it through a process, but instead, do the work to get it to look good on a small-screen device hosted ePub. If you don't do that work, your readers will hate you.</li>
    </ul>
    </body>
    </html>
    
    
    
  7. cd ~/epublearn/hello
  8. zip -0Xq /tmp/hello.epub mimetype
  9. zip -Xr9Dq /tmp/hello.epub *
  10. ebook-viewer /tmp/hello.epub

If you've done everything right, ebook-viewer should now show the contents of your chap01.xhtml file. If not, make sure you've installed Calibre so you have ebook-viewer. Use an archive reader program to make sure /tmp/hello.epub contains what you think it does. Look for error messages, and investigate each one.

Comments on the Process

If you're mystified why it took two zip commands to make the ePub, remember back when I said that the mimetype file could not be compressed? The first zip command zipped mimetype without compression (and, because it ran first, made mimetype the first file in the archive), and the second zip command zipped everything else.
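You can confirm the result without an archive program. The following shell sketch relies on the ZIP file format: each entry begins with a 30-byte local header, followed by the entry's name and then, for a stored (uncompressed) entry with no extra header field, its raw bytes. The -X option in both zip commands is what keeps zip from adding such an extra field. So in a correctly built ePub, bytes 30 through 57 read mimetypeapplication/epub+zip. The function name mimetype_first_and_stored is hypothetical, and this is a quick heuristic, not a full validator.

```shell
#!/bin/sh
# mimetype_first_and_stored: hypothetical quick check, not a validator.
# Succeeds if the archive in $1 starts with a ZIP local header for an
# uncompressed "mimetype" entry holding application/epub+zip.
mimetype_first_and_stored() {
  epub=$1
  # Every ZIP local file header starts with the signature PK\003\004.
  [ "$(head -c 4 "$epub")" = "$(printf 'PK\003\004')" ] || return 1
  # Bytes 30 through 57: the 8-byte name followed by the 20 stored bytes.
  [ "$(head -c 58 "$epub" | tail -c 28)" = 'mimetypeapplication/epub+zip' ]
}
```

After building, mimetype_first_and_stored /tmp/hello.epub && echo OK prints OK if mimetype went in first and uncompressed.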

If you're wondering why I used echo -n to make the mimetype file, remember that earlier I said the mimetype file cannot have a newline? The -n is how that's accomplished: it tells echo not to add its usual trailing newline.
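One portability wrinkle: POSIX leaves the behavior of echo -n implementation-defined, and on some systems the shell's echo prints the -n literally. The following sketch writes mimetype with printf instead, which never appends a newline, and then checks the result. Both function names are hypothetical.

```shell
#!/bin/sh
# make_mimetype: hypothetical helper. Writes the mimetype file into
# the Book Root Directory given as $1 (default: current directory).
# printf '%s' prints its argument exactly, with no trailing newline.
make_mimetype() {
  dir=${1:-.}
  printf '%s' 'application/epub+zip' > "$dir/mimetype"
}

# mimetype_file_ok: hypothetical check. Succeeds only if the file is
# exactly the 20 bytes of application/epub+zip, with no newline.
mimetype_file_ok() {
  [ "$(wc -c < "$1")" -eq 20 ] && [ "$(cat "$1")" = 'application/epub+zip' ]
}
```

For example, make_mimetype ~/epublearn/hello, then mimetype_file_ok ~/epublearn/hello/mimetype && echo OK.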

Don't forget that the ebook-viewer program is part of the Calibre eBook format conversion package, so you must install Calibre. Calibre is also essential for converting your newly made ePub to other formats.

Wrapup

You just made an ePub, by hand, from scratch, with no specialized tools involved. Your ePub had no stylesheet, and therefore no style. It had no table of contents: neither the ePub kind nor the HTML kind. It had no cover and no title page. It had only one chapter. It was trivially simple, but you made it by hand, and now you've seen that ePub construction isn't rocket science. The next few articles expand on what you've done.

Steve Litt is available to select clients to personally teach the Universal Troubleshooting Process Course. Steve can be reached at his email address.

Add a Stylesheet

by Steve Litt

Stylesheets for ePub books are CSS (Cascading Style Sheets). This article guides you through adding a stylesheet to your ePub. This will be a very simple stylesheet: All it does is make your ordinary paragraphs (very) indented instead of spaced. The reason for the exaggerated indentation is so there can be no doubt that indentation was caused by your stylesheet.

Do the following:

  1. Add this line to the <head> section of chap01.xhtml:
    <link rel="stylesheet" type="text/css" href="mystyles.css"/>
  2. Put the following CSS in new file mystyles.css, located in the Book Root Directory:
  3. Add this line to the manifest part of content.opf:
    <item href="mystyles.css" id="css" media-type="text/css"/>
  4. rm /tmp/hello.epub
  5. cd ~/epublearn/hello
  6. zip -0Xq /tmp/hello.epub mimetype
  7. zip -Xr9Dq /tmp/hello.epub *
  8. ebook-viewer /tmp/hello.epub
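Step 2 needs a paragraph style that removes the space between paragraphs and exaggerates the first-line indent. Here's a minimal sketch, written as a shell here-document so it can be pasted into a terminal opened in the Book Root Directory. The 5em indent and zero margins are assumptions chosen to make the effect obvious; any similar values will do.

```shell
#!/bin/sh
# Sketch with assumed values: write mystyles.css into the current
# directory, which should be the Book Root Directory.
# The p rule removes the vertical space between ordinary paragraphs
# and indents each first line far more than a real book would.
cat > mystyles.css <<'EOF'
p {
  margin-top: 0;
  margin-bottom: 0;
  text-indent: 5em;
}
EOF
```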

Isn't it great when a plan comes together? Notice that now there's no space between paragraphs, but the first line of each paragraph is extremely indented. This was an experiment: You don't want your paragraphs to really look this way, so in mystyles.css, change the p to p.disabled1. Once you've done this, the style applies only to paragraphs whose class is "disabled1", so all your paragraphs revert to the default. Re-zip to see your paragraphs go back to the default.

Lessons Learned

Here's what you can take away from this article:

Steve Litt is the author of The Key to Everyday Excellence. Steve can be reached at his email address.

re-zip.sh

by Steve Litt

It's about time to make the whole re-zip process and the re-displaying of the revised ePub into a single command. Make the following shellscript:

#!/bin/sh
rm -f /tmp/hello.epub
zip -0Xq /tmp/hello.epub mimetype
zip -Xr9Dq /tmp/hello.epub *
ebook-viewer /tmp/hello.epub

Name the shellscript rezip_hello.sh, put it in the Book Root Directory, and make it executable with chmod +x rezip_hello.sh. You can now re-zip like this:

./rezip_hello.sh

Note to Mac and Windows Folks:

A shellscript is the Linux equivalent of a Windows batch file or PowerShell script. Shellscripts work just fine in the Mac world, as long as you've installed bash. For a more Mac-like experience, you can set up a shortcut that runs the shellscript with a single click.

For the rest of this document, when the document tells you to re-zip or rebuild the ePub, just run ./rezip_hello.sh (or an equivalent you made for your operating system) from the Book Root Directory. If you're using Windows, this should be an equivalent batch file or PowerShell script. If you're using a Mac and have Bash installed, put the same commands into a Bash script on your Mac, and run that. If you have any difficulties getting these things working on Windows or Mac, email me.

Steve Litt has written a large collection of LyX documentation. Steve can be reached at his email address.

Add a Second Chapter

by Steve Litt

This section guides you through adding a second chapter to your ePub, and also guides you in making chapter titles using <h1> entries at the top of each chapter's body.

Before doing anything else, create the following chap02.xhtml file in the Book Root Directory:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>Lame, Moot Title, Chap2</title>
<meta http-equiv="content-type" content="application/xhtml+xml; charset=UTF-8"/>
<meta http-equiv="content-style-type" content="text/css"/>
<meta http-equiv="expires" content="0"/>
<link rel="stylesheet" type="text/css" href="mystyles.css"/>
</head>

<body>
<h1 id="chaptertitle">Chapter Two: Calibre</h1>
<p>When it comes to converting between various eBook formats, and sometimes even importing and exporting to some of these formats, Calibre is a powerful tool. Calibre also comes with great utilities like ebook-viewer to view an ebook, and epub-fix to fix and check an ebook, and find subtle errors.</p>

<p>Be careful with epub-fix though: It changes the file rather than just checking it. Use epub-fix on a copy, not on the original.</p>

</body>
</html>

Note:

If you look carefully, you'll see that both chap01.xhtml and chap02.xhtml have an <h1> item whose id attribute is "chaptertitle". This wouldn't work if both were in the same Xhtml file, but they're not, they're in two different Xhtml files. So it's perfectly acceptable to have identical id attributes in an ePub, as long as they're in different Xhtml files. This can be very handy.

The preceding is the Xhtml for the second chapter. Notice the reference to mystyles.css in its <head> section.

You've now added the content for Chapter 2, and applied a CSS file to it with a reference to mystyles.css, but you still need to include chap02.xhtml in the manifest and spine sections of the content.opf file. So do the following:

  1. Edit content.opf
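  2. Anywhere within the manifest section, insert the following line:
    <item href="chap02.xhtml" id="chap02_page" media-type="application/xhtml+xml"/>
  3. Immediately after the Chapter 1 entry within the spine section of content.opf, insert the following line:
    <itemref idref="chap02_page" linear="yes"/>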
  4. Re-zip and view the result.

If things go the way they should have, the book will start with chapter 1, and after the conclusion of chapter 1, chapter 2 will begin on a new page. You now have a two chapter book.

Discussion

This exercise showcased the following facts:

In order to give you a gut feel for how the Spine, Manifest and content files relate, the next article walks you through some experiments...

Steve Litt is the author of Troubleshooting: Just the Facts. Steve can be reached at his email address.

Experiments With Multiple Files

By Steve Litt

Here are some experiments to help instill in your mind the purposes of certain files and their contents, and the relationships between all of them. When you finish these experiments, try to form a picture in your mind about how everything interrelates.

Reversing Spine Entries

In the Spine part of content.opf, reverse the chap01_page and chap02_page entries, so the chap02_page entry comes first. Re-zip, and notice that now, when you page through the book, Chapter 2 comes before Chapter 1. Then reverse their entries in the Manifest section, re-zip, and notice there's no change. Finally, put the Spine entries back in their proper order, re-zip, and notice that once again, Chapter 1 comes first when reading the book from start to finish.

This experiment proves that, for the purpose of paging straight through a book, the order of the entries in the Spine determines reading order, but Manifest entries can be in any order at all.

Adding Xhtml file Without Manifest And Spine

Make a backup copy of content.opf, delete all references to chapter two, both in the Manifest and in the Spine, and re-zip. You'll notice that the book opens just fine, but it now contains only chapter 1.
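If you'd rather script this experiment, the backup itself is just cp content.opf backup.opf (backup.opf is the name a later experiment uses). Deleting the chapter two references can be done with the hypothetical drop_refs helper sketched below, which removes every line of an OPF file that mentions a given id in quotes. It only works because each Manifest and Spine entry in this tutorial sits on a line of its own.

```shell
#!/bin/sh
# drop_refs: hypothetical helper for the experiments.
# Usage: drop_refs ID FILE
# Deletes every line of FILE containing "ID" (quotes included), which
# catches both id="ID" in the Manifest and idref="ID" in the Spine.
drop_refs() {
  ref_id=$1
  opf_file=$2
  # -F matches the quoted id literally; writing to a temporary file
  # and renaming it avoids the non-portable sed -i.
  grep -vF "\"$ref_id\"" "$opf_file" > "$opf_file.tmp" &&
    mv "$opf_file.tmp" "$opf_file"
}
```

With the backup made, drop_refs chap02_page content.opf removes both chapter two entries in one step.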

What you've proven by this experiment is that extra files (in this case chap02.xhtml, because it's no longer referenced by the Manifest or Spine) can harmlessly exist in your directory tree. Other than consuming more disk space, there's no harm in extra files.

Danger Will Robinson!

The ebook-viewer program, and I assume certain other devices and ePub reading programs, processes any file with an .ncx extension, even if it's not listed in the Manifest or Spine. This causes all sorts of problems and troubleshooting headaches. This is an exception to the "extra files cause no harm" rule.

Of course, unless you have a very specific reason to have extra files just lying around, it's poor practice. But it won't throw an error, so if you have error messages, they're probably not caused by extra files not named in content.opf.

Adding Spine Without Manifest

This experiment adds a Spine entry for an existing file, but forgets the corresponding Manifest entry.

So, using the content.opf you just got through experimenting with, add back the Spine entry:

<itemref idref="chap02_page" linear="yes"/>

Re-zip, and notice that the book opens just fine, with Chapter 1 only. The bogus Chapter 2 spine entry, whose idref refers to something not existing in the Manifest, is ignored. I find that rather unexpected, but that's how it goes.

Warning

The ebook-viewer program would not have been so forgiving if you had had a correct toc.ncx file containing an entry for Chapter 2.
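Because ebook-viewer silently ignores a Spine entry with no matching Manifest item, a small check can catch the mistake for you. The hypothetical orphan_spine_refs function below prints every Spine idref that no Manifest item declares as its id. Like the other sketches here, it uses sed and grep rather than an XML parser, so it assumes one entry per line.

```shell
#!/bin/sh
# orphan_spine_refs: hypothetical check, one entry per line assumed.
# Prints each Spine idref in the OPF file $1 that has no Manifest
# <item> with a matching id.
orphan_spine_refs() {
  opf=$1
  # Collect the idref value from every <itemref ...> line.
  for ref in $(sed -n 's/.*<itemref[^>]*idref="\([^"]*\)".*/\1/p' "$opf"); do
    # "<item " (with the space) never matches an <itemref> line.
    grep -q "<item [^>]*id=\"$ref\"" "$opf" || echo "$ref"
  done
}
```

Run against the content.opf from this experiment, orphan_spine_refs content.opf should print chap02_page.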

Adding Manifest Without Spine

This experiment adds a Manifest entry for an existing file without adding a corresponding Spine entry.

So, using the content.opf you just got through experimenting with, delete the Chapter 2 spine entry, and add the Manifest entry for chapter 2:

<item href="chap02.xhtml" id="chap02_page" media-type="application/xhtml+xml"/>

You'll notice that it makes an eBook containing only Chapter 1, because there's no Spine entry for Chapter 2.

Warning

The ebook-viewer program would not have been so forgiving if you had had a correct toc.ncx file containing an entry for Chapter 2.

Adding Manifest and Spine Without File

This experiment references a nonexistent file from both the Spine and the Manifest. Start by restoring content.opf from backup.opf, the backup you made earlier, and make sure to leave the backup intact for later. Re-zip and verify that you now get the book, with both chapters in the right order.

Now add this line to the Manifest:

<item href="chap99.xhtml" id="chap99_page" media-type="application/xhtml+xml"/>

And add this line to your Spine:

<itemref idref="chap99_page" linear="yes"/>

Re-zip, and notice that the book works just fine, with two chapters in the right order and no error messages. For extra credit, delete just the Chapter 99 Spine entry, then put it back and delete just the Chapter 99 Manifest entry, to verify that wrong references in content.opf are ignored.

Note

This missing file scenario won't throw a major error in ebook-viewer as long as the toc.ncx file contains no entry for Chapter 99. Generally speaking, fatal errors get thrown when toc.ncx or another .ncx file has a reference to a nonexistent filename. The ebook-viewer program does not throw an error when a toc.ncx id doesn't match the corresponding one in content.opf.
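Since ebook-viewer also lets Manifest entries for missing files pass, a check in shell can flag them. The hypothetical missing_manifest_files function below prints each Manifest href that has no matching file, treating hrefs as relative to the directory holding the OPF. It assumes one item per line and file names without spaces or URL escapes.

```shell
#!/bin/sh
# missing_manifest_files: hypothetical check, one item per line assumed.
# Prints each Manifest href in the OPF file $1 that points to a file
# which does not exist.
missing_manifest_files() {
  opf=$1
  # Manifest hrefs are relative to the directory containing the OPF.
  dir=$(dirname "$opf")
  for href in $(sed -n 's/.*<item [^>]*href="\([^"]*\)".*/\1/p' "$opf"); do
    [ -e "$dir/$href" ] || echo "$href"
  done
}
```

With the Chapter 99 entries in place, missing_manifest_files content.opf should print chap99.xhtml.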

Missing Stylesheet Reference in Manifest

Using the content.opf with which you've been experimenting, delete this line from the Manifest:

<item href="mystyles.css" id="css" media-type="text/css"/>

Also, in mystyles.css, change p.disabled1 back to p so the paragraph style applies again. Re-zip, and if you're anything like me, you'll find the result surprising: It compiles just fine, and recognizes the styles in mystyles.css, without the entry in the Manifest. My guess would be that the link references in the individual Xhtml files were sufficient. But my understanding is that the correct way to do things is to list every single file needed as a resource for the ePub within the Manifest, so I imagine some reading devices would have trouble with a missing stylesheet reference within the Manifest.

Restore content.opf From Your Backup

You're done experimenting for the time being, so please restore content.opf from your backup.

Lessons Learned

It's pretty hard to make an ePub fail completely just by having bad references between files, Manifest entries, and Spine entries within content.opf. There are ways to make an ePub re-zip fail, but badly referenced Manifest and Spine entries alone aren't among the ways.

One way to reliably make the ePub fail is to reference a nonexistent file from toc.ncx.

Another lesson learned is the relationship between Manifest and Spine. The Manifest must list every file used by the ePub. Actually, as shown in the experiments, "must" might be too strong a word, but you should list every file used by the ePub. On the other hand, only files meant to be read, like Xhtml files, should be listed in the Spine, and they should be listed in the order you want them to appear within the ePub book.
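The Manifest/Spine cross-check from these experiments is easy to automate. The following is a minimal Python sketch (standard library only, Python 3.8 or later for the {*} namespace wildcard); check_opf is a hypothetical helper name, not part of Calibre or any other tool. It only compares ids inside content.opf and doesn't check whether the files themselves exist.

```python
import xml.etree.ElementTree as ET


def check_opf(opf_text):
    """Cross-check the Manifest and Spine of a content.opf string.

    Returns (dangling, unlisted):
      dangling: Spine idrefs that have no matching Manifest item
      unlisted: Xhtml Manifest items that never appear in the Spine
    """
    root = ET.fromstring(opf_text)
    # "{*}" matches a tag in any namespace, or in none (Python 3.8+)
    manifest = {item.get("id"): item
                for item in root.iterfind(".//{*}manifest/{*}item")}
    spine = [ref.get("idref")
             for ref in root.iterfind(".//{*}spine/{*}itemref")]

    dangling = [idref for idref in spine if idref not in manifest]
    unlisted = [item_id for item_id, item in manifest.items()
                if item.get("media-type") == "application/xhtml+xml"
                and item_id not in spine]
    return dangling, unlisted
```

Run against the Chapter 99 experiment with only the Spine entry deleted, it would report chap99_page as unlisted; with only the Manifest entry deleted, it would report chap99_page as dangling.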

Steve Litt is the author of several human performance books. Steve can be reached at his email address.

Add ePub Table of Contents

By Steve Litt

ePub readers such as the ebook-viewer software program, as well as hardware like Kindle, iPad and Android, provide a view of a book's table of contents. This view is entirely outside of the reading material. There's typically a "Table of Contents" (TOC) button to click or press, and when you click or press it, the book's table of contents is shown; click on a chapter, and you're transported to that chapter.

This TOC view is not the same as an Xhtml table of contents that may or may not appear toward the front of the book. The TOC view is outside the text of the book.

A little about terminology. The TOC outside the reading material, seemingly provided by the device itself, is usually called the ePub TOC. The TOC within the text, typically in the same place as a TOC would appear in a print book, is usually called the HTML TOC, or the Xhtml TOC. For the rest of this document we'll use those terms.

The ePub you've developed so far has neither kind of table of contents. In the section you're now reading, you'll create the ePub TOC.

The ePub TOC is created using the toc.ncx file. It doesn't have to be called that, but it almost always is. I like to keep my toc.ncx in the Book Root Directory. After all, why make life any more complicated than it has to be?

ePub Making Procedure

First, put the following in a new file named toc.ncx, located in the Book Root Directory:

<?xml version='1.0' encoding='utf-8'?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1" xml:lang="eng">
  <head>
    <meta content="HelloWorld" name="dtb:uid"/>
    <meta content="1" name="dtb:depth"/>
    <meta content="Me, myself and I" name="dtb:generator"/>
    <meta content="0" name="dtb:totalPageCount"/>
    <meta content="0" name="dtb:maxPageNumber"/>
  </head>
  <docTitle>
    <text>My First Epub</text>
  </docTitle>
  <navMap>
    <navPoint id="chap01_page" playOrder="1">
      <navLabel>
        <text>Chapter 1: Overview of ePub</text>
      </navLabel>
      <content src="chap01.xhtml#chaptertitle"/>
    </navPoint>
    <navPoint id="chap02_page" playOrder="2">
      <navLabel>
        <text>Chapter 2: Calibre</text>
      </navLabel>
      <content src="chap02.xhtml#chaptertitle"/>
    </navPoint>
  </navMap>
</ncx>

Next, add the following line to the Manifest portion of content.opf:

<item href="toc.ncx" id="ncx" media-type="application/x-dtbncx+xml"/>

Then, re-zip. If you have errors, see the Rules of the Road and Gotchas and Landmines subsections of this section. If it works OK, go to the software or device's Contents facility, and notice that Chapters 1 and 2 are represented, and each properly takes you to its associated chapter. If that worked, you've been successful, but you should still read the Rules of the Road and Gotchas and Landmines subsections of this section.

Rules of the Road

Here are some rules of the road about toc.ncx:

Gotchas and Landmines

Danger Will Robinson!

The ePub incorporates any file with extension .ncx that you happen to keep lying around the Book Root Directory, or perhaps the directory pointed to by META-INF/container.xml. Either way, if you keep getting strange error messages, check to make sure there are no extraneous files ending in .ncx hanging around your directory tree. This could be a very difficult problem to diagnose if you aren't aware of it ahead of time.

If an entry in toc.ncx has a reference to a nonexistent file, this will cause a fatal error with an error message something like this:

u'/tmp/calibre_0.8.51_tmp_fmHr9G/e_POy5_ebook_iter/chap02e.xhtml' is not in list

If you see an error message like that, verify that there are no extraneous .ncx files in the directory tree, and that toc.ncx contains no references to filenames that don't exist. Note that a filename can fail to resolve simply because of a wrong path in front of it.
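Both checks can be scripted. The sketch below, in Python with only the standard library (3.8+), assumes you point it at the unzipped Book Root Directory with toc.ncx at its top, as in this tutorial; check_ncx is a hypothetical name. It lists stray .ncx files and any toc.ncx src whose file doesn't exist.

```python
import os
import xml.etree.ElementTree as ET


def check_ncx(book_root, ncx_name="toc.ncx"):
    """Look for the two toc.ncx problems that cause fatal errors.

    Returns (stray_ncx, missing):
      stray_ncx: .ncx files other than ncx_name, relative to book_root
      missing:   toc.ncx <content src> targets whose file doesn't exist
    """
    stray_ncx = []
    for dirpath, _dirs, files in os.walk(book_root):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), book_root)
            if name.endswith(".ncx") and rel != ncx_name:
                stray_ncx.append(rel)

    ncx_path = os.path.join(book_root, ncx_name)
    ncx_dir = os.path.dirname(ncx_path)        # src values are relative to toc.ncx
    missing = []
    for content in ET.parse(ncx_path).getroot().iterfind(".//{*}content"):
        target = content.get("src", "").split("#", 1)[0]   # drop the #fragment
        if not os.path.isfile(os.path.join(ncx_dir, target)):
            missing.append(target)
    return sorted(stray_ncx), missing
```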

Steve Litt is the author of Rapid Learning for the 21st Century. Steve can be reached at his email address.

Add HTML Table of Contents

By Steve Litt

The ePub TOC is triggered by a special action on the device itself. Perform that action (usually touching or clicking an icon), and the book is temporarily replaced by the Table of Contents. An ePub TOC is not the same as the Table of Contents appearing at the start of a document, within the Xhtml. This section is about the Table of Contents appearing at the start of a document, called the Xhtml TOC.

Note:

Some eBook publishers and authors put the Xhtml Table of Contents at the end of the book. That actually makes sense for a fiction book, where the reader wants to start at the Prolog or Chapter 1, and read straight through to the end. But for the purposes of this document, the Xhtml Table of Contents is presumed to be at the beginning of the eBook.

You might wonder why you should have both types of TOCs. The answer is simple enough: To accommodate the reader. Some people prefer going to the front of the book to use a traditional (like print books) TOC, while others prefer to use the one triggered by the device itself, and therefore not lose their place reading, if they decline to select a chapter.

So make the following Xhtml file, and call it toc.xhtml:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>Table of Contents</title>
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
<meta http-equiv="content-type" content="application/xhtml+xml; charset=UTF-8"/>
<meta http-equiv="content-style-type" content="text/css"/>
<meta http-equiv="expires" content="0"/>
<link href="mystyles.css" type="text/css" rel="stylesheet" />
</head>

<body>
<h1>Table of Contents</h1>
<ul>
<li><a href="chap01.xhtml#chaptertitle">Chapter 1: Overview of ePub</a></li>
<li><a href="chap02.xhtml#chaptertitle">Chapter 2: Calibre</a></li>
</ul>
</body>
</html>

Now, put toc.xhtml in the Manifest and Spine of content.opf. Put the following line in your Manifest:

<item href="toc.xhtml" id="toc_page" media-type="application/xhtml+xml"/>

Next, put the following line in the Spine, above the chapters, so the table of contents appears at the front of the book:

<itemref idref="toc_page" linear="yes"/>

And finally, put the following line in the Guide part of content.opf, the part delineated by <guide> and </guide>:

<reference href="toc.xhtml" type="toc" title="Table of Contents"/>

Re-zip, and you have a functioning Xhtml table of contents in the front.

Discussion

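OK, a lot of new things happened in this section. The most obvious was the use of the Guide part of content.opf. The OPF 2.0.1 specification describes the Guide as identifying fundamental structural components of the publication, such as the cover, title page and table of contents, so that reading systems can provide convenient access to them. In other words, it's a way of letting the device know about special Xhtml files. Later, when we make a Title Page, that too will go in the Guide. And that makes sense, because almost all readers have a built-in facility to go to the title page.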

Notice how, in the links for each chapter, the href value is a filename, followed by a pound sign (#), followed by the id of the <h1> chapter title in that file. The href doesn't contain a reference to anything in the Manifest. There's no way for an Xhtml file to know about the ID values the Manifest assigns to other Xhtml files. What goes in each href is a filename.

By the way, the filenames in the links are relative to toc.xhtml, so even if you moved all the Xhtml files together into a different directory, those links wouldn't change. The paths in your Manifest, Guide and toc.ncx would change, but not the links within toc.xhtml itself.
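Because each href is just a filename plus an optional #id, an Xhtml TOC can be verified mechanically: resolve the filename relative to the linking file, then look for an element with that id. Here's a minimal Python sketch; check_links and has_id are hypothetical names. It assumes the files are well-formed XML without named entities like &nbsp; (which ElementTree can't resolve without the DTD), and it skips external links.

```python
import os
import xml.etree.ElementTree as ET


def has_id(path, wanted):
    """True if any element in the XML file at path has id == wanted."""
    return any(el.get("id") == wanted for el in ET.parse(path).getroot().iter())


def check_links(xhtml_path):
    """List the links in an Xhtml file whose target file or #id is missing.

    Hrefs are resolved relative to the directory containing xhtml_path.
    """
    base = os.path.dirname(xhtml_path)
    problems = []
    for a in ET.parse(xhtml_path).getroot().iterfind(".//{*}a"):
        href = a.get("href")
        if not href or "://" in href or href.startswith("mailto:"):
            continue                                  # no link, or external
        filename, _, fragment = href.partition("#")
        target = os.path.join(base, filename) if filename else xhtml_path
        if not os.path.isfile(target):
            problems.append(f"{href}: no such file")
        elif fragment and not has_id(target, fragment):
            problems.append(f"{href}: no element with id '{fragment}'")
    return problems
```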

Steve Litt is the originator of the Universal Troubleshooting Process. Steve can be reached at his email address.

Add Title Page

By Steve Litt

A book's title page gives its title and author, and is usually fairly stylized. Here is the process to incorporate your title page:

  1. Make titlepage.xhtml
  2. Put titlepage.xhtml in the Manifest
  3. Put titlepage.xhtml in the Spine
  4. Put titlepage.xhtml in the Guide
  5. Add the necessary styles to mystyles.css

You put it in the Guide because the title page is one of those special pages that the device can point you to.

Make titlepage.xhtml

Create titlepage.xhtml in the Book Root Directory, containing the following:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>Titlepage</title>
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
<meta http-equiv="content-type" content="application/xhtml+xml; charset=UTF-8"/>
<meta http-equiv="content-style-type" content="text/css"/>
<meta http-equiv="expires" content="0"/>

<link href="mystyles.css" type="text/css" rel="stylesheet" />

</head>
<body>
<div class="titlepage">
<p class="title">My First ePub</p>
<p class="author">by Your-Name-Here</p>
<p>First in a series by Your-Name-Here</p>
<p class="information">This is the title page for my book. If this were a real book, the copyright page would follow.</p>
</div>
</body>
</html>

The only interesting thing in the <head> section is the link to stylesheet mystyles.css, which you've seen before. The Body contains the title, author, a "first in series" blurb, and a little explanation. It's pretty much a straightforward Xhtml page.

Put titlepage.xhtml in the Manifest, Spine and Guide

Make the following entry in the Manifest:

<item href="titlepage.xhtml" id="title_page" media-type="application/xhtml+xml"/>

The following entry goes in the Spine, at the top of the list:

<itemref idref="title_page" linear="yes"/>

Last but not least, the following entry goes in the guide, because the Title Page is a special page that should be accessible directly to the device:

<reference href="titlepage.xhtml" type="title-page" title="Title Page"/>

A word of explanation on the last entry: There is a predefined list of types that can be put in the guide. Here are some of the major ones:

When you put items in the guide, their type attributes should be one of those if you want an easy life. Every real book should have at least the cover, title-page and toc entries. To see a complete list of the predefined types, go here: http://www.idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.6
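To catch typos in type values, you can compare them against the predefined list. The set below is transcribed from OPF 2.0.1 Section 2.6, which also allows custom types whose names begin with "other."; double-check it against the specification at the link above. bad_guide_types is a hypothetical helper, and the sketch needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Predefined <reference> types, transcribed from OPF 2.0.1 Section 2.6.
# Custom types are allowed if their names start with "other.".
GUIDE_TYPES = {
    "cover", "title-page", "toc", "index", "glossary", "acknowledgements",
    "bibliography", "colophon", "copyright-page", "dedication", "epigraph",
    "foreword", "loi", "lot", "notes", "preface", "text",
}


def bad_guide_types(opf_text):
    """Return Guide type values that are neither predefined nor other.*."""
    root = ET.fromstring(opf_text)
    types = [ref.get("type", "")
             for ref in root.iterfind(".//{*}guide/{*}reference")]
    return [t for t in types
            if t not in GUIDE_TYPES and not t.startswith("other.")]
```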

Add the necessary styles to mystyles.css

Before going further, re-zip your ePub. When you view it, you'll see it has a title page, but the title page is all normal font and regular paragraph styles. It doesn't look like a title page. We'll fix that now...

Add the following to mystyles.css:

div.titlepage{text-align: center;}
div.titlepage p.title{font-size: 200%; font-weight: bold; margin-top: 2ex;}
div.titlepage p.author{font-size: 150%; font-weight: bold; margin-top: 3ex; margin-bottom: 10ex;}
div.titlepage p.information{text-align: left;}

Re-zip again, and now the title page should look like a real title page, though simple.

Steve Litt is the originator of the Universal Troubleshooting Process. Steve can be reached at his email address.

Add an Image

By Steve Litt

In this section you'll add a very simple graphic. Right click images/epubrocks.svg and choose "download" or "save this link as" to download the graphic. If you wonder what this graphic looks like, it looks like the following:

epubrocks.svg

Put the graphic in the Book Root Directory. In this simple example, all graphics go in the Book Root Directory.

Now, add the following tag to chap01.xhtml, just before the bullet item starting with "Graphics and diagrams that work well on paper and PDF". Put it before the <li> of that bullet item and after the </li> of the bullet item before it.

Finally, add the graphic to the Manifest of content.opf, like this:

<item href="epubrocks.svg" id="epubrocks.svg" media-type="image/svg+xml"/>

Re-zip, and you should see the graphic in the ePub. If not, you probably got the filename wrong, or accidentally put it in a directory other than the Book Root Directory, or perhaps one of your links or references pointed to a wrong directory.
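A wrong media-type is an easy mistake to make by hand. If a program writes your Manifest, a lookup table keyed on file extension avoids most of those mistakes. Here's a minimal Python sketch; the table covers only a few common ePub 2 file types, and manifest_item is a hypothetical name.

```python
import os
from xml.sax.saxutils import quoteattr

# Media types for a few common ePub 2 file extensions
MEDIA_TYPES = {
    ".xhtml": "application/xhtml+xml",
    ".css": "text/css",
    ".svg": "image/svg+xml",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".ncx": "application/x-dtbncx+xml",
}


def manifest_item(href, item_id):
    """Return a Manifest <item> line, choosing media-type by file extension.

    Raises KeyError for an extension the table doesn't know, rather
    than guessing.
    """
    ext = os.path.splitext(href)[1].lower()
    media_type = MEDIA_TYPES[ext]
    return (f"<item href={quoteattr(href)} id={quoteattr(item_id)} "
            f"media-type={quoteattr(media_type)}/>")
```

For example, manifest_item("epubrocks.svg", "epubrocks.svg") yields the same Manifest line shown above.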

Steve Litt is the originator of the Universal Troubleshooting Process. Steve can be reached at his email address.

Add Cover

By Steve Litt

You might wonder why eBooks need a cover. They need it for marketing purposes: Even though eBooks have no need of a protective covering, people still expect and enjoy a cover, and in fact a lot of people judge the book by its cover, and you're not going to change that. So your book needs a cover, and this section shows you how to put it on.

Note:

The book cover used in this section is simplistic and not particularly tasteful, but it's simple and small and good for a tutorial. If you write books you intend to have people read, your cover should be much better looking.

Adding your cover requires the following steps:

  1. Create the cover graphic
  2. Create the cover Xhtml page, whose job is to display the graphic
  3. Put a reference to the cover Xhtml page in the Manifest
  4. Put a reference to the cover Xhtml page in the Spine
  5. Put a reference to the cover Xhtml page in the Guide
  6. Re-zip and view

Create the cover graphic

There's not much to say about this. Query the device maker for the best dimensions for the cover graphic. Unless it's intended to be photo-realistic, an SVG file is probably the way to go. If you don't have the artistic skill to make good cover art, hire somebody. Do not use a random downloaded graphic or a graphic that might belong to someone else, because that could get you in terrible legal trouble. Personally, I don't use the stock artwork sold by stock art vendors, because I've found no way to verify that the vendor has the right to license me to use the stock artwork.

For the purposes of this tutorial, the cover looks like this:

Book cover for this tutorial

You can download this graphic by right-clicking here and saving it as cover.svg in your Book Root Directory.

Create the cover Xhtml page, whose job is to display the graphic

The cover Xhtml page's sole purpose in life is to house the cover image. Create a new file named cover.xhtml in the Book Root Directory, and fill it with the following:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>Coverpage</title>
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
<meta http-equiv="content-type" content="application/xhtml+xml; charset=UTF-8"/>
<meta http-equiv="content-style-type" content="text/css"/>
<link href="mystyles.css" type="text/css" rel="stylesheet" />
</head>
<body>
<img src="cover.svg" alt="Cover Page" class="coverimage" />
</body>
</html>

Depending on the device vendor's specifications, you may or may not need to set the margins of the image. If you do, notice that the image is of class "coverimage", so you can set margins for this one image without affecting other images. In order to make my cover image cover the entire eBook renderer's screen, I created the following style in mystyles.css to set its margins to zero:

img.coverimage{margin: 0ex;}

Your mileage may vary on the style, and please remember to consult the vendor of your target device.

Put a reference to the cover Xhtml page in the Manifest, Spine and Guide

Put the following line in your Manifest:

<item href="cover.xhtml" id="cover_page" media-type="application/xhtml+xml"/>

Next, put the following line at the top of the Spine list:

<itemref idref="cover_page" linear="yes"/>

Finally, put the following line in the Guide list:

<reference href="cover.xhtml" type="cover" title="Cover"/>

In the preceding, notice the type attribute is "cover". This is one of the predefined types that can go in the guide. A complete list of such predefined types for the Guide is contained at http://www.idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.6

Re-zip and view

Re-zip and view the ePub. If you've done everything right, the book now starts with the cover image, then goes on to the title page, and then on to the table of contents.
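If the order looks wrong, it helps to see the Spine the way a reading device sees it: as a list of files. Here's a minimal Python sketch (Python 3.8+; reading_order is a hypothetical name) that maps each Spine idref through the Manifest to its href, and flags idrefs with no Manifest entry.

```python
import xml.etree.ElementTree as ET


def reading_order(opf_text):
    """Return the Spine as a list of hrefs, in reading order.

    Each Spine idref is looked up in the Manifest; an idref with no
    Manifest item shows up as "MISSING:<idref>".
    """
    root = ET.fromstring(opf_text)
    href_by_id = {item.get("id"): item.get("href")
                  for item in root.iterfind(".//{*}manifest/{*}item")}
    order = []
    for ref in root.iterfind(".//{*}spine/{*}itemref"):
        idref = ref.get("idref", "")
        order.append(href_by_id.get(idref, "MISSING:" + idref))
    return order
```

For the finished tutorial book, the list should begin with cover.xhtml, titlepage.xhtml and toc.xhtml, followed by the chapters.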

Steve Litt is the originator of the Universal Troubleshooting Process. Steve can be reached at his email address.

Reorganize Directories

By Steve Litt

This article first discusses the whole idea of organizing your ePub into directories, and then walks you through reorganizing your current ePub into the directory structure I recommend.

Discussion

In all the exercises so far, did you notice that all files except ~/epublearn/hello/META-INF/container.xml were in the Book Root Directory, ~/epublearn/hello/? I made them like that because life is much simpler that way. Think of all the references: first from container.xml to content.opf, then from content.opf to toc.ncx and to all the Xhtml, image, and style (CSS) files, and then from each Xhtml file to style files. Now imagine how much more complex all of that would be if everything were in different directories.

These references must use relative paths, not absolute. Imagine if a reference from content.opf needed to go up one directory and down two. You'd better have a good relative directory parsing tool if you're writing a converter, and you'd better be able to count well if you're doing it by hand.
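For a converter written in Python, posixpath.relpath from the standard library can do the counting. Paths inside an ePub always use forward slashes, so posixpath is used rather than os.path. In this minimal sketch, relative_href is a hypothetical helper, and both arguments are paths relative to the Book Root Directory.

```python
import posixpath


def relative_href(from_file, to_file):
    """Return the href that from_file must use to reach to_file.

    Both paths are relative to the Book Root Directory and use
    forward slashes, as paths inside an ePub do.
    """
    start_dir = posixpath.dirname(from_file) or "."
    return posixpath.relpath(to_file, start_dir)
```

For example, relative_href("xhtml/chap01.xhtml", "images/epubrocks.svg") returns "../images/epubrocks.svg": up one directory, then down into images.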

And yet...

And yet having 200 or so loose files in the Book Root Directory would be even worse. What a confusing mess that would be, especially if your naming convention weren't exactly perfect. Most of those files need to be segregated by purpose. So here's what I propose...

The vast majority of all those files, in a real-life book, will be Xhtml files comprising chapters (and the cover page and appendices and indexes and the like), and image files. So my advice is to put an images directory and a directory for the Xhtml files directly under the Book Root Directory. The great and good Sigil program calls the Xhtml directory "text". Perhaps they envisioned content that's not Xhtml going into an ePub. Well, whoever wrote Sigil is a genius, so I'd currently suggest following Sigil's convention of naming the Xhtml directory "text" rather than "xhtml".

Once all images and Xhtml files are in their own directories, what do you have left in the root? Those directories, the META-INF directory, content.opf, toc.ncx, and one or a few style (CSS) files. Very manageable.

If your eBook has lots of some other kind of file, like video files or font files (but remember what I said about respecting your reader's choice of fonts?), you might want to segregate those. My rule of thumb is this: One of a kind or a few of a kind (like style files, how many can you have?) should be in the Book Root Directory. File types that occur in great numbers should be segregated in their own directories directly off the root.

But that isn't customary!

I know, I know, the customary way to structure ePub directories is to have the Book Root Directory contain nothing but mimetype, the META-INF directory, and a catch-all directory called OEBPS. The OEBPS directory contains content.opf, toc.ncx, and all content files, either lying around loose in OEBPS or in directories such as OEBPS/text and OEBPS/images. That's the idiomatic organizational method, and you look more professional doing it that way.

But please understand, from a technological standpoint, as well as from the viewpoint of the reader, neither organization is superior to the other. So I recommend keeping it simple. Keeping the one-off files in the Book Root Directory makes authoring, whether by hand or with a conversion program you're writing, much simpler, because the inter-file references are simpler. The simpler organization reduces the likelihood of mistakes, and I'll prioritize that over "customary" every single time.

Reorganization Procedures

Note:

I will delay writing of the actual instructions to reorganize the directories until later, because doing so is fairly complex, and I want to get this document up on the web as soon as possible.

Within the next couple months I plan on completing this subsection, so check back.

This section walks you through putting your Xhtml files and your image files in their own directories, and changing everything so it still works. When you're finished, the directory structure will look like the following:

Book Root Directory
|-- META-INF directory
|   `-- container.xml
|-- mimetype
|-- content.opf
|-- toc.ncx
|-- mystyles.css
|-- xhtml directory
|   |-- cover.xhtml
|   |-- titlepage.xhtml
|   |-- toc.xhtml
|   |-- chap01.xhtml
|   `-- chap02.xhtml
`-- images directory
    |-- epubrocks.svg
    `-- cover.svg

The following procedure is written using Linux commands. If you're using Windows or Mac, please follow along with your OS's commands.

  1. cd ~/epublearn
  2. cp -Rp hello withdirs to make a duplicate directory to work with.
  3. cd withdirs
  4. Edit your re-zip.sh script, changing all occurrences of hello.epub to withdirs.epub
  5. rm /tmp/hello.epub to get rid of that file and eliminate some possible confusion.
  6. Make absolutely sure the directory you're in is ~/epublearn/withdirs to prevent later confusion.
  7. Run your re-zip script, and notice it works perfectly, making you a brand new withdirs.epub. This is your baseline to show that simply copying the whole directory didn't cause things to fail.
  8. mkdir xhtml
  9. mkdir images
  10. mv *.xhtml xhtml/
  11. mv *.svg images/
  12. Run your re-zip script, and notice it fails badly, with a "not in list" fatal error, and several "missing spine item" messages on the terminal on which you ran your re-zip script. You will fix this by editing content.opf and toc.ncx.
  13. Open content.opf in your favorite editor and make the following changes:
  14. Open toc.ncx in your favorite editor and, for each <content> tag in the <navMap>, prepend xhtml/ to the src attribute value. In other words, change src="chap02.xhtml#chaptertitle" to src="xhtml/chap02.xhtml#chaptertitle". Then save your changes.
  15. Run your re-zip script. Hooray, it works! Oops, wait a minute, it doesn't quite work. If you look carefully, you'll notice the cover image isn't visible, and neither is the epubrocks.svg graphic. And if you look very carefully, you'll notice that the title page is all regular text, with no styles. This is happening because links within the .xhtml files to CSS files and images need to be adjusted for the new directory structure. No problem, you'll do that next.
  16. For each .xhtml file in the xhtml/ directory:
  17. Run your re-zip script. This time it really works, because all CSS links and image sources in every .xhtml file have been adjusted to reflect the new directory structure, and everything in content.opf and toc.ncx has likewise been adjusted.
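The edits in steps 14 through 16 are mechanical, so a converter (or an impatient author) can make them with a couple of regular-expression substitutions. This Python sketch assumes the layout shown in the tree above: Xhtml in xhtml/, .svg images in images/, and stylesheets left in the Book Root Directory. It's a quick text substitution, not a real XML rewriter, so review the results; the function names are hypothetical.

```python
import re


def prefix_ncx_sources(ncx_text, subdir="xhtml/"):
    """Step 14: prepend subdir to each <content src="..."> not already using it."""
    pattern = r'(<content\s+src=")(?!' + re.escape(subdir) + ")"
    return re.sub(pattern, lambda m: m.group(1) + subdir, ncx_text)


def fix_xhtml_links(xhtml_text):
    """Steps 15-16: repoint a moved Xhtml file's stylesheet and image links.

    Only bare filenames (no "/") are rewritten, so running it twice
    does no harm.
    """
    xhtml_text = re.sub(r'href="([^"/]+\.css)"', r'href="../\1"', xhtml_text)
    return re.sub(r'src="([^"/]+\.svg)"', r'src="../images/\1"', xhtml_text)
```

Applied to cover.xhtml, for instance, fix_xhtml_links turns src="cover.svg" into src="../images/cover.svg" and href="mystyles.css" into href="../mystyles.css".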

In my opinion, the directory structure you put together in this exercise is a professional directory structure suitable for eBooks without fonts and video. My ePub making software that I'm going to rewrite soon will be based on this directory structure.

Steve Litt is the originator of the Universal Troubleshooting Process. Steve can be reached at his email address.

Troubleshooting Your ePub

By Steve Litt

The exercises in this ePub tutorial were simple enough to convince you that writing ePubs is easy. Don't you believe it! First of all, the two-chapter ePub you made here can't compare with a fifty-chapter ePub with a couple of appendices, multi-level TOCs both for the ePub and in the Xhtml, a hundred graphics, and 100K words of text. Additionally, as you saw in the tutorial, the ebook-viewer program lets the book creator get away with some pretty sloppy ePub making. That's nice, but I doubt all devices are so forgiving.

Before embarking on a discussion of ePub troubleshooting, let's review a few troubleshooting tools for ePubs...

ePub Troubleshooting Tools

Here's a list of ePub troubleshooting tools:

Read error dialog

The error message is a gift from the ebook-viewer developers to you. It's your first and best chance to narrow down the root cause of the problem. Consider the following error dialog box:

Fatal error dialog box

There's quite a bit of information here. The dialog's title bar tells you it couldn't open the eBook, so you know it's a fatal error. As far as the message itself, most of it appears to be a temporary path, but do you see the "chap02z.xhtml" at the end? Note that the error message says that file is "not in list". What list? The list is a Python (computer language) list (array) visible only if you're looking at, and debugging, the Python source of ebook-viewer. Nevertheless, this error message gives you enough info to troubleshoot, without resorting to Python debugging.

Later in the Troubleshooting section I'll discuss good ePub coding practices, one of which is good naming conventions. If you've been practicing good naming conventions, you'll know whether there really should be a "chap02z.xhtml". If not, you can use grep to find it, and change it to what it should be. Even if you know there should be a "chap02z.xhtml", or even if you don't know, grep can show you how many places it occurs, which can often help you.
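On Linux, running grep -rn chap02z.xhtml . from the Book Root Directory does the job. If you're on a system without grep, or you want the search inside a conversion program, here's a rough Python equivalent; find_references is a hypothetical name, and it treats every file as UTF-8 text, replacing undecodable bytes.

```python
import os


def find_references(book_root, needle):
    """Yield (path, line number, line) for every line containing needle.

    A rough stand-in for grep -rn; paths are relative to book_root.
    """
    for dirpath, dirnames, files in os.walk(book_root):
        dirnames.sort()                      # visit directories in a stable order
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if needle in line:
                        yield (os.path.relpath(path, book_root),
                               lineno, line.rstrip("\n"))
```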

See the "Show details" button? Click it, and you'll see the entire Python stack trace for the fatal error. On that stack trace, the very last entry is your error message, so that's what you'll pay attention to (unless you're very good at Python and willing to go through the source code of ebook-viewer).

See the "Copy to clipboard" button? If you click it, you put the Python stack trace on the clipboard, ready to insert in a file. This is handy if you want to record the error message long term, which is a darn good idea because it might change.

The bottom line is this: Carefully read and remember any fatal error messages, because they give priceless hints as to where the problem might reside.

Read terminal from which ebook-viewer was run

Always run ebook-viewer from a terminal. If you're running ebook-viewer from a shellscript, run that shellscript from a terminal, and make sure ebook-viewer is not run in the background (with an ampersand at the end of the command in Linux). The reason is simple: ebook-viewer writes messages to the terminal it's run on, and you want to see those messages. The following text is what you see on that terminal if ebook-viewer prints no errors or warnings to the terminal:

slitt@mydesk:~/epublearn/hello$ ./jj
InputFormatPlugin: EPUB Input running
on /tmp/hello.epub

If you see anything about missing files, or one file not pointing to another, investigate and kill the warning.

The epub-fix program

ebook-viewer is a very, very, very permissive program. It will let you get away with only a warning if you forget to include META-INF/container.xml, for gosh sakes. The ebook-viewer program has all sorts of intelligent defaults that guess right on all sorts of omissions. Don't count on other devices and software letting you slide like that.

You need an ePub checker program much pickier than ebook-viewer. That pickier program is epub-fix, a utility that comes with the Calibre eBook conversion software.

Danger Will Robinson

Although epub-fix claims not to fix the file in-place unless you use specific arguments, my experience has been that at the very least, epub-fix changes the ePub file's modify date. So be sure to run epub-fix only on files you don't care about, either because you can build yourself another copy with your shellscript, or because the ePub you're running it on is a copy of the original.

epub-fix checks everything. Are all required files in the ePub? Do all references resolve correctly? Does your file structure contain files not listed in the Manifest? If your ePub is perfect, epub-fix outputs nothing.

Your ePub isn't ready for prime time until epub-fix outputs absolutely nothing. However, during construction, you'll have extra files, like shellscripts and backup files, that are in the directory structure but not in the Manifest. During construction, you can ignore warnings about extraneous files you know about. Fix all other warnings and errors immediately. You don't need a backlog collecting and hanging over your head.
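The "file not in the Manifest" check is easy to reproduce yourself, which is handy if epub-fix isn't available on your system. The sketch below is not how epub-fix works internally, just a minimal Python approximation of that one check: it opens the .epub as a zip, follows META-INF/container.xml to the OPF, and lists files the Manifest doesn't mention. unlisted_files is a hypothetical name, and it assumes container.xml declares the OCF container namespace.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"


def unlisted_files(epub_path):
    """Return the files inside an .epub that its Manifest doesn't mention.

    mimetype, META-INF/* and the OPF file itself never appear in the
    Manifest, so they're skipped.
    """
    with zipfile.ZipFile(epub_path) as epub:
        container = ET.fromstring(epub.read("META-INF/container.xml"))
        opf_path = container.find(f".//{CONTAINER_NS}rootfile").get("full-path")
        opf_dir = posixpath.dirname(opf_path)      # hrefs are relative to the OPF
        opf = ET.fromstring(epub.read(opf_path))
        listed = {posixpath.normpath(posixpath.join(opf_dir, item.get("href")))
                  for item in opf.iterfind(".//{*}manifest/{*}item")}
        return sorted(
            name for name in epub.namelist()
            if not name.endswith("/")              # skip directory entries
            and name != "mimetype"
            and not name.startswith("META-INF/")
            and name != opf_path
            and name not in listed
        )
```

During construction, expect it to list your shellscripts and backup files; anything else it reports deserves a look.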

Minimal example

The Minimal Example, sometimes called the Minimal Working Example or MWE, is a hugely powerful troubleshooting tool when dealing with software, whether as a developer or user. Here are some links explaining MWEs:

  1. http://www.tex.ac.uk/cgi-bin/texfaq2html?label=minxampl
  2. http://minimalbeispiel.de/mini-en.html

I don't call it a Minimal Working Example because the example doesn't work: It errors out. The Minimal Example is the smallest possible file that still reproduces the error symptom. To make a minimal example, copy your entire ePub directory tree to an experimental place, and then start removing things until the symptom goes away. Then put things back until it comes back. As much as possible, remove things in halves, so that each removal eliminates half of what remains.

Eventually you'll have a tiny example that reproduces the symptom. The root cause will probably be dead-bang obvious just by looking at the minimal example, but if not, the minimal example can be posted online and you can ask for help. Another advantage of the Minimal Example is that, presumably, all your secret and proprietary information will have been removed, so you can post it publicly without giving anything away.
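The halving strategy is a simplified form of what's known as delta debugging. Here's a minimal Python sketch of the idea. minimize is a hypothetical name; parts could be, say, the list of Spine entries or the lines of one Xhtml file, and reproduces is a function you'd supply that rebuilds a test ePub from those parts and reports whether the symptom still appears. That rebuilding function is the hard part and isn't shown. The sketch assumes the symptom depends only on which parts are present.

```python
def minimize(parts, reproduces):
    """Shrink the list parts while reproduces(parts) stays True.

    Tries removing large chunks first (halves), then smaller and smaller
    chunks, keeping every removal after which the symptom still appears.
    """
    if not reproduces(parts):
        raise ValueError("start from a version that shows the symptom")
    chunk = max(len(parts) // 2, 1)
    while True:
        i = 0
        while i < len(parts):
            candidate = parts[:i] + parts[i + chunk:]
            if candidate and reproduces(candidate):
                parts = candidate      # symptom survived: keep the removal
            else:
                i += chunk             # this chunk is needed: move past it
        if chunk == 1:
            return parts
        chunk = max(chunk // 2, 1)
```

For example, if the symptom needs both part 3 and part 11, minimize(list(range(16)), lambda p: 3 in p and 11 in p) whittles the 16 parts down to [3, 11].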

A look-inside archiver program

When all is said and done, an ePub is just a .zip file. How do you look inside a .zip file? With an archiver. The archiver needn't (and in fact shouldn't) extract the .zip file, or change it. Just look inside it.
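If you'd like a peek without a GUI, Python's standard zipfile module can list and read an ePub's members without extracting or modifying anything. A minimal sketch; list_epub and read_member are hypothetical names.

```python
import zipfile


def list_epub(epub_path):
    """Print size, compression and name of every member; extract nothing."""
    with zipfile.ZipFile(epub_path) as epub:
        for info in epub.infolist():
            method = ("stored" if info.compress_type == zipfile.ZIP_STORED
                      else "compressed")
            print(f"{info.file_size:>8}  {method:<10}  {info.filename}")


def read_member(epub_path, member):
    """Return one member's text, e.g. read_member(path, "content.opf")."""
    with zipfile.ZipFile(epub_path) as epub:
        return epub.read(member).decode("utf-8")
```

On Linux, unzip -l /tmp/hello.epub gives a similar listing from the command line.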

On Ubuntu Linux, I use the File Roller archiver. It's a GUI program, and you can navigate the entire zipped-up structure as if you were in a file manager. You can left-click on any file and choose a program with which to edit it. Just don't save the edits (unless that's your intent). Sometimes you'll look inside other people's ePubs, either to learn how they structure things or to compare theirs to yours. The archiver is the perfect tool for this.

One other way to use an archiver is to actually change the ePub. Let's say you're writing a conversion program to produce an ePub, and your ePub is throwing an error. You really don't want to do diagnostic tests in your program before you even know what the error means. Instead, you save a copy of the error-throwing ePub, and then, using your archiver, edit, save, and try again, until you've found the cause of the error. Then and only then do you go back to your conversion program and change it so it correctly writes the ePub file.
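One caution when an archiver or a program rewrites the ePub: the OCF container specification requires the mimetype file to be the first entry in the zip and stored uncompressed, and not every archiver preserves that. If your converter is in Python, a minimal re-zip sketch looks like this. rezip is a hypothetical name; it zips every file in the tree (so keep shellscripts and backups elsewhere, or filter them out), and the output path should be outside the book directory.

```python
import os
import zipfile


def rezip(book_root, epub_path):
    """Zip an unpacked ePub directory into epub_path.

    mimetype goes first and uncompressed, as the OCF container spec
    requires; every other file is deflated. The mimetype file itself
    should contain just application/epub+zip, with no trailing newline.
    """
    with zipfile.ZipFile(epub_path, "w") as epub:
        epub.write(os.path.join(book_root, "mimetype"), "mimetype",
                   compress_type=zipfile.ZIP_STORED)
        for dirpath, dirnames, files in os.walk(book_root):
            dirnames.sort()                        # deterministic order
            for name in sorted(files):
                full = os.path.join(dirpath, name)
                arcname = os.path.relpath(full, book_root).replace(os.sep, "/")
                if arcname != "mimetype":
                    epub.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)
```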

Compare your ePub to a working one

While researching for this article, epub-fix exited with the following Python stack trace:

slitt@mydesk:~/epublearn/hello$ epub-fix /tmp/hello.epub
Traceback (most recent call last):
  File "/usr/bin/epub-fix", line 19, in <module>
    sys.exit(main())
  File "/usr/lib/calibre/calibre/ebooks/epub/fix/main.py", line 56, in main
    run(epub, opts, default_log)
  File "/usr/lib/calibre/calibre/ebooks/epub/fix/main.py", line 40, in run
    container = Container(tdir, log)
  File "/usr/lib/calibre/calibre/ebooks/epub/fix/container.py", line 55, in __init__
    raise InvalidEpub('META-INF/container.xml contains no link to OPF file')
calibre.ebooks.epub.fix.InvalidEpub: META-INF/container.xml contains no link to OPF file
slitt@mydesk:~/epublearn/hello$

How does one deal with such a cryptic situation?

I knew this problem was far beyond my technical knowledge, so I fell back on the troubleshooting tactic I've used for the last three decades, on both electronics and software: Compare with a working system.

I had already downloaded a bunch of legally free ePub files to help me understand the ePub architecture. One of the most helpful was "Short Stories 2013". When I ran epub-fix on that book, it didn't crash the way my book did. Take it from a longtime Troubleshooter: The minute you have one system displaying a reproducible symptom and another not displaying it, you have that problem in a headlock and can quickly take it to the mat.

Because the stack trace had mentioned container.xml not containing a link to OPF, I compared container.xml files between my eBook and "Short Stories 2013", and found these two differences:

  1. My book had no XML declaration (<?xml version="1.0"?>) on top.
  2. My <container> tag had only one attribute, version. "Short Stories 2013" had one more attribute: xmlns="urn:oasis:names:tc:opendocument:xmlns:container".

I added the <?xml version="1.0"?> line on top, added attribute xmlns="urn:oasis:names:tc:opendocument:xmlns:container" to the <container> tag, and bang, problem solved.
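Why would a missing xmlns attribute make the OPF link invisible? I haven't read calibre's source, so treat this as an educated guess: a namespace-aware parser looks for rootfile elements in the urn:oasis:names:tc:opendocument:xmlns:container namespace, and without the xmlns attribute your elements live in no namespace at all, so the lookup comes back empty. This Python sketch (find_opf_paths is a hypothetical name, and content.opf is just an example path) reproduces that behavior with the standard ElementTree module:

```python
import xml.etree.ElementTree as ET

# Namespace the OCF container.xml elements are expected to live in.
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"

def find_opf_paths(container_xml: bytes):
    """Return the full-path of every <rootfile>, looked up by namespaced name.

    If <container> lacks the xmlns attribute, its children are in no
    namespace, the namespaced lookup matches nothing, and [] comes back.
    """
    root = ET.fromstring(container_xml)
    ns = {"c": CONTAINER_NS}
    return [rf.get("full-path") for rf in root.findall("c:rootfiles/c:rootfile", ns)]

GOOD = b'''<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>'''
```

Feeding it GOOD yields ["content.opf"]; delete the xmlns attribute and the same call yields an empty list, which is the programmatic equivalent of "contains no link to OPF file".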

Whenever a problem is kicking your butt, and the whole thing seems hopeless, ask yourself "How can I narrow it down just one more time?" Often, the answer is finding a working system (in this case a working ePub) to compare it to.
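When the working and non-working systems are both ePubs, the comparison itself can be scripted. This sketch prints a unified diff of one member (container.xml by default) as stored in two ePub files. diff_member is a hypothetical name, and the sketch assumes the member is UTF-8 text:

```python
import difflib
import zipfile

def diff_member(epub_a, epub_b, member="META-INF/container.xml"):
    """Return a unified diff of one member as stored in two ePub files.

    An empty string means the two copies have identical text.
    """
    def lines(epub):
        with zipfile.ZipFile(epub) as zf:
            return zf.read(member).decode("utf-8").splitlines(keepends=True)

    return "".join(difflib.unified_diff(lines(epub_a), lines(epub_b),
                                        fromfile=f"{epub_a}:{member}",
                                        tofile=f"{epub_b}:{member}"))
```

If container.xml comes back identical, move on to the OPF file, then toc.ncx, and so on, narrowing the scope one member at a time.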

Five Levels of ePub Debugging

As I see it, there are five levels of ePub debugging:

  1. Use good coding practices in the first place.
  2. Fix problems that create fatal errors in ebook-viewer
  3. Fix problems that create warnings while running ebook-viewer
  4. Find problems not apparent with ebook-viewer
  5. Test on all major target devices

Use good coding practices in the first place.

ePub documents over 5000 words are much too complex to take lightly. At 50,000 words, you could spend hours or days debugging an ePub.

ePub is a cool format for cool devices, but there are a lot of interlinking references, a lot of XML and Xhtml tags, a directory structure, and big chapter files that could contain errors like leaving out an angle bracket. The opportunity to make a mistake is immense, and once that mistake goes unnoticed for a few minutes, it can become a debugging nightmare. By using good coding practices you can lessen the frequency with which you make errors.

Use a good tool to create your Xhtml. Bluefish is excellent. Vim with addons is less so. WYSIWYG web editors take you on the path to perdition. If you're converting from another format to Xhtml, be sure the conversion tool makes good Xhtml, and even more important, consistent Xhtml. You can always create a postprocessing program to fix up the created Xhtml if it's consistent, but if the created Xhtml varies all over the map, you can't do that.

Use good and consistent naming conventions. Using chap04.xhtml and chap04_page is a lot easier for a human to debug than xr232Fc44_xy or link288.

First, run your individual Xhtml files through a checker to make sure they're error free. Personally, I use the tidy program available with Ubuntu Linux (and probably many other Linux distros). There are lots of checkers and validators. There are many web services that validate your Xhtml online, but of course you can't do that if your document is proprietary or secret. To get a taste for the various checkers and validators, see http://www.w3.org/QA/Tools/.
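For a quick well-formedness pass over every XML-based file already inside an ePub, Python's ElementTree can help. This is a sketch, not a validator: it catches unclosed tags and stray angle brackets, but it doesn't check your Xhtml against a DTD or schema the way tidy or the W3C tools do. As far as I know, it also reports named entities such as &nbsp; as errors, because ElementTree doesn't load the Xhtml DTD; numeric references like &#160; sidestep that. check_wellformed is a hypothetical name:

```python
import xml.etree.ElementTree as ET
import zipfile

XML_SUFFIXES = (".xhtml", ".html", ".htm", ".opf", ".ncx", ".xml")

def check_wellformed(epub, suffixes=XML_SUFFIXES):
    """Parse every XML-based member of an ePub.

    Returns {member_name: parser_error_message} for each member that is
    not well-formed; an empty dict means everything parsed.
    """
    problems = {}
    with zipfile.ZipFile(epub) as zf:
        for name in zf.namelist():
            if name.lower().endswith(suffixes):
                try:
                    ET.fromstring(zf.read(name))
                except ET.ParseError as exc:
                    problems[name] = str(exc)   # message includes line and column
    return problems
```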

I've said it before and I'll say it again: Put all files in the Book Root Directory except META-INF/container.xml and file types that become too numerous, like Xhtml files and graphic files. Put those one directory below the Book Root Directory. Simplicity limits mistakes, complexity promotes mistakes.

Resist the temptation to demonstrate your artistry with fonts. Fonts hugely complicate your ePub, make it a bigger file, risk looking bad on devices you haven't tested with, introduce legal jeopardy due to copyright issues, and defeat the wishes of the device designer and the end-reader, who set his device fonts the way he likes them. So forget the fonts, and let the device take care of them. Any appearances you introduce should be the relative type: Size, color, slant, weight, etc. Don't try to outguess the device designer and the human reader unless art pundits regularly call you the next Rembrandt.

Test early and often. Use something like Calibre's epub-fix program to check your ePub, and make sure it throws no warnings except "not in manifest" warnings for files you already know about, like your compile shellscript. Naturally, just before making the final eBook, you'll get rid of those too.
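A "not in manifest" warning means the archive contains a file the OPF Manifest doesn't list. The sketch below performs a similar check on its own so you can see exactly which files are involved. It is not taken from epub-fix; it assumes the OPF path given in META-INF/container.xml and the standard OPF 2.0 namespace, and unlisted_files is a hypothetical name:

```python
import posixpath
import urllib.parse
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
OPF_NS = "http://www.idpf.org/2007/opf"

def unlisted_files(epub):
    """Return archive members that the OPF Manifest doesn't mention.

    mimetype, META-INF/* and the OPF file itself are never listed in a
    Manifest, so they are skipped.
    """
    with zipfile.ZipFile(epub) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", {"c": CONTAINER_NS})
        if rootfile is None:
            raise ValueError("container.xml has no namespaced <rootfile> element")
        opf_path = rootfile.get("full-path")
        opf_dir = posixpath.dirname(opf_path)   # Manifest hrefs are relative to the OPF
        opf = ET.fromstring(zf.read(opf_path))
        listed = set()
        for item in opf.findall("o:manifest/o:item", {"o": OPF_NS}):
            href = urllib.parse.unquote(item.get("href"))
            listed.add(posixpath.normpath(posixpath.join(opf_dir, href)))
        ignored = {"mimetype", opf_path}
        return [name for name in zf.namelist()
                if not name.endswith("/")          # skip directory entries
                and not name.startswith("META-INF/")
                and name not in ignored
                and name not in listed]
```

In a book you're still developing, the result should contain only the files you expect, such as your compile shellscript; for the final build it should be empty.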

Danger Will Robinson

Although epub-fix claims not to fix the file in-place unless you use specific arguments, my experience has been that at the very least, epub-fix changes the ePub file's modify date. So be sure to run epub-fix on files you don't care about, either because you can compile yourself another copy with your shellscript, or because the ePub you're running it on is a copy of the original.
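One way to make that habit automatic is to never point the checker at your real file at all. This sketch copies the ePub into a fresh temporary directory and runs the command against the copy. run_on_copy is a hypothetical name, and the command is whatever you'd normally type, such as ["epub-fix"]:

```python
import os
import shutil
import subprocess
import tempfile

def run_on_copy(epub_path, command):
    """Copy epub_path into a new temporary directory and run
    command + [copy_path] there, so the original is never touched.

    Returns (CompletedProcess, copy_path); deleting the copy is up to you.
    """
    workdir = tempfile.mkdtemp(prefix="epubcheck-")
    copy_path = os.path.join(workdir, os.path.basename(epub_path))
    shutil.copy2(epub_path, copy_path)       # copy2 also carries over the modify time
    result = subprocess.run(list(command) + [copy_path],
                            capture_output=True, text=True)
    return result, copy_path
```

For example, result, copy = run_on_copy("/tmp/hello.epub", ["epub-fix"]) followed by print(result.stdout, result.stderr) shows epub-fix's messages, while any modify-date change lands on the throwaway copy.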

When you test early and often and find a problem, you can check what you've done during the past half hour and probably fix it. If you don't test for hours or days at a time, you'll need to troubleshoot the entire ePub.

Fix problems that create fatal errors in ebook-viewer

You go to view your newly compiled (re-zipped) ePub, and a dialog box with a Python error pops up. You've just gotten a fatal error. What now? Obviously you have to fix it, but how?

The first thing to do is carefully read the error message, because the error message usually gives an idea of what's wrong. Then, you use all the tools listed in the ePub Troubleshooting Tools subsection of this section. With fatal errors and stack traces, comparing your ePub with a working one often yields the quickest solution, so always have a working ePub available for test, and remember, work off a copy of the working one so if you change it, you can go back to the original when you're done.

Obviously, you must fix fatal errors immediately. You really can't continue when your eBook can't even render.

Fix problems that create warnings while running ebook-viewer

I mentioned earlier you should always run ebook-viewer from a terminal so you can see the messages it sends to the terminal. If it sends anything about a missing file, or an extra file, or anything that looks like an error or warning, try to fix it. You can confirm that it's really a problem by running epub-fix.

It's tempting to just let runtime warnings go. After all, you can still read the eBook, and it looks perfect. Don't do that. Because I guarantee you, somewhere out there exists a device that isn't as forgiving as ebook-viewer, and if you let the warning slide in the final version, the owner of that device will hate you for giving him a no-open ePub.

Once again, use all the tools listed in the ePub Troubleshooting Tools subsection of this section to narrow the root cause scope until you find the root cause, then kill it.

Find problems not apparent with ebook-viewer

Same logic as before. You want it acting right on all devices, not just ebook-viewer. Therefore, run epub-fix on your ePub until it has no errors or warnings other than "file not in manifest" warnings for files you know are there -- shellscripts and backup files and the like. And of course, once you're making the real thing for distribution, delete or move those files so it comes up with no errors or warnings at all: No output. Remember to re-zip every time you make changes: It's easy to forget and wonder why your change made no difference.
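Because re-zipping is so easy to forget, it pays to make it a single command. Here's a minimal Python sketch of such a command, offered as an alternative to the compile shellscript mentioned earlier rather than a copy of it; build_epub and its exclude patterns are assumptions. It writes mimetype first and uncompressed, as the OCF container spec requires, skips backup files, and refuses to zip the output file into itself:

```python
import fnmatch
import os
import zipfile

def build_epub(book_root, epub_path, exclude=("*~", "*.bak")):
    """Zip the Book Root Directory into epub_path.

    mimetype goes first and STORED (uncompressed); every other file is
    deflated. Files whose names match an exclude pattern are left out.
    """
    out_abs = os.path.abspath(epub_path)
    with zipfile.ZipFile(epub_path, "w") as zf:
        zf.write(os.path.join(book_root, "mimetype"), "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        for dirpath, dirnames, filenames in os.walk(book_root):
            dirnames.sort()                      # deterministic member order
            for filename in sorted(filenames):
                full = os.path.join(dirpath, filename)
                arcname = os.path.relpath(full, book_root).replace(os.sep, "/")
                if arcname == "mimetype" or os.path.abspath(full) == out_abs:
                    continue                     # already written / don't zip ourselves
                if any(fnmatch.fnmatch(filename, pat) for pat in exclude):
                    continue                     # backup files and the like
                zf.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

Something like build_epub("hello", "/tmp/hello.epub") rebuilds the book in one step, after which you re-run your checks on the fresh file.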

Test on all major target devices

If you're distributing your ePub far and wide, be sure to test it on a wide variety of devices. That will reduce unpleasant surprises and annoyed emails from your readers. Also, if you're planning on turning your ePub into a Kindle book or iPad book, be sure to follow their requirements.

Troubleshooting Mindset and Process

I often hesitate to give advice like that in this section for fear some folks will mistake my tips for a complete treatise on troubleshooting the technology at hand (in this case, ePubs). Nothing is farther from the truth: Knowledge of the technology, knowledge of the troubleshooting tools, and a few words of advice don't substitute for employing a good Troubleshooting Mindset and following a good Troubleshooting Process.

Given a couple hours, I could probably find twenty people knowing more about ePub books than I. But I'm considered a world expert on the mindset and process of troubleshooting, so if you haven't read my troubleshooting books, I'd start with either Twenty Eight Tales of Troubleshooting or Troubleshooting: Just the Facts. Either one of those books will help you tremendously, not just for troubleshooting ePubs, but for all your troubleshooting and debugging activities.

Steve Litt is the originator of the Universal Troubleshooting Process. Steve can be reached at his email address.

Conversion Techniques

By Steve Litt

From what my limited research has revealed, there's currently no Open Source word processor type interface to directly create an ePub. Sigil is the Open Source program that, as far as I know, comes closest to that functionality, but Sigil is hugely inconvenient for writing long documents. Calibre is a converter, not a writer.

According to its website (http://www.literatureandlatte.com/scrivener.php), Scrivener authors WYSIWYG and exports to ePub. It costs somewhere around $45, which is cheap assuming it really does a good job. Unfortunately, it's available only for Windows and Mac, so as a Linux user, I'm out of luck on that one. But it might help you.

BlueGriffon (not to be confused with Bluefish) has an "ePub Edition" (http://www.bluegriffon-epubedition.com/BGEE.html), running on Windows, Mac and Linux, which claims to directly author ePubs in a WYSIWYG environment. I say "claims to" because the only trial edition they offer produces no output, which makes it kind of useless as an evaluation, and the full edition costs 195 Euro, which is $267 USD (conversion 12/26/2013). If I knew it would work for me, I'd buy it in a heartbeat, but I don't pay that kind of money to evaluate, even with a money back guarantee.

You could use the Bluefish Xhtml editor to make your chapter files and other contained Xhtml files. That's how I created the tutorial ePub. But chapter by chapter editing, while keeping up with additions to the Manifest, Spine, Guide, and toc.ncx would slow down a writer who needs to write 2000 words per day to stay on track.

So, for any substantially sized eBook, the most practical way is probably to write the book in a wordprocessor style interface, and then convert that into individual chapter and other files, all cataloged in the Manifest, Spine, Guide, and toc.ncx. Such wordprocessor style interfaces include:

There are all sorts of people ready, willing and able to do that conversion for you for money, and not a princely sum, either. But the quality of their products varies all over the map, and if you need to make a small change, you're stuck with either making the change to both the ePub and the source, or paying them to convert your source document into an ePub all over again.

By the way, when I said the quality of conversion vendors' work varies, some of the variation is because of the source document. If your source document is 100% styles based, meaning every appearance is specified by styles, the conversion will be fairly easy. Conversely, the more your document depends on inline appearances, such as application of underlining or bold to various phrases without invoking styles, the harder it will be to convert. By anyone or any program.

There are zillions of conversion utilities out there, the majority tuned to convert Microsoft Word documents to ePub. Some are programs you install on your computer, and some are web services. But here's the thing: MS Word documents are not all created equal. There's a huge difference between a rigorously styles-based Word doc, which is fairly easy to convert, and a fingerpainted, inconsistent document, which is almost impossible to convert to something a human reader will appreciate.

As a matter of fact, a completely styles-based Word document, with all the right styles, could be converted by a Wordbasic program, or whatever they call those Word macro things these days, to an intermediate form that a reasonably good Python programmer could write a program to convert to an ePub.
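To make the idea concrete, here's a sketch of the Python half. It assumes a hypothetical intermediate format in which the macro writes one paragraph per line: the style name, a tab, then the text. The heading mapping and the class-naming scheme are likewise illustrative assumptions. Each Word style becomes a CSS class, which is exactly why a 100% styles-based source converts cleanly:

```python
import html
import re

# Hypothetical mapping from Word heading styles to Xhtml heading tags;
# every other style becomes a <p> with a matching CSS class.
HEADING_TAGS = {"Heading 1": "h1", "Heading 2": "h2"}

def css_class(style):
    """Turn a word-processor style name like 'Body Text' into 'body-text'."""
    return re.sub(r"[^a-z0-9]+", "-", style.lower()).strip("-")

def paragraphs_to_xhtml(lines):
    """Convert 'StyleName<TAB>text' lines into Xhtml block elements."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue                                  # skip blank lines
        style, _, text = line.partition("\t")
        tag = HEADING_TAGS.get(style, "p")
        out.append(f'<{tag} class="{css_class(style)}">{html.escape(text)}</{tag}>')
    return "\n".join(out)
```

You'd still write one CSS rule per class, but because every paragraph carries its style name, nothing has to be guessed from inline formatting.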

Here's the bottom line: Most people have neither the time nor the technical chops to do the conversion themselves, either manually or with a program, or by writing a conversion program. Most people will require outside help, until such time that someone writes a good, solid, flexible conversion program. When you go looking for such help, make sure that:

  1. Your source document is 100% styles based
  2. Your conversion vendor is very, very good

Steve Litt is the originator of the Universal Troubleshooting Process. Steve can be reached at his email address.

Writing an Xhtml to ePub Converter Program

By Steve Litt

I spent about 2 weeks trying to write a flexible and configurable Xhtml to ePub converter. I would write the book as a single Xhtml file in Bluefish, and then run it through my converter to get an ePub. It actually worked on simple books, but I could tell it wasn't structured to scale to more complex books. Part of the problem is that, when I started writing the program, I had no idea how ePub really worked.

So I wrote this document, as much to make things clear in my mind as to teach others about ePubs. And now that I've finished this document, I'm fairly certain I can rewrite my program, from the ground up, in a much simpler and more scalable way.

Your first decision when you write an Xhtml to ePub converter is whether to use an Xhtml parser or not. The "right" way is to use the parser, and in some respects doing so makes your task easier. For instance, you needn't concern yourself with preprocessing the Xhtml so that tags end up on the right lines. You don't need to restrict the incoming Xhtml nearly as strictly.

But before you say, "yeah, I'll use a parser, that sounds easy", consider the fact that when you use the parser, you need to handle every single tag. If things nest ten deep, you need to make a stack to keep track of the nesting. For every object (basically a tag pair) in the incoming document, you need to handle, calculate, and write that object. If you use a parser, you're not at liberty to ignore a single thing.

I chose to handle it as a text parsing job, and accept the limitations that come with that decision. Here are the major components you need in a text-parsing conversion program:

Converting From Non-Xhtml

Perhaps you want to convert from MS Word or LyX. You'd do approximately the same thing, but you'd have an earlier step of converting Word or LyX to either Xhtml or some intermediate representation of text with styles applied. And of course, you'd need to write CSS styles for each and every style used in the MS Word or LyX document.

If the earlier step outputs Xhtml suitable for the converter described earlier in this section, that's it, you're done. If the earlier step outputs some intermediate file format, you'd need to make a program whose scanner and splitter are adapted to that format. The configuration manager, the writer, and the zipper would be pretty much the same as with straight Xhtml.
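As an illustration of the text-parsing approach, here's a sketch of a splitter. It assumes the scanner has already normalized the Xhtml so that <body>, </body>, and every chapter-opening <h1> start on their own lines; that restriction is the price of not using a real parser. split_chapters is a hypothetical name, and any non-blank content before the first <h1> becomes a chunk of its own:

```python
import re

BODY_START = re.compile(r"^\s*<body\b")
BODY_END = re.compile(r"^\s*</body>")
CHAPTER_START = re.compile(r"^\s*<h1\b")   # assumption: <h1> opens each chapter

def split_chapters(xhtml_lines):
    """Split the body of one big Xhtml file into chapters.

    Returns a list of chapters, each a list of the original lines.
    """
    chapters, current, in_body = [], [], False
    for line in xhtml_lines:
        if BODY_START.match(line):
            in_body = True
            continue
        if BODY_END.match(line):
            break
        if not in_body:
            continue                       # skip <html>, <head>, etc.
        # A new <h1> closes the previous chunk, unless that chunk is only blanks.
        if CHAPTER_START.match(line) and any(l.strip() for l in current):
            chapters.append(current)
            current = []
        current.append(line)
    if current:
        chapters.append(current)
    return chapters
```

The writer would then wrap each chunk in its own Xhtml header and footer, and the zipper would package the results.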

Steve Litt is the originator of the Universal Troubleshooting Process. Steve can be reached at his email address.

Some Final Thoughts

By Steve Litt

I'd guess that 95% of those who produce ePubs after reading this document will use a paid service or a program written by others. That's as it should be, because most authors, business owners, publishers, and others having a financial interest in making ePub files, aren't computer programmers, and don't have the time to manually make an ePub.

Better yet, those who pay others to make their ePubs or use the tools of others will have a much better understanding of ePub quality, so they can pick the right tools and/or vendors. By knowing how ePubs are made, they'll be much more likely to write source documents, whether MS Word, LibreOffice, LyX, Xhtml, or something else, that are easy to convert. They'll understand you can't just toss a PDF into some sort of machine and expect to get back an ePub readers will enjoy reading. When they see a terrible ePub with ridiculous font sizes and dialog whose speakers are indistinguishable due to layout, they'll understand how that came to be, and if that ePub was given to them by a vendor, they'll give it back, tell the vendor to do it right, and tell the vendor how to do it right.

Now I'll address the five percent of you who go on to make your own ePub creator or create ePub files by hand or with Sigil. Take it from a guy who tried and failed: You cannot make an ePub creating program if you don't know exactly how ePubs are built. If you don't know the internals, your program just grows and grows, as you write more code to cover this and that eventuality. And if you create your own, which, as you've seen from this tutorial, is not impossible, you'll know how to make it, and how to troubleshoot it when it doesn't work the first time.

Getting back to writing your own creator program: Now you know that you parse the input document into pages, and ringtoss those pages into Xhtml documents, and lists for the Manifest, Spine, Guide, and toc.ncx. Yeah, it's more complicated than that, but that's the basic idea, and going about it otherwise puts you on the struggle bus.
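Here's a sketch of the ringtoss step. Given the chapter files in reading order, it emits the matching Manifest items, Spine itemrefs, and toc.ncx navPoints as text you'd paste or template into the OPF and NCX files. The (id, href, title) input tuples and the opf_fragments name are assumptions, and the Guide is left out to keep it short:

```python
from xml.sax.saxutils import escape, quoteattr

def opf_fragments(chapters):
    """chapters: list of (item_id, href, title) tuples in reading order.

    Returns (manifest_items, spine_itemrefs, ncx_navpoints) as XML text.
    """
    manifest, spine, nav = [], [], []
    for order, (item_id, href, title) in enumerate(chapters, start=1):
        # quoteattr() adds the surrounding quotes and escapes the value.
        manifest.append(f'<item id={quoteattr(item_id)} href={quoteattr(href)} '
                        f'media-type="application/xhtml+xml"/>')
        spine.append(f'<itemref idref={quoteattr(item_id)}/>')
        nav.append(f'<navPoint id={quoteattr("nav-" + item_id)} playOrder="{order}">'
                   f'<navLabel><text>{escape(title)}</text></navLabel>'
                   f'<content src={quoteattr(href)}/></navPoint>')
    return "\n".join(manifest), "\n".join(spine), "\n".join(nav)
```

Because all three lists come from the same loop, the Manifest, Spine, and toc.ncx can't drift out of sync, which is exactly the bookkeeping that bogs down hand editing.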

And no matter your technology prowess and time constraints, if you need to write stuff, you need to get familiar with ePubs. ePubs are the Rosetta Stone of flowing-text documents. Almost every device can read them, and almost every converter can convert them to other formats, such as Kindle Mobi and the like. You know that, in 2014, declining to make ePubs because they're "too much of a hassle" will severely limit your readership, and that's something you can't afford.

You're now armed with the information necessary to produce ePubs, whether you, others, or tools do the actual construction. Go get 'em!

Steve Litt's plans for 2014 include producing a new ePub book every two weeks. In order to do that, he is currently creating a one-command converter to convert from multi-chapter Xhtml to ePub, with little or no human intervention. Steve can be reached at his email address.