Troubleshooters.Com, T.C Linux Library and Litt's LyX Library Present

makeindex Options
Including Minimum Range Sizes

Copyright (C) 2008 by Steve Litt, All rights reserved. Material provided as-is, use at your own risk.



A Funny Thing Happened While Indexing

So there I was, minding my own business, indexing my latest book. Because I was indexing on content rather than from a concordance (list of unique words in the document), many sections and subsections were index ranges. In other words, they had an index start code at the end of the section or subsection title, and an index stop code at the end of the section's or subsection's text.

This is really the best way to index, because if I later add 17 paragraphs to the section and recompile the document, the index reflects that the section now spans several pages. However, there's a problem...

More than 50% of the lines in my finished index had ranges. The vast majority of the ranges were 2 pages, like this:
your first C program, 43-44
Of course, those entries weren't even one page long. They were just short sections that happened to fall across page boundaries. They ugly-up the index. I decided I wanted such 2 page ranges to look like this instead:
your first C program, 43
So I wrote a Ruby program to go through the mydoc.idx file, spitting out entries whose ranges spanned two or fewer pages, and began to manually change every such range, within the LyX document, to a non-range single entry.
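The Ruby program itself isn't reproduced here. For anyone who wants the same kind of report, here is a minimal bash/awk sketch of that check. It is not the original program. short_ranges is a hypothetical name, and the sketch assumes arabic page numbers and .idx lines shaped like \indexentry{your first C program|(hyperpage}{43}, that is, the |( and |) range markers, with or without hyperref's hyperpage.

```shell
#!/bin/bash
# short_ranges FILE.idx  (hypothetical helper name)
# Print every |( ... |) index range whose end page is at most one page
# after its start page, in the form "entry, start-end".
# Assumes arabic page numbers and lines shaped like
#   \indexentry{your first C program|(hyperpage}{43}
short_ranges() {
  awk '
    index($0, "\\indexentry{") == 1 {
      n = split($0, parts, "{")                   # page number is in the last piece
      page = substr(parts[n], 1, index(parts[n], "}") - 1)
      if ((p = index($0, "|(")) > 0) {            # range start
        start[substr($0, 13, p - 13)] = page      # 13 = first char after \indexentry{
      } else if ((p = index($0, "|)")) > 0) {     # range end
        key = substr($0, 13, p - 13)
        if ((key in start) && page - start[key] <= 1)
          print key ", " start[key] "-" page
        delete start[key]
      }
    }
  ' "$1"
}
```

Run it as short_ranges mydoc.idx after the first latex run. As the rest of this article shows, though, there's a much easier fix than editing every short range by hand.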

After 3 hours of this unpleasant work, it occurred to me that makeindex already printed just a single number when the beginning and ending entries were on the same page, and that the makers of makeindex were surely too smart to hard-code the minimum range size. So I spent some time with the makeindex man page and search engines. The result was not only a way to suppress two-page ranges, but ways to change almost everything about your index. Read on...

Executive Summary

LyX and LaTeX indexes are created by the makeindex program. This program has many handy defaults. When those defaults don't work for you, you can create a "style file" to configure makeindex. You then tell makeindex to use that style file using the -s option, like this:
makeindex -o test_index.ind -s mybookis.ist mybook.idx
In the preceding, mybook.idx is a no-format list of index entries created by running latex on your book's LaTeX file. The test_index.ind file is the LaTeX output file that produces a nicely formatted index. If you don't use the -o option, makeindex defaults the output file to mybook.ind: it simply changes the input filename's extension to .ind. The makeindex program also writes a log file, which defaults to the input filename with the extension changed to .ilg. This log is handy for debugging. The mybookis.ist file is the style file with which you configure makeindex. The style file is a list of key/value pairs like this:
keyname  "value"
The value can be on a line below the key name, and the value can take up several lines.

In other words, you can change the format of the index just by adding/changing key/value pairs in the style file.

As mentioned, makeindex does one thing: it reads the index input file (.idx) and outputs a LaTeX file containing mostly \item, \subitem, and \indexspace commands. When run through LaTeX, this file produces a formatted index conforming to the contents of the .idx file.

To summarize, the first latex run on the book produces the .idx file, which contains all index information but nothing about how that information will be formatted. Then makeindex is run on the .idx file to create a .ind LaTeX file containing not only index information, but formatting. The second latex run then reads the .ind file (through the \printindex command) and typesets the index at the end of the book's .dvi output.

LaTeX Index Commands

The following LaTeX commands comprise the majority of what you need, in a LaTeX source file, to implement indexing:
Command (location): explanation

\usepackage{makeidx} (document preamble): Includes code necessary for indexing functionality.
\makeindex (document preamble): Makes the latex command write the .idx file when it is run on the book's LaTeX file.
\index{atoms} (mainmatter): Single-page index reference under the name atoms.
\index{atoms!carbon} (mainmatter): Single-page index reference to the carbon subcategory of atoms. A two-level index entry.
\index{atoms!carbon!diamond} (mainmatter): Single-page index reference to the diamond subcategory of the carbon subcategory of atoms. A three-level index entry. Index entries can go at most three levels deep.
\index{subjectname|(} (mainmatter): Beginning of a range index reference under the name subjectname. Also works with multilevel index entries.
\index{subjectname|)} (mainmatter): End of a range index reference under the name subjectname. Also works with multilevel index entries.
\printindex (backmatter): Inserts the index here.

The preceding are LaTeX commands. LyX provides special provisions to insert them in the LyX file. The \printindex command is inserted in LyX with Insert->List/TOC->Index_list. All the \index commands are inserted with Insert->Index_entry; in the resulting inset you type everything that would go between the curly braces.
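To try these commands outside LyX, the following bash sketch writes a tiny test document that uses each command from the list above. The file name indextest.tex and the index entries are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# Write a minimal LaTeX file that exercises the index commands listed above.
# indextest.tex is a hypothetical name; the entries are illustrative only.
cat > indextest.tex <<'EOF'
\documentclass{book}
\usepackage{makeidx}   % indexing support (\printindex, \see, ...)
\makeindex             % makes latex write indextest.idx
\begin{document}
Atoms\index{atoms} are small.
Carbon\index{atoms!carbon} is common;
diamond\index{atoms!carbon!diamond} is one form of carbon.
\index{subjectname|(}This text starts a range
and this text ends it.\index{subjectname|)}
\printindex            % the index goes here
\end{document}
EOF
```

Building it with latex indextest.tex, then makeindex indextest.idx, then latex indextest.tex again should put the index at the end of the DVI.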

Hello World

Start by making a small book, with a functioning index, in LyX. Refer to the preceding article on how to create this index-enabled LyX document. The document should have several index ranges -- some so short as to be on one page, some a little longer to span a page boundary, and some long enough to go on several pages.

Next, make a script to compile your LyX file. Yes, you could use LyX's View->PDF function, but that often doesn't reflect the latest index build, thereby causing erroneous results. So it's much better to create a script to compile the LyX file. Here's the script, set up for lyx-1.5.3:

#!/bin/bash
# Bash script. Usage: ./mktest.sh basename  (the LyX file's name minus .lyx)
StyleFile=mybook.ist                 # name the style file

if [ -z "$1" ]; then
  echo "Usage: $0 basename" >&2
  exit 1
fi

# Delete all intermediate files so no stale index survives
rm -f "$1.aux" "$1.dvi" "$1.ps" "$1.pdf" "$1.idx" \
      "$1.ilg" "$1.ind" "$1.log" "$1.tex" "$1.toc"

# Export from LyX to LaTeX (LyX 1.5.3)
lyx-1.5.3 --export latex "$1.lyx"

# Compile LaTeX and create the .idx file
latex "$1.tex"

# Run makeindex with the style file if it exists, without it otherwise
if test -f "$StyleFile"; then
  makeindex -s "$StyleFile" "$1.idx"
else
  makeindex "$1.idx"
fi

if grep -qi "error" "$1.ilg"; then
  # Error messages in the .ilg log: show the log and the .ind file
  echo "ERROR: Inspect $1.ilg and $1.ind!"
  gvim "$1.ilg" "$1.ind"
  echo "ERROR: Inspect $1.ilg and $1.ind!"
else
  latex "$1.tex"                     # compile to dvi, now with the index
  latex "$1.tex"                     # do it again just in case
  dvips -o "$1.ps" "$1.dvi"          # convert dvi to PostScript
  ps2pdf "$1.ps"                     # convert PostScript to PDF
  xpdf "$1.pdf" &                    # view the PDF
fi

My system's LyX command is lyx-1.5.3. Yours will almost certainly be lyx. If so, adjust the script accordingly.

About the multiple latex runs: The first one creates the .idx file, the second one compiles from LaTeX to DVI, and the third one is probably just superstition but I put it in anyway.

If makeindex runs without errors, PDF creation proceeds. Otherwise, you're shown the makeindex log file (.ilg) along with the .ind file.

Run the script like this:
./mktest.sh rjust
Where mktest.sh is the script and rjust is the LyX file's filename minus the .lyx extension. Look at the resulting PDF, and note the index. Make sure you have some single page entries, some ranges that cross one page boundary (e.g. 11-12), and some ranges that span multiple pages (e.g. 23-27).

So far you've done nothing but compile a default index. Now let's take it to the next level...

Use a Style File to Modify the Index

Create the following mybook.ist and try it again.

suffix_2p "and1more"
delim_r "to"

If you get errors in the .ilg file, reference the line numbers and look at the .idx file. If there are no errors, look at the index in the resulting PDF and notice that former two-page ranges such as 12-13 now say 12and1more, while multiple page ranges now look like 80to84. Ranges that don't span page boundaries, and index entries not involving ranges, continue to say only the page number.

What suffix_2p and delim_r do

The delim_r option tells makeindex what string to put between the starting and ending page number in an index range. It's the range delimiter. It defaults to a double dash, which latex translates into a long dash. The suffix_2p option tells makeindex, in the case of an index range spanning exactly one page boundary, what string to substitute for the range delimiter and the ending page number.

In typical usage, delim_r would be a long dash, which results from two ASCII dashes in the style file. In the case of suffix_2p, typical usage would be an empty string, which is produced by a single space in the style file.
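To make these rules concrete, here is a simplified bash sketch of the decision makeindex makes for one |( ... |) range. format_range is a hypothetical helper, not part of makeindex. An empty SUFFIX_2P stands for "suffix_2p not set", and the sketch ignores other makeindex behavior such as implicit page-range formation and the related suffix_3p and suffix_mp specifiers.

```shell
#!/bin/bash
# format_range START END [SUFFIX_2P] [DELIM_R]
# Hypothetical helper imitating, in simplified form, how makeindex prints
# one explicit range. An empty SUFFIX_2P means "not set"; DELIM_R defaults
# to "--", makeindex's default range delimiter.
format_range() {
  local start=$1 end=$2 suffix_2p=${3-} delim_r=${4:-"--"}
  if (( start == end )); then
    printf '%s\n' "$start"                          # range stayed on one page
  elif [[ -n $suffix_2p ]] && (( end == start + 1 )); then
    printf '%s%s\n' "$start" "$suffix_2p"           # exactly one page boundary
  else
    printf '%s%s%s\n' "$start" "$delim_r" "$end"    # start, delimiter, end
  fi
}

format_range 43 43 and1more to    # prints 43
format_range 43 44 and1more to    # prints 43and1more
format_range 80 84 and1more to    # prints 80to84
format_range 43 44                # suffix_2p not set: prints 43--44
```

With the typical settings (a single-space suffix_2p and "--" for delim_r), a two-page range comes out as the start page plus a space, which LaTeX swallows.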

Inserting Spaces in the Delimiters

If you try to put spaces in the strings "and1more" and "to", so they look like "23 and 1 more" or "67 to 69", those spaces don't appear in the PDF. In order for the spaces to appear in the PDF, they must be preceded by backslashes. In order for backslashes to be recognized, they themselves must be escaped by backslashes. So you use double backslashes as shown:

suffix_2p "\\ and\\ 1\\ more"
delim_r "\\ to\\ "

The resulting PDF index might contain an entry like this:
Wikipedia, 23 and 1 more, 67 to 69

Useful Values for suffix_2p and delim_r

The preceding values were good for debugging, but for an actual book you'd want 2 page ranges to simply show up as a single page, and multipage ranges to show up with a long dash between them. Here's how you do it:

suffix_2p " "
delim_r "--"

The suffix_2p single space degenerates to no spaces (an empty string) in the PDF, but if you'd put an empty string in the style file, it would have acted as if there were no suffix_2p option. The two dashes in delim_r are converted to a long dash in the output.

Style File Construction

So far we've discussed a specialized style file that slightly modified ranges. Style files can make the makeindex program jump through incredible hoops.

The makeindex program does exactly one thing: It converts a format independent index listing (the .idx file) to a formatted form (the .ind file) useful in creating the final output.

The input file typically exists in a customary syntax. That customary syntax is partially described in the LaTeX Index Commands article. However, input files of very different syntaxes could be read by makeindex simply by changing the input file specifiers in the style file. The formatting of the output file (the .ind file) can be changed by changing the output file specifiers in the style file. The suffix_2p and delim_r options discussed earlier were output file specifiers.

You can read the input file specifiers in the makeindex man page. Their default values are what LaTeX puts in its .idx file if you have an index.
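As an illustration, this bash sketch writes a style file that restates several input file specifiers at the defaults listed in the makeindex man page, so using it should change nothing. The file name defaults.ist is hypothetical; an application writing a nonstandard .idx file would change these values instead. Note that single-character values use single quotes.

```shell
#!/bin/bash
# Write a style file restating some input file specifiers at their
# documented defaults (defaults.ist is a hypothetical name).
cat > defaults.ist <<'EOF'
% Input file specifiers, restated at their defaults
keyword     "\\indexentry"
arg_open    '{'
arg_close   '}'
range_open  '('
range_close ')'
level       '!'
actual      '@'
encap       '|'
EOF
```

Running makeindex -s defaults.ist mybook.idx should then produce the same .ind file as running makeindex with no style file.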

The output specifiers are much more useful for the LyX person. You could use them to radically change the look of your index. You could use them to create a separate LaTeX file for the index, instead of including the index in the book PDF itself. In that case the .ind file would need its own \documentclass line, a \begin{document} and \end{document} pair, and perhaps different page numbering for the index PDF.

You can even use makeindex for non-LaTeX work. For instance, the makeindex man page has an example using output file specifiers to create a troff index source file. An example later in this article details the use of makeindex to create an HTML version of an index.

Example: Separate LaTeX File

The following style file enables makeindex to create a .ind file suitable for separate latex compilation:

preamble
"\\documentclass[12pt,english]{book}
\\usepackage{makeidx}
\\usepackage{hyperref}
\\begin{document}
\\begin{theindex}
{\n"

postamble
"\n\n}
\\end{theindex}
\\end{document}\n"

suffix_2p " "
delim_r "--"

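Notice that the makeidx package is included so that commands like \see and \seealso are recognized, and the hyperref package so that the \hyperpage commands hyperref wrote into the .idx file are recognized. The \item and \indexspace commands themselves come from the book document class.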

In the output, everything in the preamble is printed first, then the index entries are printed, and then the postamble is printed. The result is a complete LaTeX file suitable for compiling to DVI (and then on to PDF).

HTML Format Index

Just to make sure the concept is clear, let's use makeindex and the style file to make the index show up in an HTML document. Here's the necessary style file:

% Begin the HTML document as usual, then print INDEX followed by a blank line
preamble
"<html><head>\n
<meta http-equiv=content-type content=\"text/html; charset=UTF-8\">\n
</head><body>\n
<big><big><b>INDEX</b></big></big><br>&nbsp;<br>\n"

% Finish the HTML document as usual
postamble
"\n</body></html>\n"

% Diagnostic two-page range suffix and diagnostic range delimiter
suffix_2p " and1more"
delim_r " to "

% Put <br> between letter groups
group_skip "&nbsp;<br>&nbsp;\n"

% Use nonbreaking spaces as the indent for each index level
item_0 "\n<br> &nbsp;"
item_1 "\n<br> &nbsp;&nbsp;&nbsp;"
item_01 "\n<br> &nbsp;&nbsp;&nbsp;"
item_x1 "\n<br> &nbsp;&nbsp;&nbsp;"
item_2 "\n<br> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"
item_12 "\n<br> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"
item_x2 "\n<br> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"

Now compile it to html like this:
cat rjust.idx | sed -e "s/hyperpage//" | makeindex -s mybook.ist -o temp.html
You need to remove from the input file the "hyperpage" commands inserted by the hyperref package. Then the conversion goes as expected and can be viewed in a browser. Note that you still get some \see and \seealso commands, but those can easily be tweaked to their correct format.
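The leftover \see and \seealso commands could be handled with one more sed pass. Here is a minimal sketch, assuming makeindex wrote them as \see{target}{page} and \seealso{target}{page}, and that the targets contain no nested braces. The page number is dropped, as LaTeX's own \see macro does. fix_see and index.html are hypothetical names.

```shell
#!/bin/bash
# fix_see: hypothetical filter turning leftover \see{target}{page} and
# \seealso{target}{page} commands into plain HTML, dropping the page number.
# Assumes the target text contains no nested braces.
fix_see() {
  sed -E \
    -e 's/\\see\{([^}]*)\}\{[^}]*\}/<i>see<\/i> \1/g' \
    -e 's/\\seealso\{([^}]*)\}\{[^}]*\}/<i>see also<\/i> \1/g'
}
```

Usage would be something like fix_see < temp.html > index.html after the makeindex run.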

The point is, with a little sed and by changing a few variables in the style file, we were able to change the index from a LaTeX file into an HTML file. 

Summary

The makeindex program converts a non-formatted .idx index file to a formatted .ind index file. The .ind file can be formatted for absolutely anything: LaTeX, HTML, XML, anything. It can also be formatted to use different types of separators, indentation, and the like.

The formatting choices are governed by a "style file" which traditionally has a .ist extension, but can be named anything. The style file is incorporated in the makeindex run like this:
makeindex -s mybookis.ist mybook.idx
The default output file is the .idx filename, but with an extension .ind instead of .idx.

The style file consists of key/value pairs, where string values are enclosed in double quotes (single-character values use single quotes). The value needn't be on the same line as the key, and the value can span multiple lines. Style files have two types of keys:
  1. Input file specifiers
  2. Output file specifiers
Input file specifiers describe the syntax of the .idx file, so that if an application writes a nonstandard .idx file, makeindex can still use it. Output file specifiers describe the syntax of the .ind file, enabling you to change the appearance of the formatted index, or even output HTML, XML, troff and the like instead of LaTeX.

When an index item has a range (|( and |)) that happens to span exactly one page boundary, many people prefer that it show only the beginning page rather than a range. This can be accomplished with the suffix_2p output file specifier.

