Troubleshooters.Com®, Linux Library and T.C Nginx Essentials Present:

Triple Site Example

Introduction

This document describes an nginx setup that associates the domain litt.cxm with three web pages and no others. On disk, these three web pages are in very different directories. An additional convenience is that the part of the URL past the domain name needn't have any relationship to the filenames of the web pages brought up, so the URLs can be short and memorable. By design, any URL other than the three specified sends the user to a custom error page.

One disadvantage of this technique is that each of the web page files must have a different filename, even though they're in different directories. That's not a showstopper, because hard or symbolic links to like-named files fix it.

This technique uses nginx's rewrite functionality. There are probably a million other ways to do it, but this way is fairly clean and is an excellent showcase for rewrites and same-site/different-directory/different-filename switches.

The main purpose of this document is demonstrating rewrites, locations, and same-site/different directory work, but it could actually come in handy on a LAN where you want employees (or children) to have a very narrow choice of files, hosted on the LAN, to view.

Anatomy of a Rewrite

The following is the anatomy of an nginx rewrite statement:

rewrite <regex> <replacement> <flag>

The <regex> is just what it says, regex, and it has two functions:

  1. Match or not match the incoming URI. If it doesn't match, the rewrite is not performed.
  2. Via use of parentheses in the <regex> and $1 and $2, etc, in the <replacement>, the <replacement> can be influenced by the <regex> match.

#2 above is not used in this document. In every rewrite statement in this document, selection is done by location so the <regex> needs to match every time. This is done by using / for the <regex>.
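Although function #2 isn't used in this document, a minimal sketch may make it concrete. The /docs/ and /pages/ paths below are hypothetical and appear nowhere in the litt.cxm setup; the fragment would go inside a server block.

```nginx
# Hypothetical example: a request for /docs/intro is rewritten to
# /pages/intro.html. The parentheses capture "intro" and $1 reuses it.
location /docs/ {
	rewrite ^/docs/(.*)$ /pages/$1.html last;
}
```

By contrast, a <regex> of just / is unanchored, so it matches any URI containing a slash, which is why it always matches in this document's rewrites.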

You can put together the <replacement> pretty much any way you want, and it will serve as the part of the URI, including the leading slash, following the domain name.

Now about the <flag>. It can be absent, or it can be one of the following four values:

  1. last
  2. break
  3. redirect
  4. permanent

Only the first two are applicable to this document. #1 (last) stops looking for any other rewrite statements and starts looking for a new location that matches the changed URI (the <replacement>). #2 (break) stops looking for any other rewrites, and continues with the other things in its current block. Rewrite statements are often used one right after another, which is why you need #1 and #2 to stop processing rewrites. If you want to continue through a sequence of rewrites, don't use a flag.
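The difference between the two flags can be sketched with a pair of locations that rewrite to the same URI. Everything here is an assumption made for illustration: the server name example.test, the URIs /a and /b, and the /srv/a and /srv/b directories.

```nginx
server {
	listen 80;
	server_name example.test;	# hypothetical name

	location = /a {
		# last: the URI becomes /target.html and a new location
		# search begins, so "location = /target.html" below handles it.
		rewrite / /target.html last;
	}
	location = /b {
		# break: the URI becomes /target.html but processing stays
		# in this block, so this block's root is used.
		root /srv/b;
		rewrite / /target.html break;
	}
	location = /target.html {
		root /srv/a;
	}
}
```

Under those assumptions, /a would serve /srv/a/target.html and /b would serve /srv/b/target.html, even though both rewrites produce the same URI.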

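Remember the difference between break and last: break quits dealing with rewrites and stays in the current location block, while last quits dealing with rewrites and starts a new location search.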

This is beyond the scope of this document, but two or more rewrites can be used, in a row, to do some very powerful things. In such cases, you often don't use flags because you want the cascade of rewrites. However, when you have a single rewrite inside a location, you'll usually want either a last or break for the flag.
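As a rough sketch of such a cascade, the following fragment chains two rewrites. The /old/ and /new/ paths are hypothetical, and the fragment would go inside a server block.

```nginx
location /old/ {
	# No flag: the URI becomes /new/<rest>, and the next rewrite is tried.
	rewrite ^/old/(.*)$ /new/$1;
	# This regex is tested against the URI produced by the rewrite above.
	# Here, last ends the cascade and starts a new location search.
	rewrite ^(/new/.*)\.htm$ $1.html last;
}
```

With these two lines, a request for /old/page.htm would become /new/page.htm and then /new/page.html.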

Now that you have some understanding of rewrites, let's look at the code...

The Code

server {
	listen 80;
	server_name litt.cxm;

	### CATCH INCOMING URL AND REWRITE
	location = /todo {
		rewrite / /todo.otl last;
	}
	location = /links {
		rewrite / /index.html last;
	}
	location = /tjunct {
		rewrite / /troubleshooters.htm last;
	}

	## CATCH REWRITTEN URL AND SET PATH AND FILES
	location = /todo.otl {
		root /bizzdata/otl/todo;
		types {
			text/plain otl;
		}
		
	}
	location = /index.html {
		root /bizzdata/websites/littlinks;
	}
	location = /troubleshooters.htm {
		root /bizzdata/websites/tjunct;
	}

	## CATCHALL FOR WHATEVER DIDN'T MATCH ABOVE
	location / {
		rewrite / /bad_litt_cxm.html last;
	}
	location = /bad_litt_cxm.html {
		root /bizzdata/websites/noip;
	}
}

Code Explanation

The code starts by looking for exact location matches on /todo, /links, and /tjunct.

They're exact matches because of the equal signs, so at most one of the three can match. If one does, the rewrite within that location block changes the URI to one with a single slash, the leading one, which will turn out to be a filename within an (at this point) arbitrary directory. If none of them matches, the catchall location, which is a prefix match on a single slash, matches and handles the situation.

Notice that each of the first three rewrites ends in last, which means nginx starts looking for other locations that match the new URI. The next three locations are exact matches for the three possible URIs produced by the first set of locations. Because everything in the first three locations and the second three is exact and hard coded, if one of the first three locations is a match, the corresponding one of the second three will also be a match. Each of those second three locations has one main job: setting a root. If that root directory contains a file whose name matches the rewritten URI, that file is displayed. Pretty cool, right?

One minor problem is that all the rewritten URIs must have unique values, and yet each must correspond to a filename. And there are lots of index.html files in a website hierarchy. One way to get around this is to use symbolic or hard links to give files new names. Another is to use one of the two alternate setups revealed later in this document.

VITAL INFORMATION:

Location blocks aren't necessarily tried in sequential order. Exact matches take precedence over prefix matches. There's actually a complete precedence order for nginx locations, but for this document it suffices to understand that an exact match always takes precedence over a prefix match.
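A small sketch shows that order in the file doesn't decide the winner. The server name example.test and the response texts are assumptions for illustration only.

```nginx
server {
	listen 80;
	server_name example.test;	# hypothetical name

	# The prefix match is listed first...
	location / {
		return 200 "prefix match\n";
	}
	# ...but a request for exactly /todo is still handled here,
	# because an exact match takes precedence over a prefix match.
	location = /todo {
		return 200 "exact match\n";
	}
}
```

With this block, a request for /todo should get "exact match", while /todo2 or /anything-else should get "prefix match".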

To flesh out the preceding note, the way locations work is that the first exact location match found "wins". Location searching stops, and the exact match's location is executed. There's only one exception: If a rewrite within the exact match location's block has last for a flag, then a brand new location search is performed. With this in mind, let's examine the "catchall" situation.

If none of the first three locations offered a location match, none of the second three will either. This leaves two locations: The one with a prefix match on a single slash, which will always match, or the one looking for /bad_litt_cxm.html. A match on either of these locations displays the same web page, but the usual flow would be for the single slash prefix match.

The location that matches a single slash doesn't have an equal sign, or any symbol at all, which makes it a prefix match: if its pattern matches the first part of the incoming URI, it's a match. Because every normal incoming URI starts with a slash, this location always matches. However, its block won't be entered if there's an exact match, which is what makes it a catchall. Its rewrite changes the URI to /bad_litt_cxm.html. Because that rewrite has a last flag, nginx starts another location search, which triggers an exact match on the /bad_litt_cxm.html location. The location block for that exact match sets the root to the directory containing bad_litt_cxm.html, and that error page is displayed.

One more thing to point out: the todo.otl location contains types {text/plain otl;}. This is simply because the file's .otl extension is neither .txt nor .html nor anything else standard, so the preceding statement sets things so the browser interprets the file as plain text.
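For comparison, here is a sketch of another common way to get the same effect. It is not the method used in this document: it empties the extension-to-type map for this one location and falls back on default_type.

```nginx
location = /todo.otl {
	root /bizzdata/otl/todo;
	# Alternative to "types { text/plain otl; }": with an empty map,
	# every file served from this location gets the default type.
	types { }
	default_type text/plain;
}
```

Either form applies only inside this location, so the other pages keep whatever types the rest of the configuration gives them.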

So Far So Good

Hey, pretty cool. The nginx server coded previously routes three easy URLs to three files in completely different directories, with no relation between the URL and the filename. If you start looking deeper, however, there are flaws:

  1. No two website files can have the same name, even if they're in different directories.
  2. If you really fool around with this setup, you'll find certain URLs can be used as sort of "back doors" to the three URLs described. For instance, http://litt.cxm/todo.otl brings you to the same page as http://litt.cxm/todo. This isn't harmful, because this won't ever bring up a web page file not meant to be brought up, but it's a flaw.
  3. It's longer and more complex than it needs to be.
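
As an aside on flaw 2, one possible way to close a back door in the first server block is nginx's internal directive. This is only a sketch and is not the fix this document goes on to use:

```nginx
location = /todo.otl {
	# internal: this location can be reached by a URI changed with
	# rewrite, but a direct external request for /todo.otl gets a 404.
	internal;
	root /bizzdata/otl/todo;
	types {
		text/plain otl;
	}
}
```

Note that the 404 in this sketch would be nginx's own error response rather than the bad_litt_cxm.html page.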

In spite of those three flaws, this was a pretty good demonstration of nginx rewrites inside of location blocks. Now let's cure the first of the three identified problems:

Getting Rid of the Same Filenames Problem

If you remember, the previously coded server block would have done wrong things if any two or more of the web pages had the same filename, even though they were in different directories. The following server block gets rid of that problem:


server {
	listen 80;
	server_name litt.cxm;


	### CATCH INCOMING URL AND REWRITE
	location = /todo {
		rewrite / /todo/todo.otl last;
	}
	location = /links {
		rewrite / /links/index.html last;
	}
	location = /tjunct {
		rewrite / /tjunct/troubleshooters.htm last;
	}

	## CATCH REWRITTEN URL AND SET PATH AND FILES
	location = /todo/todo.otl {
		root /bizzdata/otl/todo;
		rewrite / /todo.otl break;
		types {
			text/plain otl;
		}
		
	}
	location = /links/index.html {
		root /bizzdata/websites/littlinks;
		rewrite / /index.html break;
	}
	location = /tjunct/troubleshooters.htm {
		root /bizzdata/websites/tjunct;
		rewrite / /troubleshooters.htm break;
	}

	## CATCHALL FOR WHATEVER DIDN'T MATCH ABOVE
	location / {
		rewrite / /bad_litt_cxm.html last;
	}
	location = /bad_litt_cxm.html {
		root /bizzdata/websites/noip;
	}
}

In the preceding server block, the rewrites in the first three location blocks create URIs that are the concatenation of the incoming URI, then a slash, and then what will be a filename. In other words, the change from the previous server block is that this one prepends the short incoming URI to the filename.

So that the second three locations are matched, each of their match lines has the simple URI prepended also. Finally, in each of the second three, a rewrite with a break flag eliminates the prepended short URI so the new URI is just the filename. Now, even if every single one of the filenames were index.html, the prepended short URI would make each unique.
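To see why the prepended short URI matters, consider a sketch in which two pages are both named index.html. The /news URI and the /srv/news directory are hypothetical; /links is taken from the server block above.

```nginx
# First stage: each short URI gets its own prefix.
location = /links {
	rewrite / /links/index.html last;
}
location = /news {
	rewrite / /news/index.html last;	# hypothetical second page
}

# Second stage: the prefixed URIs are distinct, so each one finds
# its own root before break strips the prefix.
location = /links/index.html {
	root /bizzdata/websites/littlinks;
	rewrite / /index.html break;
}
location = /news/index.html {
	root /srv/news;			# hypothetical directory
	rewrite / /index.html break;
}
```

Both requests end with the URI /index.html, but because break keeps each one in its own location, each is resolved against a different root.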

Unfortunately, this one is even more complicated than the first, and it also doesn't get rid of the second problem. The URI http://litt.cxm/todo/todo.otl works exactly the same as http://litt.cxm/todo.

More refinement is needed.

The Real Deal

This section rolls out a nice, simple server block:

server {
	listen 80;
	server_name litt.cxm;


	### CATCH INCOMING URL AND DISPLAY
	location = /todo {
		root /bizzdata/otl/todo;
		rewrite / /todo.otl break;
		types {
			text/plain otl;
		}
	}
	location = /links {
		root /bizzdata/websites/littlinks;
		rewrite / /index.html break;
	}
	location = /tjunct {
		root /bizzdata/websites/tjunct;
		rewrite / /troubleshooters.htm break;
	}

	## CATCHALL FOR WHATEVER DIDN'T MATCH ABOVE
	location / {
		root /bizzdata/websites/noip;
		rewrite / /bad_litt_cxm.html break;
	}
}

Compared with the previous two server blocks:

  1. The preceding works with same-named files in different directories.
  2. I've found no way to make a backdoor URL.
  3. It uses four locations instead of eight.

Because the break flag keeps execution in the location, the rewritten URI, which is really a filename, can be found in the location's root. This version of the triple site can actually be practical.
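With this layout, adding another page would take one more location. The following sketch assumes a hypothetical /notes URL and a hypothetical /srv/notes directory, neither of which is part of the litt.cxm setup:

```nginx
# Hypothetical fourth page, added alongside /todo, /links and /tjunct.
location = /notes {
	root /srv/notes;		# hypothetical directory
	# break keeps processing here, so nginx looks for
	# /srv/notes/index.html, even though /links also uses index.html.
	rewrite / /index.html break;
}
```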

Wrapup

This document exhibited three different nginx server blocks that basically did the same thing, each with fewer flaws than the previous one. If you go through this document, you'll have a good working knowledge of the rewrite facility using either the last or the break flag. You'll also have at least a fighting chance with the redirect and permanent flags, as well as sequential rewrites with no flag at all.

Now that you've completed this document, you also have a good working knowledge of nginx location blocks.


[ Training | Troubleshooters.Com | Email Steve Litt ]