Troubleshooters.Com Presents

Linux Productivity Magazine

Volume 2 Issue 1, January 2003

Copyright (C) 2003 by Steve Litt. All rights reserved. Materials from guest authors copyrighted by them and licensed for perpetual use to Linux Productivity Magazine. All rights reserved to the copyright holder, except for items specifically marked otherwise (certain free software source code, GNU/GPL, etc.). All material herein provided "As-Is". User assumes all risk and responsibility for any outcome.

See also Troubleshooting Techniques of the Successful Technologist
and Rapid Learning: Secret Weapon of the Successful Technologist
by Steve Litt

[ Troubleshooters.Com | Back Issues | Troubleshooting Professional Magazine ]

Linux is a cancer -- Steve Ballmer
(during a Chicago Sun Times interview )


Editor's Desk

By Steve Litt
Big iron's not what it used to be. In the 1970's a professional computer programmer couldn't afford even the most mundane of his programming tools. By 1988 the PC revolution commoditized many software development tools, enabling a generation of kitchen table programmers to make their living creating vertical market apps on their home PC. But as the Wang commercials of that year so obnoxiously pointed out, enterprise development still required technology most computer programmers couldn't afford.

By 1996 Borland, Microsoft and Powerbuilder/Sybase gave the kitchen table programmer tools to build an enterprise app, including a "DBMS" which, while good enough for testing, was limited to a handful of connections.You could buy the development tool for $400 to $2000, but to deploy it you'd spend a fortune on licenses for DBMS and other middleware.

Then came Linux.

One by one, technologies unaffordable to individual software developers became free. Compilers, network technologies, and even PostgreSQL -- a professional grade DBMS.

PostgreSQL, often nicknamed "Postgres", offers SQL query support. It offers a high power command line front end called psql for a DBA to manage the database. It has excellent transactional support. Sophisiticated triggers and stored procedures are available thanks to the built in plpgsql programming language. Postgres is rich in datatypes, as shown by the following *partial* list:
For a more complete list, perform the \dT command within psql (explained later in this magazine).

Postgres has native hooks to most of today's important computer languages, including C, C++, Perl, Python, PHP, Java and Tcl/TK. If you're working in a language not directly supported, don't worry, Postgres has ODBC support.

You can find excellent PostgreSQL documentation at There are books on PostgreSQL, including the outstanding "PostgreSQL Developer's Handbook" by Ewald Geschwinde and Hans-Jurgen Schonig.

TCO (Total Cost of Ownership) is all the rage these days, so consider the following PostgreSQL assets:
The final assertion in the preceding list deserves some scrutiny. I have no first hand knowledge that you can get a Postgres guru cheaper than an Oracle guru of the same caliber. But it seems reasonable. Postgres has absolutely no financial or license barrier to entry. A technologist can install a Linux distribution and begin working with Postgres in conjunction with C, C++, Perl, Python, PHP, Java, Tcl/TK, or even shellscripts (via backticks and the psql utility).

Postgres can run ubiquitously -- pretty much anywhere in the Lintel or Wintel/Cygwin worlds, as well as Mac OS X. Also supported are IBM RS6000 with AIX, Alpha with various Unices, Sparc with various Unices, and other platforms.

Postgres isn't for everyone. You'd probably want your enterprise wide, international, million transaction per day app to be in Oracle or DB2. The customer can afford it, and those DBMS's have a track record with such big iron apps.

But for a smaller app, Postgres is an easy and free way to step in into the world of professional DBMS's.

So kick back, relax, and learn how to make PostgreSQL work its magic. And remember, if you use Open Source or free software, this is your magazine.
Steve Litt is the author of Samba Unleashed.   Steve can be reached at Steve Litt's email address.

PostgreSQL Intro

By Steve Litt
PostgreSQL is packaged with most Linux distributions, including Mandrake and Red Hat. Simply select it when installing Linux. Or use the rpm utility to install it after the fact (Debian users use apt-get). Make sure service postgresql is set to run automatically (use the Linux Control Panel or similar tool to do so).

PostgreSQL is a professional strength DBMS with triggers, stored procedures, a programming language, and naturally full SQL support. Although it costs nothing, for moderate sized projects it replaces DBMS products costing tens of thousands of dollars or more.

The Postgres system consists of three top level components:
  1. Backend
  2. Postmaster
  3. Frontend
The backend is the interface to the database itself. The backend is what handles SQL queries. Every SQL query spawns a new backend process.

The frontend is what interacts with the user. It could be a PHP or Perl program, or it could be the psql data administration utility. When you create a frontend, you use an API to make actual calls to the backend. There are such API's for all major languages -- sometimes multiple API's for a single language (Perl). The API's are free.

The Postmaster is the supervisor daemon. When a frontend process requests a connection, the Postmaster sets up a backend process to handle the connection. After that, the frontend/backend pair cooperate without the need for further Postmaster intervention.

The system looks something like this:
Architecture diagram

Most common SQL is covered, and the Postgres SQL is pretty much industry standard. It conforms mostly to the ANSI SQL92 standard, and PostgreSQL developers are working to make it totally ANSI SQL92 compliant. As an old Sybase programmer who also has some Informix experience, my impression is that I can do pretty much anything with Postgres that I can do with Sybase, and SQL statements are pretty much portable between the two. If you know Sybase, you'll be able to find your way around Postgres. I'd imagine Oracle, DB2 and SQLServer developers will find Postgres easy to understand, and will easily be able to port their code.

PostgreSQL is licensed with the BSD license, which makes it free software in the "free as in speech" contect as well as "free as in beer".

PostgreSQL has the following limitations:

Limitation type
Postgres Limitation
Other limiting factors
Database size
Limited by disk space and RAM
Table size
64 Terabytes
Limited by disk space and RAM
Row size
Limited by table size
Limited by disk space and RAM
Column size
1 Gigabyte
Limited by disk space and RAM
Rows in table
Limited by disk space and RAM
Columns per row
Limited by disk space and RAM

Bottom line -- Postgres is a high performance, full featured professional DBMS that is free of cost and license hassles.
Steve Litt is the author of " Troubleshooting Techniques of the Successful Technologist".  Steve can be reached at Steve Litt's email address .


By Steve Litt

GNU/Linux is comprised of the Linux kernel originally crafted by Linus Torvalds, plus many, many utilities, a large number of which were utilities from the original GNU project. "GNU/Linux" is probably the most accurate moniker one can give to the operating system. Please be aware that in all of Troubleshooters.Com, when I say "Linux" I really mean "GNU/Linux". I completely believe that without the GNU project, without the GNU Manifesto and the GNU/GPL license it spawned, the operating system the press calls "Linux" never would have happened.

I'm part of the press and there are times when it's easier to say "Linux" than explain to certain audiences that "GNU/Linux" is the same as what the press calls "Linux". So I abbreviate. Additionally, I abbreviate in the same way one might abbreviate the name of a multi-partner law firm. But make no mistake about it. In any article in Linux Productivity Magazine, in the whole of Troubleshooters.Com, and even in the technical books I write, when I say "Linux", I mean "GNU/Linux".

There are those who think FSF is making too big a deal of this. Nothing could be farther from the truth. The GNU General Public License is the only reason we can enjoy this wonderful alternative to proprietary operating systems, and competition from Free Software is the only reason proprietary operating systems aren't even more flaky than they are now. Last but not least, it's significant to note that in Infoworld's October 6, 2000 article entitled "E-business innovators", author Mark Leon named GNU's Richard Stallman as the innovator associated with GNU/Linux.

Steve Litt is the author of Rapid Learning: Secret Weapon of the Successful Technologist. He can be reached at Steve Litt's email address.

Postgres Installation

By Steve Litt
By far the easiest way to install Postgres is to grab a box, install Linux on it, and when you pick packages to install, pick PostgreSQL. Also remember to install Perl, Python, and PHP so you can build front ends.

On an Intel or Intel clone computer with Linux, if Postgres isn't already installed, install it with a package manager such as RPM.

On other types of boxes, consult documents at Compiling Postgres is beyond the scope of this Linux Productivity Magazine issue.
Steve Litt is the author of the Universal Troubleshooting Process courseware.   Steve can be reached at Steve Litt's email address.

The psql Environment

By Steve Litt
Most of your remaining work will be done in the psql environment, with a few bash shell commands thrown in for good measure. The psql environment is a command line environment complete with history like the bash shell. The readline library is completely integrated into psql. You must enter the psql environment for a specific database. Therefore psql must have at least one argument -- the name of the database. Because Postgres ships with a database called template1, you can always run psql as follows:
psql template1
If you enter psql as template1, be sure not to change template1 in any way, because it's the template for created databases. It's far better to create a database with the createdb shell command, and then run psql on that new database.

Within the psql environment you'll encounter a prompt showing the database name, an equal sign, and then a pound sign. You may type commands from that prompt. You can recall prior commands with the uparrow key, similar to the history feature of the bash shell.

Before running psql you need to remember four commands, two of which have synonyms (synonyms in light brown background):
Quit psql
Ctrl+d represents EOF, which terminates the psql session
Quit psql
A harder to remember synonym for Ctrl+d
Help on psql commands
The psql environment provides a rich set of commands, all starting with a backslash. \? lists all the internal commands and their function. If you're not running psql in a terminal equipped with scrollbars, you can pipe the help to the less command as follows:
psql -c "\?" mydb | less
Naturally, you could also redirect it to a file. But bottom line, for an easy life in psql, use a terminal emulator with a scrollbar so you can interactively get help.
Help on SQL commands
This gives you help on SQL commands. It's a mini SQL reference. A simple \h lists all possible SQL commands. From there you can "drill down" by appending the desired command to the \h in order to get syntax help on the particular SQL command. Example:

\h create table

A scrolling terminal emulator is a MUST, because the help text will scroll off the screen.
Terminate an SQL command
In the psql environment, no SQL command is executed until given a semicolon. The semicolon can be on the end of the SQL command, or on a following line, but until it appears the command will not be run.
Terminate an SQL command Synonym for semicolon

Armed with the preceding 4 commands, you can do almost anything in psql. With these four commands you can operate psql without memorizing anything. However, to make life easier, here are a few other heavily used psql commands you might want to memorize:

List user tables
User tables are those created for the application at hand, not those used by Postres to define the database structure
List views

List System tables
System tables are tables used by Postgres to define database structure. System table names all start with pg_.
\d tablename
Show table structure
This shows the columns and their types. It does NOT show their contents.
select * from tablename;
Show table contents
This shows all the columns in all the rows for tablename.
Execute commands from a file
This is the easiest way to create and recreate triggers and stored procedures. Code the entire procedure in a text editor, and then import it with the \i command. This also assures code security because trigger and stored procedure source code won't be erased if the database is dropped or corrupted.
Connect to a different database
Syntax is:
\c[onnect] [DBNAME|- [USER]]

So far the discussion has centered on using psql interactively. An interactive tool is nice, but much of psql's power is derived from its ability to take SQL and psql commands in stdin and produce the result in stdout.

For instance, you could generate a comma delimited, doublequote enclosed list of all the first and last names of the people in the people table with the following shell command:
echo '\t\a\f ","\\ select fname,lname from people order by lname' | psql mydb -q | sed -e 's/\(.*\)/"\1"/'
Combined with the backtick operator featured in most computer languages, this is a way to access the database even from a language that for some reason supports neither Postgres nor ODBC. You could build quickie database utilites based around the psql command.The psql program is so powerful that the possibilities are almost unlimited.

The psql man page is your friend. View it either with the man command, or by placing the man:psql url in the Nautilus brower.
Steve Litt is the author of the course on the Universal Troubleshooting Process.  He can be reached at Steve Litt's email address .

Postgres Configuration

By Steve Litt
You configure Postgres three ways:
  1. With commands from the command line
  2. From within the psql utility
  3. By editing the pg_hba.conf file
You can usually substitute work within the psql utility for command line commands, but often the command line commands are easier to remember. This article attempts to give you the easiest ways to perform your Postgres configuration activities.

First, make sure that Postmaster runs at bootup. This can be controlled by the Linux Control Panel or other app used for daemon startup. Unless Postmaster is running you'll not be able to configure or use Postres. Find out if it's running like this:
ps ax | grep postmaster
If it reports /usr/bin/postmaster you're all set.

To run Postmaster, log in as root and run the following command:
service postgresql restart
If your distro doesn't have the service command, do this:
/etc/rc.d/init.d/postgresql restart
The preceding both work on Mandrake Linux versions 8.2 and 9. On your distribution, find the directory for service startup scripts, and run the one that runs Postgres.

Next, you need the ability to log into your Linux or Unix box as user postgres, and you probably don't know the password for that user. So as user root, run the passwd command and set the password for user postgres.

Now it's time to create your first database. Log into Linux as user postgres and run the following command:
createdb mydb
createlang plpgsql mydb
In the preceding, the first line created the database, and the second enabled that database to use the PL/pgSQL language. Without the second line, the first time you try to create a trigger or stored procedure, you'll get an error something like this:

mydb=# \i /d/at/php/phptest/incr.sql
psql:/d/at/php/phptest/incr.sql:14: ERROR: Unrecognized language specified in a CREATE FUNCTION: 'plpgsql'.
Pre-installed languages are SQL, C, and internal.
Additional languages may be installed using 'createlang'.

Using psql for Configuration

Now get into psql:
psql mydb;
Now that Postgres is running, create a simple people table:
create table people (lname varchar, fname varchar, job varchar);
If you already have a people table from an earlier exercise, drop it:
drop table people;
Now populate it:
insert into people values ('Litt', 'Steve', 'Troubleshooter');
insert into people values ('Gates', 'Bill', 'Monopolist');
insert into people values ('Stallman', 'Richard', 'Leader');
insert into people values ('Torvalds', 'Linus', 'Programmer');

Verify that the values got in:
select * from people;

Setting up a second user

Most work isn't done as user postgres. User postgres has way too many priveleges -- it would be like doing your Linux work as user root. -- one mistake could blow off your whole database. Also, if everyone accesses the data as user postgres there's no data security. But try to access your new people table as brand new Linux user myuid and watch what happens:

[myuid@mydesk myuid]$ psql mydb
psql: FATAL 1: user "myuid" does not exist
[myuid@mydesk myuid]$

The problem is that user myuid exists in Linux, but not in postgres. That's easily fixed by running the following command as user postgres within the psql environment:
create user myuid;
That gets you into psql as user myuid, but watch what happens when you try to access the table as myuid in psql:

mydb=> select * from people;
ERROR: people: Permission denied.

That's easily solved by user postgres performing a grant command:
grant select,insert,update,delete on table people to myuid;
Now watch what happens in user myuid:

mydb=> select * from people;
lname | fname | job
Litt | Steve | Troubleshooter
Gates | Bill | Monopolist
Stallman | Richard | Leader
Torvalds | Linus | Programmer
(4 rows)


Be sure to read all the help text revealed by the \h create user and \h grant and \h revoke commands.

Postgres User Passwords

The fact that you needed to create user myuid within psql indicates that postgreSQL handles users independently of Linux users. That's a VERY GOOD thing because there are cases in web development where the database password must be coded in scripts. That's bad enough, but you REALLY don't want to code a genuine Linux password.

You could have created a password for user myuid when you created him in psql, but instead we'll create the password after the fact:
alter user myuid encrypted password 'mypassword';
Switch to user myuid, and as user myuid get out of psql and back in, and notice -- that nothing's changed! You can still access the data, even though a new password has been put in place for user myuid within postgres. What's that all about?

Read on...
Steve Litt documented the Universal Troubleshooting Process. Steve can be reached at Steve Litt's email address.


By Steve Litt
The last article demonstrated that in a default Postgres installation, changing a user's password within Postgres did not affect the user's ability to access the data. Bummer!

Postgres authentication is determined by a config file called pg_hba.conf in the /var/lib/pgsql/data directory. This file is scanned top to bottom, and the first line that matches a given user determines that user's method of access. It's like an if/elsif/elsif/else construct in Perl -- you keep going until you match something, and then you're done. In the case of pg_hba.conf, when you match you quit parsing the file. If you fall through the entire file without matching anything, you are denied access.

So you want to tighten up security on database mydb. Place the following 3 lines at the top of pg_hba.conf:
local      mydb                                          md5
host mydb md5
host mydb reject
The first line governs access via UNIX sockets, and it states that via UNIX sockets all access to mydb is password protected, and those passwords are md5 encrypted. The second line governs access via IP address, stating that access via IP address to database mydb through IP address (localhost) is only via md5. The third line rejects anything coming from an IP address other than loopback, so it restricts mydb to local access. Note that IP addresses must be stated as IP addresses, never domain names. Postgres can't do dns resolution.

Each line of pg_hba.conf is a security record. No record can span lines. Any line preceded by a pound sign (#) is a comment.

Every record defines a combination of an access type and an authentication type. Access types are:

Access Types
Access via a  UNIX socket
local  database  auth_type  auth_argument
Access via TCP/IP, ssl and non-ssl connections
host  database  ip_address  mask  auth_type  auth_argument
Access via TCP/IP, but ONLY for ssl connections
hostssl  database  ip_address  mask  auth_type  auth_argument

There are many authentication types, each optimal for certain situations. The following table lists them in approximately increasing order of security:

Authentication Types
Auth type
no authentication
This is good if you're the only person on your computer, and the computer itself is absolutely secure (whatever that means :-)
Plain text password
Better than nothing, but not much
ident Rely on the OS The idea is if you're logged in as myuid on the OS, you can be trusted to be myuid in the database. Unfortunately, the ident method isn't difficult to spoof. Ident is a good choice if convenience is critical, and you need to "keep the users honest", but you're not defending against determined attackers.
Encrypted password Good security, but can't store as a shadow password. Pre Postgres 7.2 clients cannot use md5 auth, so if you have pre-7.2 clients in the system use crypt.
Md5 encrypted password
Strong encryption, uses pg_shadow. Requires Postgres 7.2 clients or better.
Kerberos 4
Obviously strong security on a system with Kerberos set up
krb5 Kerberos 5
Obviously strong security on a system with Kerberos set up
Very strong security, but the PAM system must be constructed
No access
The strongest security -- it just plain doesn't let them in :-)

You'll sometimes follow the auth type with an auth argument. Here are some of the auth arguments you might use:

Authentication Arguments
Used with auth types
Logs the user into Postgres with the same username as his UNIX username. The sameuser keyword can also be used in other contexts.
all auth types
Restricts login to a specific Postgres user
all auth types
Restricts login to a specific Postgres group

My Mandrake 9.0 Postgres shipped with an pg_hba file with a single non-commented line, located at the bottom of the file. That line is:

local	all	ident	sameuser
The preceding means that all databases are accessible to those who log into a local UNIX socket, and are authenticated by the OS itself, and allowed into Postgres as the same user as their OS username. So if anything drops down that far, any local connection will be affected by that line.

This article has just scratched the surface of pg_hba.conf. For more info, read the comments in the file itself, and read the documentation on the PostgreSQL website.
Steve Litt is the author of " Troubleshooting Techniques of the Successful Technologist".  Steve can be reached at Steve Litt's email address .

Exploring Your Postgres Database

By Steve Litt
You need the ability to "look around" in your database. This article explores how to do it.

Postgres' psql environment has several backslash commands to explore the database:

\l List all databases
\dS List all system tables
\dt List all user tables
\i List all indexes
\s List all sequences
\v List all views

The backslash commands are nice, but very often you're interfacing with the database via SQL from a computer program and can't use them. In those cases there are often SQL equivalents. For instance, here's how you list all tables:
select tablename, tableowner from pg_tables;
And here's how you list all the columns in a specific table:
SELECT column_name FROM information_schema.columns WHERE table_name = 'mytable';
Steve Litt is the author of " Troubleshooting Techniques of the Successful Technologist".  Steve can be reached at Steve Litt's email address .

Sequence Objects and Serial Fields

By Steve Litt
Postgres has a really cool feature called a sequence object. Here's how you make and use it (within psql as user postgres):

mydb=# \h create sequence
Description: define a new sequence generator
[ MINVALUE minvalue ] [ MAXVALUE maxvalue ]
[ START start ] [ CACHE cache ] [ CYCLE ]

mydb=# create sequence mysequence start 10000;
mydb=# select last_value from mysequence;
(1 row)

mydb=# select last_value from mysequence;
(1 row)

mydb=# select nextval('mysequence');
NOTICE: mysequence.nextval: sequence was re-created
(1 row)

mydb=# select nextval('mysequence');
(1 row)

mydb=# select nextval('mysequence');
(1 row)

mydb=# select last_value from mysequence;
(1 row)


Notice a few things in the preceding:
Like any other Postgres object, the sequence must be GRANTed to other users:
grant select,update on mysequence to myuid;
The preceding grants both select and update. Select is necessary for selecting normal fields like last_value, while update is necessary to perform the nextval()function.

The following is a PHP script that uses the mysequence sequence. Every time you refresh the screen the number will increment:

$connection = pg_Connect ("dbname=mydb port=5432 user=myuid password='mypassword'");
if($connection == 0)
die("Connection failed\n");
$sql = "select nextval('mysequence');";
$result=pg_Exec($connection, $sql);
$row = pg_fetch_row($result, 0);
$number = $row[0];
echo $number;

Serial Fields

A serial field is an autoincrementing database field. It's used primarily to create a unique numeric id suitable for a primary key.

mydb=# drop table people;
mydb=# create table people (person_id serial, lname varchar, fname varchar, job varchar);
NOTICE: CREATE TABLE will create implicit sequence 'people_person_id_seq' for SERIAL column 'people.person_id'
NOTICE: CREATE TABLE / UNIQUE will create implicit index 'people_person_id_key' for table 'people'
mydb=# select * from people;
person_id | lname | fname | job
(0 rows)


Notice that a sequence was created to aid in autoincrementing this field, and an index was created for fast lookup on this field, which will now function excellently as a primary key.

Now add some records and once again view the results:

mydb=# insert into people (torvalds,fna,job) values ('Litt','Steve','Troubleshooter');
ERROR: Relation 'people' does not have attribute 'torvalds'
mydb=# insert into people (lname,fname,job) values ('Litt','Steve','Troubleshooter');
INSERT 16601 1
mydb=# insert into people (lname,fname,job) values ('Torvalds','Linus','Pioneer');
INSERT 16602 1
mydb=# insert into people (lname,fname,job) values ('Gates','Bill','Monopolist');
INSERT 16603 1
mydb=# select * from people;
mydb-# ;
person_id | lname | fname | job
1 | Litt | Steve | Troubleshooter
2 | Torvalds | Linus | Pioneer
3 | Gates | Bill | Monopolist
(3 rows)


The person_id column incremented as the rows were added. It was not necessary to write a trigger to autoincrement the rows. Postgres handles it easily.

Setting a Serial Field

How do you set a serial field? For instance, how would you set it so all records have a 6 digit person_id?

In a word, DON'T DO IT!!! Table primary keys should never be meaningful to humans. If humans want a meaningful field (employee number, for instance), it should be a data field, not a primary key. Because if you change an employee number (because the employee changes offices or gets promoted to partner), you certainly don't want to go through the entire database updating his ID in every row of every table. What you want instead is to place the employee number as just another column in the employee table, and then look up the person_id from the employee number, and use that person_id to locate a record in another table. Something like this:

select * from sales
where emp_id = (select id from employees where empno='PARTNER0005');
The one exception is when you've dropped a table and need to recreate the table, serial field and all. You cannot leave primary key values to change -- all primary key values must exactly match that of the dropped table (which presumably you've dumped). If the original had its primary keys in proper order, with no holes, you can simply drop the table and its associated sequence, create the table with serial field, and copy in the data (minus the primary key field), and the primary key fields will be properly incremented. BE SURE TO CHECK to make sure.

If there are holes in the original keys, it's best to copy the data complete with the primary key, and then run a loop to increment the associated sequence  above the highest value in the original table.

Remember, manually assigning a value to a serial key does NOTHING to the associated sequence. In other words, if the serial field has gone from 1 through 7, and you then insert a row whose primary key is explicitly set to 10, the next time you insert a row whose serial field is not explicitly stated, that field will be set to 8, NOT to 11.

So the obvious question is this: What happens when the sequence gets up to 10? The answer is that it will error out with a duplicate, but the sequence will increment, so the next time it's attempted the insert will succeed, with a serial field number of 11.
Steve Litt is the author of Rapid Learning: Secret Weapon of the Successful Technologist . He can be reached at Steve Litt's email address.

Rant: Non-Meaningful Primary Keys

By Steve Litt

Did I shout loud enough?

I remember a finance VP ordering me to change various employee's IDs in order to have his pet report sort correctly. I remember a company whose every computer program had special logic, operating on the employee ID, to deduce things like seniority and employee type. Then one of the employee  ID ranges ran out of numbers, and every program's logic needed modification.

Especially interesting were cases, in that company, where employees were promoted to a new employee type midyear. They got new employee ID's matching their new status. But imagine the hoops all the applications had to jump through in order to report that single human's income to the IRS, given that the employee had two different employee ID's during the year. In fact, every report had that same employee's numbers split in two.

Please, I beg you, resist the temptation to tamper with primary keys. Make primary keys so non-meaningful that the temptation is gone.

If you really need a human-meaningful code, include it as just another column. For instance, imagine these two tables, where hr_code stands for HumanReadable code:

id: int8
lname: varchar
fname: varchar
hr_code: char10
start_date: date
end_date: date

id: int8
salesperson: int8 <employee:id>
custnum: int8 <customers:id>
saledate: date
amount: money

Let's say you want to look up all the sales info for the employee whose human readable code is CHISALE005. Just put it in the where clause:
select * from sales where salesperson=(select id from employees where hr_code='CHISALE005');
Perhaps you want to update the salesperson field in the sales table, using the hr_code field in the employees table. Here's how you do it:
update sales set salesperson=(select id from employees where hr_code='CHISALE005') where custnum=44;
In the preceding, be sure to use a where clause that identifies only those rows you want changed.

To insert a row, do the following:
insert into sales (custnum,date,amount) values(
select id
from employees
where lname='Anderson' and
Bottom line, meaningful primary keys are a temptation to primary key manipulation, and primary key manipulation is a recipe for disaster. Use non-meaningful primary keys.
Steve Litt is the author of Rapid Learning: Secret Weapon of the Successful Technologist . He can be reached at Steve Litt's email address .


By Steve Litt
In society, laws enable proper functioning of commerce. For instance, without laws against fraud, there would be no basis for trust. Prepayment and credit are impossible without laws requiring adherance to a contract. Without antifraud laws, a trade would require, both parties to show up with their respective goods and money at the same time, and they must be armed to prevent a "ripoff". Not conducive to a sophisticated economy.

Similarly, in databases, constraints are "laws" enabling proper functioning of data manipulation. Imagine a table in which duplicates were allowed in all columns, including the primary key. How would you identify specific entities?

Imagine an employee table, some of whose first and last names are empty. What is the meaning of a person without a name?

How would you like to work with an employee table in which certain employees start dates were later than their termination dates? Exactly when did they work for the company? How much vacation are they eligible for?

Problems like these can be prevented by database constraints.

There are 2 basic scopes of constraints:
There are several types of constraints:
Column constraints affect a single column. Table constraints affect a combination of columns. For instance, if your employee table is keyed on a unique id, that column must be unique. That's a column constraint. However, if your employee table is keyed by last name and first name (you wouldn't really do that, but this is an example), then the combination of last and first names must be unique. There could be a John Doe and a John Smith, a Debbie Doe and a Debbie Smith, but there couldn't be two John Smiths. When the constraint is on a combination of columns, it's said to be a "table constraint", and its syntax is different.

Notice that in order to qualify as a table constraint, the constraint must operate on a combination of columns, not on multiple columns. In other words, if you constrain the first name to uniqueness, and also constrain the last name to uniqueness, what you have is two column constraints -- not a table constraint. With two such column constraints, a table could not contain both Debbie Doe and Debbie Smith, as those two rows would create a duplicate first name. Clearly, to enforce personal uniqueness, what you want is a single table constraint on the combination of first and last name.

The following table shows the syntax for a column UNIQUE constraint and a table UNIQUE constraint:

UNIQUE Constraint Syntax
Column Constraint
Table Constraint
create table employees
(id serial NOT NULL UNIQUE),
lname varchar NOT NULL,
fname varchar);

create table employees
(lname varchar NOT NULL,
fname varchar,
UNIQUE (lname,fname)))

The remainder of this article discusses the types of constraints.

UNIQUE Constraint (table or column constraint)

This constraint says the column (or combination of columns in a table constraint) cannot contain duplicate values. An excellent application for the UNIQUE constraint would be the email address column in the employee table, because no two employees should have the same email address.

UNIQUE Constraint Syntax
Column Constraint
Table Constraint
create table employees
(id serial NOT NULL UNIQUE),
lname varchar NOT NULL,
fname varchar);

create table employees
(lname varchar NOT NULL,
fname varchar,
UNIQUE (lname,fname)))

PRIMARY KEY Constraint (table or column constraint)

The PRIMARY KEY constraint is similar to the UNIQUE constraint in that it bans duplicates in the column or combination of columns for which it's applied. In addition, the PRIMARY KEY constraint works with the FOREIGN KEY constraint in another table, which will be discussed in the next article.

One other difference between PRIMARY KEY and UNIQUE is that there can be no more than one column per table with the PRIMARY KEY constraint, whereas any number of columns in a table may contain the UNIQUE constraint.

The hot tip is to save the PRIMARY KEY constraint for the column used as a row identifier for the table, and use UNIQUE for any columns which, for logical reasons, should not contain dups, such as email address.

PRIMARY KEY Constraint Syntax
Column Constraint
Table Constraint
create table employees
lname varchar NOT NULL,
fname varchar);

create table employees
(lname varchar NOT NULL,
fname varchar,
PRIMARY KEY (lname,fname)))

NOT NULL Constraint (column constraint only)

Sometimes a row would make no sense if a certain column were empty. For instance, the if the last name were empty, who is the person? How are they identified? Even if the person's name is simply "Prince" or "Madonna", that must be placed in the last name field.

CHECK Constraint

The previously discussed constraints could be said to implement business logic, but their simplicity puts them more into the realm of "common sense".  The CHECK constraint can actually implement real business logic, although it shouldn't be relied upon to implement ALL business logic. For instance, imagine that there's a company enforced commission limit of 20% (0.2). That might look like this:

create table employees
lname varchar NOT NULL,
fname varchar,
commission numeric CHECK (commission < 0.2));

There are many other ways to use the CHECK constraint, including limiting entries to a hard-coded list. But in general, you might not want to implement business logic in CHECK constraints, because business logic changes, and it can be difficult to change constraints with ALTER commands. In my opinion CHECK constraints should be restricted to those logical constraints that, for logical rather than business reasons, are more or less cast in concrete.

PRIMARY KEY Constraint Syntax
Column Constraint
Table Constraint
create table employees
lname varchar NOT NULL,
fname varchar,
commission numeric CHECK (commission < 0.2));

create table employees
(lname varchar NOT NULL,
fname varchar,
service_commission numeric,
CHECK(commission < 0.2 AND service_commission < 0.4))


This isn't a constraint at all, but simply a value which a field is set to if the field's value isn't specified upon insert. But its effect can be similar to a constraint. For instance, it can be used to prevent nulls.

It can also be used to reduce work. For instance, imagine a single office business, headquartered in Los Angeles, whose database has hooks for further expansion to other offices. At the current time it would be wasteful to require keypunchers to specify the office. So place the following default:

create table employees
lname varchar NOT NULL,
fname varchar,
office DEFAULT = 'Los Angeles');

FOREIGN KEY Constraint

The FOREIGN KEY constraint is sophisticated enough to rate its own article.


Constraints prevent anarchy in databases, and to a small degree enforce business rules. Constraints can be on a single column, in which case they're called column constraints, or a combination of multiple columns, in which case they're called table constraints.
Steve Litt is the author of Rapid Learning: Secret Weapon of the Successful Technologist . He can be reached at Steve Litt's email address .

One to One, One to Many, and Many to Many

By Steve Litt
This article discusses relations between tables, specifically one to one, one to many, and many to many relations. Throughout this article I mention primary keys and non-primary keys. For the purposes of this article, the actual technically correct description would be unique keys and non-unique keys. However, I think the material's easier to understand if you think in terms of primary keys, and in most cases the primary key is how you look up unique records.

One to One Relationships

A one to one relationship is formed any time the primary key of one table is used as a the primary key of a second table. This is rare because it's much easier to simply put all the information in a single table. Some possible motivations for splitting a single table into multiple one-to-one tables include:
  1. Limitations on table size
  2. Elminate lookup of optional information when not needed
Table size limitations aren't usually a reason in Postgres. With a maximum of 1600 columns per row, infinite table size, and infinite row size, all but the most unusual data will fit in a single Postgres table. More relevant might be performance. If all you want to do is look up the person's start date, you'd prefer not to pull up the 1280x1024x32bpp image associated with that person.

Nevertheless, one to one relationships are rare.

One to Many Relationships

What's more common is the one to many relationship, where the primary key of one table is used as a non-unique, non-primary column in another table. The table in which the value is the primary key is the "one" side of the relationship. This is obvious, because primary keys cannot be duplicated throughout a table. The table containing the value in a non-unique column is obviously the "many" side.

Diagram of 1 to many join

One to many relationships often represent a hierarchy, such as a single supervisor supervising many employees, or a single committee containing several members. As a matter of fact, before the widespread adoption of relational databases, hierarchical databases were often used to make one to many associations.

Many to Many Relationships

There's no such thing as a built-in many to many relationship in relational databases. Instead, such relationships are built from a third table whose purpose is to combine two one to many relationships into a  many to many relationship.

For instance, one person can be a member of several committees, and one committee can (and usually does) contain more than one person. This is your classic many to many relationship. Obviously a People table is needed, as is a Committees table. But how to relate the two?

Relate them with a third table called People_Committee, having two columns -- person and committee. Each row of the People_Committee table represents an individual person on an individual committee. For instance:

People_Committee table

In the preceding table, person 4 is on committees 2 and 4, person 6 is on committeds 2 and 3, person 7 is on committee 4, and person 8 is on committee 4. Looking at it from the committee membership viewpoint, committee 2 contains people 4 and 6, committee 3 contains person 6, and committee 4 contains people 4, 7 and 8. As you can see, to view the committees on which individuals serve, sort the table by person_id. To view it by committee membership, sort it by committee_id. This table should have a PRIMARY KEY constraint on the combination of columns, because it makes no sense to say twice that person 7 is on committee 4. But neither column should have a unique constraint, because in fact by definition of the business case a person can belong to many committees and a committee can have many people.

For performance purposes, you'll need indexes to quickly locate either by committee or by person.

This table yields absolutely no information about the people, nor about the committees, so no SQL statement on this single table yields much info. For instance, you can't list the names of the members of the steering committee. In order to provide full information, the People_Committee table must be joined with the People table and the Committees table as shown below:
Diagram of a many to many relationship

In the preceding, you can look up the names of the members of the steering committee by looking up the committee_id of the committee whose name is steering, then querying the People_Committee table for that committee id, then querying the People table for the person id of the found People_Committee row. Here's the SQL statement:

select p.fname,p.lname 
from People p, People_Committee pc, Committees c
and c.committee_id=pc.committee
and pc.person=p.person_id;

To find all the committees of which Steve Litt is a member, do this:

from People p, People_Committee pc, Committees c
where p.lname='Litt'
and p.fname='Steve'
and p.person_id=pc.person
and pc.committee=c.committee_id;

The SQL statements used to create the three tables are as follows:

create table People_Committee (
person int8,
committee int8,
UNIQUE (person, committee)

create table People (
person_id serial NOT NULL PRIMARY KEY,
lname varchar NOT NULL,
fname varchar,
startdate date,
termdate date

create table Committees (
committee_id serial NOT NULL PRIMARY KEY,
name varchar NOT NULL,
meeting_room varchar


One to one, one to many, and many to many relationships are common data constructs, with one to one being rarer than the others. One to one relationships are implemented by relating the primary key of one table to the primary key of a second table. One to many relationships are impemented by relating the primary key column of the "one" side table to a non-unique column of the "many side table. Many to many relationships are created by creating a third table relating to  the two "many" side tables. The third table has two columns, each being related to the primary key of one of the two "many" side tables.
Steve Litt is the author of Samba Unleashed.   Steve can be reached at Steve Litt's email address.

Foreign Key Constraints

By Steve Litt
Foreign key constraints enforce data integrity of between different tables in one to many relationships. Perhaps the simplest example is to imagine a sales table containing a customer ID column, and a customer table keyed on an ID. When looking up a sales record, you can look up the customer record whose ID is the same as the customer ID in the sales table.

But what happens if that customer's row is deleted from the customer table? The lookup would fail. And from a logical view, what's the meaning of a sale without a customer? What would you do with such information?

To prevent such a situation, the database designer can place a FOREIGN KEY constraint on the sales table that would prevent deletion of any employee table row whose primary key value is contained in a column of the sales table.

Unlike other constraints, the FOREIGN KEY constraint constrains the behavior of two (or more) different tables. Although there are different ways of  implementing foreign key constraints, in Postgres by far the easiest is to match a foreign key in the "many" side table to the primary key in the "one" side table.

The following example implements a relationship between the customer table ("ones" side) and the sales table ("many" side) in which you cannot delete a customer record if that customer has rows in the sales table, and if you update the customer_id in the customer table then the same modification is made in all that customer's rows in the sales table.

First, create a brand new database, called junk, in which to perform this exercise. As user postgres, from the command prompt, perform the following operations:

-bash-2.05b$ dropdb junk
-bash-2.05b$ createdb junk
-bash-2.05b$ createlang plpgsql junk

You now have a virginal database with which to work. Next, go into the psql environment for database junk and create the tables:

create table customers (
customer_id serial NOT NULL PRIMARY KEY,
 name varchar,
address varchar

create table sales (
sale_id serial NOT NULL PRIMARY KEY,
 customer int8
REFERENCES customers(customer_id)
 on update cascade
 on delete restrict,
amount money,
 date date

\d customers

\d sales

In the preceding, notice the REFERENCES clause. This clause means that you cannot insert a sales record with a customer not in the customers table, nor can you update it to something not in the customers table. But it doesn't say anything about what happens to existing sales table records if a customers table record is changed or deleted. The on update and on delete clauses take care of those eventualities.

Now add some data to each table. Note that when we add to the sales table, we use a relation rather than hard copying the proper customer_id:

insert into customers (name) values ('General Motors');

select * from customers order by customer_id;

insert into sales (customer, amount) values(
select customer_id
from customers
where name='General Motors'

select * from sales order by customer;

So now we have a row in the customer table, and a related row in the sales table. And due to the foreign key constraints on the database, we cannot delete the customer row without first deleting the related sales row. Also, if we alter the customer_id in the customers table, the customer column in the sales table will automatically be similarly altered for any sales rows containing the original customer number. Let's test the theory. First, update the customer_id column of the "General Motors" row of the customers table, and observe the effect on both tables:

update customers set customer_id=999 where name='General Motors'; 

select * from customers order by customer_id;

select * from sales order by customer;

Here's the output of the preceding operations:
junk=# update customers set customer_id=999 where name='General Motors'; 
junk=# select * from customers order by customer_id;
999 | General Motors |

junk=# select * from sales order by customer;
1 | 999 | $1.44 |


Finally, attempt to delete the "General Motors" row from the customers table:

delete from customers where name='General Motors';

select * from sales order by customer;

And here's the output, proving that the delete failed.

junk=# delete from customers where name='General Motors';
ERROR: <unnamed> referential integrity violation - key in customers still referenced from sales
junk=# select * from sales order by customer;
sale_id | customer | amount | date
1 | 999 | $1.44 |
(1 row)


If you REALLY want to delete the "General Motors" row from customers you must first delete any rows from the sales table having General Motors' customer id in the customer column:

delete from sales 
where customer=(select customer_id from customers where name='General Motors');

delete from customers where name='General Motors';

select * from sales order by customer;

select * from customers order by customer_id;

The results are shown below. Both tables are now empty.

junk=# delete from sales 
junk-# where customer=(select customer_id from customers where name='General Motors');
junk=# delete from customers where name='General Motors';
junk=# select * from sales order by customer;
sale_id | customer | amount | date
(0 rows)

junk=# select * from customers order by customer_id;
customer_id | name | address
(0 rows)


Table vs Column Foreign Key Constraints

If the foreign key relates to a single column primary key, use the column constraint. But if the primary key is multisegment (for instance, order_id and item_id), you need to use the table constraint syntax.

The following table shows both. In the sql statement on the left the customer column in the sales table has a foreign key constraint on the customer_id column in the customers table. The sql statement on the right relates to an employee table whose primary key is the combination of last and first name (naughty, this primary key is meaningful, and it breaks the minute they hire a second John Smith, but it's great as a multisegment key example).

FOREIGN KEY Constraint Syntax
Column Constraint
Table Constraint
create table sales (
sale_id serial NOT NULL PRIMARY KEY,
 customer int8
REFERENCES customers(customer_id)
 on update cascade
 on delete restrict,
amount money,
 date date

create table chairmen (
char_id serial NOT NULL PRIMARY KEY,
lname varchar,
fname varchar,
committee varchar,
FOREIGN KEY (lname, fname) REFERENCES employees(lname,fname)
on delete restrict
on update cascade

Types of Foreign Key Constraints

As mentioned earlier, the REFERENCES or FOREIGN KEY keyword, in and of themselves, prevent setting the foreign key field to a value not contained as a primary key in the referenced table. But by themselves they don't solve problems associated with changing the primary key value in the referenced table or deleting the row in the referenced table.

To determine what happens when a primary key value gets updated or deleted, the foreign key clause can include either or both of the following clauses:
Each of the preceding eventualities can be assigned one of three action types:
With RESTRICTED, the delete or update simply fails. This is an excellent choice, because changing primary key values is a bad thing, as explained earlier in this magazine. And as far as deletion, if sales information is important enough to keep, why would you delete the salesman or the customer? Customers could have an active or inactive flag, and salesmen have hire date and term date information, so there's no need to delete a salesman or customer when they stop being your salesman or your customer.

CASCADES means that if you change the primary key in one table, the matching foreign keys in other tables will be likewise changed. If your business rules allow changing primary keys (a bad idea IMHO), then cascading updates is an excellent way of keeping everything in sync. ON UPDATE CASCADES is often used.

Cascading deletes is pretty radical. Accidentally deleting a customer record is fixed with a 5 minute input. But if the deleted customer takes all that customer's sales records with it, you've got some serious recovery to do. In my opinion record deletion is like purging paper records. You have a certain time of inactivity (like 7 years or 20 years) that signals the records are no longer needed. At that point you purge. But you don't need cascading deletes for such a purge -- a few simple SQL statements will do it just fine. I doubt you'll ever see cascading deletes here at Troubleshooters.Com, except possibly in a program not needing historical perspective, such as a menu program.

NULLS means that when you delete a primary key record, any foreign key records referring to the deleted records have their corresponding foreign key columns set to NULL. That way you can find rows with null values and assign a new primary key id to each such row. If you feel a need to allow deletions of the primary key table, and if such a deletion shouldn't take related records with it, then ON DELETE NULLS is a reasonable choice.

ON UPDATE NULLS has little logical meaning, so don't use it.


Foreign keys are a method of keeping your various tables in sync. This "in sync-ness" is often called "referential integrity.

To implement foreign keys for a one to many table relationship, declare a primary key in the "one" table, and refer to that column in the REFERENCES (column constraint) or FOREIGN KEY (table constraint) syntax of the "many" table.

The foreign key guarantees that you cannot set the foreign key field in any row of the "many" table to a value that is not the value of the primary key column in a row of the "one" table. That's all it does.

To make it do more, such as determine when the referred row in the "one" table is deleted or updated, you use the "ON UPDATE" or "ON DELETE" syntax, with an action of either RESTRICTED, CASCADES, or NULLS.

RESTRICTED is an excellent choice, because changing primary key values is IMHO bad practice, and deleting records referenced by other tables is downright nasty. You don't need foreign keys to purge ancient data -- a few easy SQL statements can do it.

If your business rules allow changing primary keys, then ON UPDATE CASCADES is a reasonable choice.
Steve Litt documented the Universal Troubleshooting Process. Steve can be reached at Steve Litt's email address.

Life After Windows: Big Iron on Your Kitchen Table

Life After Windows is a regular Linux Productivity Magazine column, by Steve Litt, bringing you observations and tips subsequent to Troubleshooters.Com's Windows to Linux conversion.
By Steve Litt
I'm old enough to remember when the only way to get real, SQL enabled database experience, was to work for a big iron shop that paid tens or hundreds of thousands for their DBMS. If you programmed at home, you used DBASE or FoxPro or Clipper or whatever, and did without SQL. In later days those programs worked with SQL, and you could beg, borrow or steal tiny, crippled DBMS's to test the logic (but not the high volume performance) of your programs.

This was a huge barrier to entry for enterprise programming. The only way to get hands-on SQL experience was to get an SQL job, and why would anyone give you an SQL job without experience? And there was absolutely no way you could write personal apps with a real SQL database. You couldn't sell SQL enabled vertical market apps unless you were also an unpaid salesman for the database vendor. And usually those license fees were more than you could charge for your program. Even web programming with SQL was problematic, with few web hosts offering a database, and charging bigtime if they did.

How different it is now. Almost every Linux distribution is bundled with both PostgreSQL and MySQL. Most small web hosts run on Linux, so it's trivial for them to set up either of those DBMS's. Those DBMS's might not be scalable enough for huge, enterprise apps, but by the time you get to the point of needing something like Oracle, you've got enough money to buy it.

In the old days programmers kept their jobs with scarce and hard to obtain big-iron knowledge. Today anyone can buy a cheap computer and a $59.00 Linux distro with a professional level PostgreSQL DBMS, languages to hit it, such as PHP, Perl, Python, Ruby, C and C++, and professional level editors such as Vim and Emacs to write the code. A cottage industry is growing up around these tools. In these grim days of high unemployment and business stagnation, application developers around the world are making money writing vertical market apps and web apps.

They're not making as much as big-iron programmers. But of course more and more big-iron programmers find themselves unemployed as their skills are supplanted by kitchen table programmers with Linux based mainframe-lites on their kitchen tables.
Steve Litt is the author of the course on the Universal Troubleshooting Process.  He can be reached at Steve Litt's email address .

Letters to the Editor

All letters become the property of the publisher (Steve Litt), and may be edited for clarity or brevity. We especially welcome additions, clarifications, corrections or flames from vendors whose products have been reviewed in this magazine. We reserve the right to not publish letters we deem in bad taste (bad language, obscenity, hate, lewd, violence, etc.).
Submit letters to the editor to Steve Litt's email address, and be sure the subject reads "Letter to the Editor". We regret that we cannot return your letter, so please make a copy of it for future reference.

How to Submit an Article

We anticipate two to five articles per issue, with issues coming out monthly. We look for articles that pertain to the Linux or Open Source. This can be done as an essay, with humor, with a case study, or some other literary device. A Troubleshooting poem would be nice. Submissions may mention a specific product, but must be useful without the purchase of that product. Content must greatly overpower advertising. Submissions should be between 250 and 2000 words long.

Any article submitted to Linux Productivity Magazine must be licensed with the Open Publication License, which you can view at At your option you may elect the option to prohibit substantive modifications. However, in order to publish your article in Linux Productivity Magazine, you must decline the option to prohibit commercial use, because Linux Productivity Magazine is a commercial publication.

Obviously, you must be the copyright holder and must be legally able to so license the article. We do not currently pay for articles.

Troubleshooters.Com reserves the right to edit any submission for clarity or brevity, within the scope of the Open Publication License. If you elect to prohibit substantive modifications, we may elect to place editors notes outside of your material, or reject the submission, or send it back for modification. Any published article will include a two sentence description of the author, a hypertext link to his or her email, and a phone number if desired. Upon request, we will include a hypertext link, at the end of the magazine issue, to the author's website, providing that website meets the Troubleshooters.Com criteria for links and that the author's website first links to Troubleshooters.Com. Authors: please understand we can't place hyperlinks inside articles. If we did, only the first article would be read, and we can't place every article first.

Submissions should be emailed to Steve Litt's email address, with subject line Article Submission. The first paragraph of your message should read as follows (unless other arrangements are previously made in writing):

Copyright (c) 2001 by <your name>. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, version  Draft v1.0, 8 June 1999 (Available at (wordwrapped for readability at The latest version is presently available at

Open Publication License Option A [ is | is not] elected, so this document [may | may not] be modified. Option B is not elected, so this material may be published for commercial purposes.

After that paragraph, write the title, text of the article, and a two sentence description of the author.

Why not Draft v1.0, 8 June 1999 OR LATER

The Open Publication License recommends using the word "or later" to describe the version of the license. That is unacceptable for Troubleshooting Professional Magazine because we do not know the provisions of that newer version, so it makes no sense to commit to it. We all hope later versions will be better, but there's always a chance that leadership will change. We cannot take the chance that the disclaimer of warranty will be dropped in a later version.


All trademarks are the property of their respective owners. Troubleshooters.Com(R) is a registered trademark of Steve Litt.

URLs Mentioned in this Issue