Troubleshooters.Com Presents

The logeval
GPL Project

Copyright (C) 2002 by Steve Litt


NO WARRANTY!
There is no warranty for anything contained in the logeval distribution or documentation or its web pages, to the extent permitted by applicable law.  Except when otherwise stated in writing the copyright holders and/or other parties provide the program, documentation and web pages "as is" without warranty of any kind, either expressed or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose.  The entire risk as to the quality and performance of the program is with you.  Should the program, documentation or web pages prove defective, you assume the cost of all necessary servicing, repair or  correction.

logeval is a program to analyze a set of UNIX/apache log files and come up with meaningful statistics.

CONTENTS

Project charter

The logeval program is intended to give the daily statistics that matter when you use a website as an advertisement. Like other log analysis programs, it prints out all web pages in descending order of traffic. But in addition, it allows you to flag specific web pages to analyze. Typically these would be advertisement pages.

Also, unlike most analysis programs, it prints the top 10 most hit sites for each day, together with the total visits for the day and the total distinct IP addresses for the day. This daily refinement enables you to more quickly and accurately gauge the effect of changes in advertisements, correlating changes in content with both changes in traffic and changes in sales.

Another feature is the ability to place special events in an event file, so that the events will print before the day on which they occurred. Thus you might have a June 4, 2002 event called "Got a link from bigsite.com", which will then print above the June 4, 2002 report entry, thus reminding you why your stats went up 15%.

This object-oriented program can be enhanced as desired. The Accumulator class compiles totals for a given period. Each day gets an Accumulator object, and there's an Accumulator object for the entire report. It would be easy to create weekly or monthly Accumulator objects, or an Accumulator object for the last 7 days.
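
As a rough illustration of the Accumulator idea, here is a minimal sketch in Python rather than the project's Perl. Only the class name comes from the description above. The method names (add, merge, top_pages, total_hits), the fields, and the last_n_days helper are assumptions made for illustration, not the actual logeval interface.

```python
from collections import Counter


class Accumulator:
    """Totals for one period (a day, a week, the whole report)."""

    def __init__(self, label):
        self.label = label
        self.page_hits = Counter()   # URL -> number of hits
        self.ips = set()             # distinct IP addresses seen

    def add(self, ip, url):
        """Record one page access."""
        self.page_hits[url] += 1
        self.ips.add(ip)

    def merge(self, other):
        """Fold another Accumulator (for example one day) into this one."""
        self.page_hits.update(other.page_hits)
        self.ips |= other.ips

    def total_hits(self):
        return sum(self.page_hits.values())

    def top_pages(self, n=10):
        """Return the n most-hit pages as (url, hits) pairs."""
        return self.page_hits.most_common(n)


def last_n_days(daily, n=7):
    """Build one Accumulator from the last n daily Accumulators."""
    combined = Accumulator(f"last {n} days")
    for day in daily[-n:]:
        combined.merge(day)
    return combined
```

With this shape, a weekly or monthly total is just a merge of the daily objects that fall inside the period, so no log line has to be read twice.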

There are a few downsides. The program is written in Perl, and is therefore slower than you might expect. To minimize the Perl effect, the program has a preprocessor (logeval.cgi) which pre-trims the log file using highly efficient and thoroughly tested UNIX utilities. Other downsides include the fact that it ignores graphic files and it ignores bandwidth. This would NOT be a good tool to analyze bandwidth.

Depending on your web host and the size of your log(s), this program might be runnable on the web host. However, it might time out there, in which case your best alternative is to use ftp every day to incrementally get (reget) the new parts of your log file, and then run the program on your desktop computer. The Troubleshooters.Com logs from 3/24/2002 through 6/4/2002 comprise 518703 html page accesses, and the analysis takes 3 minutes and 11 seconds (roughly 2,700 page accesses per second) to run on my dual Celeron 450 with 512 MB of RAM and Mandrake Linux 8.2.

To repeat, logeval is built to analyze the immediate effect of content changes on traffic patterns and sales.

Project specifications

This program consists of the following files:

logeval.cgi

This is a shellscript that cats the log files listed by logfilelist.cgi and pipes them through a series of grep statements. The greps discard records for graphic files and other filetypes that aren't being tracked, as well as accesses that didn't produce a 200 result. The output is then piped to logeval_worker.cgi, which does all the analysis.
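
The filtering step can be sketched as follows. This is a Python illustration of the same idea, not the shellscript itself, which uses cat and grep. It assumes the logs are in Apache common or combined log format, and the list of skipped extensions is hypothetical.

```python
import re

# Assumed Apache common/combined log format:
# host ident user [timestamp] "METHOD /path PROTOCOL" status bytes ...
LOG_LINE = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "\S+ (\S+)[^"]*" (\d{3}) ')

# Hypothetical list of untracked file types; adjust for your site.
SKIPPED_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".ico", ".css", ".js")


def keep_line(line):
    """Return True for a successful (200) access to a tracked page."""
    match = LOG_LINE.match(line)
    if match is None:
        return False
    url, status = match.groups()
    path = url.split("?", 1)[0].lower()   # ignore any query string
    return status == "200" and not path.endswith(SKIPPED_EXTENSIONS)


def filter_log(lines):
    """Yield only the lines worth handing to the analysis step."""
    return (line for line in lines if keep_line(line))
```

Filtering first means the slower analysis code only ever sees the small fraction of records it actually counts.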

logfilelist.cgi

Based on a wildcard in the configuration file $HOME/.logeval/logeval.conf, this program outputs a list of log files, sorted in date order from earliest to latest. This program may need to be changed to accommodate the way your ISP names log files.
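
One possible way to produce such a list is sketched below in Python. The function name log_files is hypothetical, and sorting by file modification time is an assumption. The description above does not say how logfilelist.cgi determines date order, and if your ISP encodes the date in the file name you would sort on that instead.

```python
import glob
import os


def log_files(wildcard):
    """Return the files matching the wildcard, oldest first.

    Assumption: modification time reflects the date order of the logs.
    """
    return sorted(glob.glob(wildcard), key=os.path.getmtime)
```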

logeval_worker.cgi

This is an OOP Perl program that does all the analysis work. For best performance with high traffic sites, the per-line algorithm looks something like this:
    foreach line
        parse the line
        if datestamp != previous datestamp
            do break logic
        add to current daily Accumulator
On a site like Troubleshooters.Com, only 1 out of every 5000 lines triggers a date change, so by offloading anything date related to the break logic, and by updating ONLY the daily Accumulator, you maximize performance. Other accumulators are updated during break logic by accumulating the proper daily Accumulators.
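
The control-break loop described above might look like the following Python sketch. It is not the Perl code from logeval_worker.cgi. The function name analyze, the (datestamp, url) record shape, and the use of a plain Counter in place of an Accumulator object are all assumptions made to keep the example short.

```python
from collections import Counter


def analyze(records):
    """Process (datestamp, url) records that arrive in date order.

    Returns (daily, report): a list of (datestamp, Counter) pairs and
    a Counter covering the whole run.
    """
    daily = []
    report = Counter()
    current_date = None
    current = Counter()

    def do_break():
        # Break logic: runs once per date change, not once per line.
        daily.append((current_date, current))
        report.update(current)

    for datestamp, url in records:
        if datestamp != current_date:
            if current_date is not None:
                do_break()
            current_date = datestamp
            current = Counter()
        current[url] += 1   # the only per-line update

    if current_date is not None:
        do_break()           # flush the final day
    return daily, report
```

The report total is touched only inside do_break, so its cost is paid once per day of log data rather than once per line.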

Two classes are intended to be substantially modified: the Breaklogic and Writer classes. You can customize the report by modifying these. As for the Writer class, you're probably better off subclassing it. For instance, you could have a DailyWriter, WeeklyWriter, MonthlyWriter, Last7FullDayWriter, and ReportWriter, all descended from the Writer class. The current program just uses Writer to write both the daily Accumulators and the Report Accumulator.
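
To show what subclassing Writer could look like, here is a small Python sketch. The class names are taken from the paragraph above, but the heading and render methods, their parameters, and the output layout are assumptions for illustration and do not reflect the real Perl classes.

```python
class Writer:
    """Base report writer; subclasses override only what differs."""

    def heading(self, label):
        return f"=== {label} ==="

    def render(self, label, page_hits, limit=10):
        """Return report text for a dict of URL -> hits."""
        ranked = sorted(page_hits.items(), key=lambda item: (-item[1], item[0]))
        lines = [self.heading(label)]
        lines += [f"{hits:8d}  {url}" for url, hits in ranked[:limit]]
        return "\n".join(lines)


class DailyWriter(Writer):
    def heading(self, label):
        return f"Day {label}"


class ReportWriter(Writer):
    def render(self, label, page_hits, limit=None):
        # The full report lists every page, not just the top 10.
        return super().render(label, page_hits, limit)
```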

The following is a list of classes in this program:

logeval.conf

This is the config file for the program. At present it contains only 2 types of lines:
There's only 1 log file wildcard record, and it looks something like this:
    log wildcard = /scratch/tclogs/troubleshooters.com-access_log*
Each special URL record defines one URL to track separately, and looks like this:
    special url = /bookstore/order.htm
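
Reading a file in this format could be sketched in Python as shown below. The function name parse_conf is hypothetical, and the handling of blank or unrecognized lines (they are skipped) is an assumption, since the description above does not say how the real program treats them.

```python
def parse_conf(text):
    """Parse logeval.conf-style text into (wildcard, special_urls).

    Assumption: each record is "key = value" on one line; blank and
    unrecognized lines are skipped.
    """
    wildcard = None
    special_urls = []
    for raw in text.splitlines():
        key, sep, value = raw.partition("=")
        if not sep:
            continue
        key, value = key.strip().lower(), value.strip()
        if key == "log wildcard":
            wildcard = value
        elif key == "special url":
            special_urls.append(value)
    return wildcard, special_urls
```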

specialevents.list

This is a list of major special events that you believe would explain changes in traffic or sales. For instance, on 4/21/2002 I changed the bookstore main page to be an order form, and aimed ALL Troubleshooters.Com links at that main page. Main page traffic skyrocketed, but sales plummeted. Troubleshooters.Com's readers obviously needed to read about the book before purchasing it. On 5/12/2002 I aimed Troubleshooters.Com book links at the pages for the specific books, instead of the main page. Main page (order form) visits dropped like a rock, but book page visits skyrocketed and sales went back to their pre-4/21 level.

The following is my ~/.logeval/specialevents.log:

2002/04/21@15:00 Bookstore main page becomes order form, all links aimed there
2002/05/12@17:00 Move links back to book ads
These entries print above the output for their respective days, yielding a very clear view of exactly what happened.
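
A Python sketch of reading entries in this format and grouping them by day follows. The function name load_events is hypothetical. The sketch assumes every non-blank line has the shape shown in the two entries above, that is, a YYYY/MM/DD@HH:MM stamp, one space, then free text.

```python
from collections import defaultdict
from datetime import datetime


def load_events(lines):
    """Map each date to the list of (time, description) events on it."""
    events = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        stamp, _, description = line.partition(" ")
        when = datetime.strptime(stamp, "%Y/%m/%d@%H:%M")
        events[when.date()].append((when.time(), description))
    return events
```

A report writer could then look up the events for a given day and print them before that day's statistics.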

Download links for the project sources.

Maintainers list

Needed Programming and Documentation Tasks

Here are some items on the todo list:

How to Participate

Currently this project isn't mature enough for multiple programmers. If you make what you consider a valuable change to the program, please feel free to email me describing the change.

Instructions on how to join the project mailing list

No mailing list currently.
 

FAQ (Frequently Asked Questions) list.

None currently.
 

HTMLized versions of the project documentation

None currently.
 

Links to related projects.

None.
 

Dedication: We Stand On Their Shoulders

Larry originated the language that made this an easy 1 day project, Linus originated the OS that it runs on, and Richard originated the license that made all the rest possible.

Progress

To be announced
