Logwrangler: Batch Log Analysis

Logwrangler 2

Logwrangler is a Perl program that generates web traffic reports using Analog, Report Magic, DNSTran, and the included logtable.pl script (which generates a table summarizing statistics across multiple sites).

Security Tip

Logwrangler doesn't require special privileges, so you should run it (and all the associated programs) as an unprivileged user. In its base configuration, Logwrangler does all its work inside ~/logwrangler, which should contain three directories: cfg (for Analog config files), data (for Analog output files), and stat (for reports). If you want to make reports publicly visible, you can make stat a symlink to a directory accessible through your webserver. You could set this up with:
    mkdir -p ~/logwrangler/cfg ~/logwrangler/data ~/public_html/stat && cd ~/logwrangler && ln -s ../public_html/stat

Warning: Be sure the user running Logwrangler has permission to write to Analog's dnscache and dnslock files (Analog isn't always configured this way -- you can see which files it is configured to use with analog --settings | grep DNS) and the data and output files (data and stat directories).
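
The directory layout and the write-permission check described above can be sketched as a small shell function. This is only an illustration, not part of Logwrangler: the name check_layout is hypothetical, and the extra files you pass in should be whatever analog --settings | grep DNS reports on your system.

```shell
#!/bin/sh
# Sketch: create the cfg, data, and stat directories under a base directory
# and report any path the current user cannot write to.
# check_layout is a hypothetical helper, not shipped with Logwrangler.
check_layout() {
    base=$1      # e.g. "$HOME/logwrangler"
    shift        # remaining arguments: extra files to check (e.g. Analog's dnscache)
    status=0
    for dir in cfg data stat; do
        # mkdir -p is a no-op if the directory (or a symlink to one) already exists
        mkdir -p "$base/$dir" || status=1
        if [ ! -w "$base/$dir" ]; then
            echo "not writable: $base/$dir" >&2
            status=1
        fi
    done
    for file in "$@"; do
        if [ ! -w "$file" ]; then
            echo "not writable: $file" >&2
            status=1
        fi
    done
    return $status
}
```

With Analog from Fink, for example, you might run check_layout "$HOME/logwrangler" /sw/var/analog/dnscache as the user that will run Logwrangler; a non-zero exit status means at least one path needs attention.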

How to Use Logwrangler

  1. Download logwrangler-2.0.1.tgz, which includes logwrangler.pl, logtable.pl, and readme.html (this document).
  2. Install Analog.
  3. Install Report Magic (strongly suggested); with a current version of Fink, you can just use fink install analog reportmagic to get both programs.
  4. Install DNSTran (optional). Note: if using Analog from Fink, you'll probably want to put cache /sw/var/analog/dnscache into dnstran.cfg.
  5. Put logwrangler.pl and logtable.pl somewhere in your PATH (~/bin and /usr/local/bin are reasonable places).
  6. Under Mac OS X, you might need to add
    setenv PATH "${PATH}:/usr/local/bin:${HOME}/bin"
    to your ~/.cshrc or ~/.tcshrc file, and start a new Terminal session to pick up the configuration change; otherwise, just rehash (if using a csh-based shell) to pick up the new programs.
  7. Put all your Analog configuration files into a single directory (~/logwrangler/cfg); make sure nothing else is in that directory.
  8. Each configuration file should contain at least the following Analog commands (shown with sample values):
    HOSTNAME www.reppep.com
    HOSTURL  http://www.reppep.com/
    OUTPUT   COMPUTER # This could also go in the master analog.cfg
    OUTFILE  /home/pepper/logwrangler/data/www.reppep.com.txt
    LOGFILE  /var/log/httpd/www.reppep.com-access_log*
    # this should pick up compressed logs
  9. Both Report Magic and logtable.pl use Analog's COMPUTER output; if you want Analog to generate human-readable reports directly (if you're not using Report Magic, for example), use a second configuration file for each site, perhaps with OUTPUT HTML and something like OUTFILE /home/pepper/logwrangler/stat/www.reppep.com/index.html in the other file. Note: Analog doesn't understand '~' -- you must spell out the path to your home directory.
  10. Logwrangler will create a directory for each report from the file part of OUTFILE (after removing any .txt, .out, or .dat suffix). The summary table will link to these directories. If you are not using Report Magic, you should use the same scheme for OUTFILE in your formatted reports, or adjust the programs to suit your configuration. Note: Files other than Analog COMPUTER output in the OUTFILE directory may confuse Logwrangler.
  11. Check the variables at the beginning of logwrangler.pl, and adjust them as necessary to match your configuration. Make sure all directories mentioned in OUTFILE commands exist (mkdir -p ~/logwrangler/data); Analog won't create directories.
  12. logwrangler.pl -v -- this should run DNSTran on your logs (if available), Analog on your configuration files, Report Magic on the Analog output files, and finally logtable.pl to summarize the same output files. The -v switch provides verbose output; you can leave it off for less status information, or use -q for no output except errors.
  13. Visit $outputpath in a web browser; you should see the summary table (index.html), with links to the individual reports.
  14. If you want to make your reports visible on the web, make ~/logwrangler/stat a symbolic link to a directory accessible from the web, or change the value of $outputpath in logwrangler.pl. Security: You might want to restrict access to your reports; you can configure this in your web server.
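
If you maintain many sites, the per-site configuration files from step 8 can be generated rather than written by hand. The following is a minimal sketch under stated assumptions: write_site_cfg is a hypothetical helper, the .cfg file name is an arbitrary choice, and the log naming (SITE-access_log*) simply follows the sample values above. The base directory must be an absolute path, since Analog doesn't understand '~'.

```shell
#!/bin/sh
# Sketch: write a minimal Analog configuration file for one site.
# write_site_cfg is a hypothetical helper; the commands it emits are the
# ones listed in step 8, filled in from the arguments.
write_site_cfg() {
    site=$1      # e.g. www.reppep.com
    base=$2      # absolute path to the logwrangler directory, e.g. /home/pepper/logwrangler
    logdir=$3    # directory holding the access logs, e.g. /var/log/httpd
    # The trailing * in LOGFILE is written literally; Analog expands it, not the shell.
    cat > "$base/cfg/${site}.cfg" <<EOF
HOSTNAME ${site}
HOSTURL  http://${site}/
OUTPUT   COMPUTER
OUTFILE  ${base}/data/${site}.txt
LOGFILE  ${logdir}/${site}-access_log*
EOF
}
```

For example, write_site_cfg www.reppep.com /home/pepper/logwrangler /var/log/httpd would write /home/pepper/logwrangler/cfg/www.reppep.com.cfg with the sample values shown in step 8 (the cfg directory must already exist).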

Requirements

Notes

logwrangler.pl -v will show timings for each Analog configuration run.

If you remove a configuration file, but leave its data file in $outputpath, Report Magic and logtable.pl will continue to use the old data file.
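
One way to spot such leftover data files is to compare the data directory against the OUTFILE lines in your configuration files. This is a rough sketch, not part of Logwrangler: list_stale_data is a hypothetical helper, and it assumes OUTFILE is written in uppercase with an absolute path containing no spaces, as in the sample configuration.

```shell
#!/bin/sh
# Sketch: print every file in the data directory that no configuration
# file names in an OUTFILE command.
# list_stale_data is a hypothetical helper, not shipped with Logwrangler.
list_stale_data() {
    cfgdir=$1     # directory of Analog configuration files
    datadir=$2    # directory of Analog output files, as spelled in OUTFILE
    for data in "$datadir"/*; do
        [ -f "$data" ] || continue
        # awk exits 0 only if some line is "OUTFILE <exactly this path>"
        if ! awk -v want="$data" \
            '$1 == "OUTFILE" && $2 == want { found = 1 } END { exit !found }' \
            "$cfgdir"/* 2>/dev/null; then
            echo "$data"
        fi
    done
}
```

Running list_stale_data /home/pepper/logwrangler/cfg /home/pepper/logwrangler/data would list candidates for removal; review the list before deleting anything.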

Possible Future Enhancements


Change Log

2.0.1
Changed logtable.html to index.html.

Multilog 1.x

Multilog version 1.x was a simpler sh script to run Analog and DNSTran on a batch of files. I wrote it because I needed to crunch a lot of logfiles for many different sites nightly, and used it with at (Windows NT's cron equivalent). I later wrote a small supplemental awk program to generate an HTML table summarizing stats across sites; it has been rewritten as logtable.pl. To avoid confusion with the daemontools multilog program, version 2 has been renamed to Logwrangler.


May, 2003
