Archive for the ‘Perl’ Category

World’s smallest Apache log analyser.

Friday, April 11th, 2008

Right here:

#!/usr/bin/perl

##
# Simple Apache access log parser that yields useful statistics.
# Designed to serve basic web analysis needs.
#
# Alex Balashov 

use strict;
use warnings;

## GLOBALS ##
my      @hits = ();
#############

# Preload.

while(<STDIN>) {
        chomp;

        if(/^([0-9.]+)\s+\-\s+\-\s+\[(.[^]]*)\]\s+\"(.[^"]*)\"\s+[0-9]+\s+[0-9]+\s+\"(.[^"]+)\"/o) {
                push(@hits,
                        {
                                'ip_addr' => $1,
                                'date' => $2,
                                'request' => $3,
                                'referrer' => $4
                        });
        }
}

# Generate reports.

print "Access log report: \n\n";

print "Top 20 IP addresses:\n" .
      "-------------------\n\n";

&sorted_report('ip_addr', 20);

print "\nTop 50 requests: \n" .
      "---------------\n\n";

&sorted_report('request', 50);

print "\nTop 50 referrers: \n" .
      "------------------\n\n";

&sorted_report('referrer', 50);

# Generate function to hash out unique requests and sort by a certain
# hashed criterion.

sub     sorted_report {
        my      ($key, $limit) = @_;
        my      %unique_tokens = ();

        foreach(@hits) {
                $unique_tokens{$_->{$key}} = 0 unless
                        exists ($unique_tokens{$_->{$key}});

                $unique_tokens{$_->{$key}} ++;
        }

        foreach(sort { $unique_tokens{$b} <=>
                        $unique_tokens{$a}
                     } keys %unique_tokens) {
                $limit-- unless $limit == 0;

                printf "  %-50s %d\n", $_, $unique_tokens{$_};

                last if $limit == 0;
        }
}

Tells me everything I need to know. Try it yourself:

   cat access.log | perl analyse.pl
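
To see what the pattern captures, here is a minimal sketch that runs it against a single line. The log line is made up for illustration (Apache "combined" format is assumed), the %hit hash is a hypothetical name, and the /x modifier is my addition so the pattern can be split across lines and commented:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample line in Apache "combined" log format (not real traffic).
my $line = '192.0.2.10 - - [11/Apr/2008:10:15:32 -0400] '
         . '"GET /index.html HTTP/1.1" 200 5120 '
         . '"http://example.com/start" "Mozilla/5.0"';

# Same pattern as in the script, laid out with /x so that whitespace
# inside the pattern is ignored and each part can carry a comment.
my %hit;
if ($line =~ /^([0-9.]+) \s+ - \s+ - \s+   # client IP, then two unused fields
               \[ (.[^]]*) \] \s+           # timestamp in square brackets
               "(.[^"]*)" \s+               # quoted request line
               [0-9]+ \s+ [0-9]+ \s+        # status code and byte count
               "(.[^"]+)"                   # quoted referrer
             /x) {
    # Hash slice: assign all four captures in one statement.
    @hit{qw(ip_addr date request referrer)} = ($1, $2, $3, $4);
}

printf "%-9s %s\n", $_, $hit{$_} for qw(ip_addr date request referrer);
```

The pattern is not anchored at the end of the line, so the trailing user-agent field is simply ignored.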

Limiting Perl regular expressions to one compilation.

Wednesday, January 23rd, 2008

Here is one thing you can do to speed up Perl regular expression matching: use the /o modifier.

When you iterate through a loop and apply a regular expression repeatedly, how often Perl compiles it depends on the pattern. A purely literal pattern, like the one below, is compiled only once, when the program itself is compiled:

while(<STDIN>) {
        chomp;
        if(/^[a-zA-Z]+/) {
                ...
        }
}

With an interpolated pattern, Perl makes no assumption about whether the pattern is dynamically generated, and wisely so. It re-interpolates the variable on every pass and recompiles the expression whenever the resulting pattern has changed:

while(<STDIN>) {
        chomp;
        my $pattern = get_upper_or_lower_pattern($_);
        if(/$pattern/) {
                ...
        }
}

But if you're interpolating the same pattern over and over, you may want to apply the /o modifier so that the pattern is interpolated and compiled only once:

my $pattern = '^[a-zA-Z]+';     # never changes after this point

while(<STDIN>) {
        chomp;

        if(/$pattern/o) {
                ...
        }
}
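
One caveat is worth a quick sketch. With /o, Perl keeps whatever the pattern was the first time the match was evaluated, so later changes to an interpolated variable are silently ignored. The word and pattern lists below are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data: each word should be tested against its own pattern.
my @patterns = ('^a', '^b');
my @words    = ('apple', 'banana');

my (@plain, @once);
for my $i (0 .. $#words) {
    my $pattern = $patterns[$i];

    # Without /o the pattern follows the current value of $pattern.
    push @plain, ($words[$i] =~ /$pattern/  ? 1 : 0);

    # With /o the pattern from the first pass ('^a') is reused forever.
    push @once,  ($words[$i] =~ /$pattern/o ? 1 : 0);
}

print "without /o: @plain\n";   # prints "without /o: 1 1"
print "with /o:    @once\n";    # prints "with /o:    1 0"
```

So /o is only safe when the interpolated value really does stay the same for the life of the program.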

While my benchmarking suggests that the performance benefits are relatively negligible for a simple regex like the one above, they can be quite considerable for a pattern of greater complexity.
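
If you want to measure this yourself, a rough sketch using the core Benchmark module might look like the following. The test data, the sub names, and the choice of an interpolated pattern are my assumptions, not the setup behind the numbers mentioned above, and results will vary with Perl version and pattern:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 1,000 short strings, half of which match.
my @lines   = map { ($_ % 2) ? "word$_" : "$_ word" } 1 .. 1000;
my $pattern = '^[a-zA-Z]+';

# Each sub returns the number of matching lines, so the two variants
# can be checked against each other before timing them.
sub count_plain { my $n = 0; for (@lines) { $n++ if /$pattern/  } return $n }
sub count_once  { my $n = 0; for (@lines) { $n++ if /$pattern/o } return $n }

print "plain: ", count_plain(), "  /o: ", count_once(), "\n";

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    'interpolated'    => \&count_plain,
    'interpolated /o' => \&count_once,
});
```

Swap in a longer pattern or your own log lines to see how the gap changes for your workload.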