ASIC/FPGA Design and Verification Out Source Services
This page presents a script, used by this site, that extracts
which pages have been accessed from the apache2 access log.
-
This page explains the basic parts of the Perl script.
-
In the first part of the script, I build the date in the very same format
used by the apache2 web server (dd/Mon/yyyy, for example 05/Mar/2024).
While the day and year are simple, the month is taken from an array:
...
- my @a_month = ();
- push(@a_month, "Jan");
- push(@a_month, "Feb");
- ...
- my $date_tmp=`date +%m`; chomp($date_tmp);
- my $date=$a_month[$date_tmp-1];
- $date_tmp=`date +%d`; chomp($date_tmp);
- $date=$date_tmp . "/" . $date;
- $date_tmp=`date +%y`; chomp($date_tmp);
- if( length($date_tmp) == 2 ) {$date_tmp="20" . $date_tmp;}
- $date=$date . "/" . $date_tmp;
...
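As a rough alternative sketch, the same dd/Mon/yyyy string can be built inside Perl with localtime, without calling the external date command three times. The sub name apache_date is hypothetical and not part of the original script; the month array mirrors the one above.

```perl
use strict;
use warnings;

# Month abbreviations as they appear in the apache2 access log.
my @a_month = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# apache_date is a hypothetical helper: it formats an epoch time as dd/Mon/yyyy.
sub apache_date {
    my ($epoch) = @_;
    # localtime returns the month as 0..11 and the year as years since 1900.
    my (undef, undef, undef, $mday, $mon, $year) = localtime($epoch);
    return sprintf("%02d/%s/%04d", $mday, $a_month[$mon], $year + 1900);
}

my $date = apache_date(time);
print "$date\n";
```

Indexing the array with the month number from localtime keeps the result independent of the system locale, which matters because the access log always uses the English abbreviations.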
-
The next part is the hash. The hash key is the HTML file and its value
is the number of times the page was visited today.
...
- my %hash_cnt = ();
- my $fp_val="";
- ...
- if( $line =~ /[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*.*(my_web\/.*\.html).*/ ) {
- $t1=$1;
- #if two html entries were captured, keep only the first
- $search_ix=index($t1, "html ");
- if( $search_ix >= 0 ) {
- $t1=substr($t1, 0, $search_ix+4);
- }
- #skip entries that are not my pages
- $search_ix=index($t1, "www.google");
- if( $search_ix < 0 ) {
- $fp_cnt=$t1;
- $hash_cnt{ $fp_cnt }++;
...
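The counting step can be sketched as a small self-contained program. This is only an illustration under assumptions: count_pages is a hypothetical helper, the sample log lines are invented (using documentation-range IP addresses), and the pattern is a simplified one that looks only at the requested path rather than reproducing the script's own expression.

```perl
use strict;
use warnings;

# count_pages is a hypothetical helper: it returns a hash of page => hits
# for the log lines that carry the given dd/Mon/yyyy date.
sub count_pages {
    my ($date, @lines) = @_;
    my %hash_cnt;
    for my $line (@lines) {
        # The access log writes the time stamp as [dd/Mon/yyyy:hh:mm:ss zone].
        next if index($line, '[' . $date . ':') < 0;
        # Simplified pattern: match only the requested path, not the referrer.
        next unless $line =~ m{"GET\s+/(my_web/\S*?\.html)};
        $hash_cnt{$1}++;
    }
    return %hash_cnt;
}

# Invented sample lines in the common log layout.
my @sample = (
    '192.0.2.10 - - [05/Mar/2024:10:00:00 +0000] "GET /my_web/index.html HTTP/1.1" 200 512',
    '192.0.2.11 - - [05/Mar/2024:10:05:00 +0000] "GET /my_web/index.html HTTP/1.1" 200 512',
    '192.0.2.10 - - [05/Mar/2024:10:06:00 +0000] "GET /my_web/perl.html HTTP/1.1" 200 900',
    '192.0.2.12 - - [04/Mar/2024:23:59:00 +0000] "GET /my_web/index.html HTTP/1.1" 200 512',
);

my %hash_cnt = count_pages("05/Mar/2024", @sample);
print "$_ $hash_cnt{$_}\n" for sort keys %hash_cnt;
```

The last sample line is from the previous day, so it is skipped by the date check before the pattern is even tried.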
-
Finally, the data is written out of the hash to create an HTML report,
using a reverse sort (b <=> a rather than a <=> b)
to show the largest value first:
for my $key ( sort {$hash_cnt{$b} <=> $hash_cnt{$a}} keys %hash_cnt ) {
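A minimal sketch of the complete loop follows. The helper name report_rows and the table-row markup are assumptions, since the page does not show the HTML the script actually emits; the secondary comparison by page name is also an addition, so that pages with equal counts come out in a stable order.

```perl
use strict;
use warnings;

# report_rows is a hypothetical helper: one HTML table row per page,
# largest count first; equal counts are ordered by page name.
sub report_rows {
    my (%hash_cnt) = @_;
    my @rows;
    for my $key ( sort { $hash_cnt{$b} <=> $hash_cnt{$a} || $a cmp $b } keys %hash_cnt ) {
        push @rows, "<tr><td>$key</td><td>$hash_cnt{$key}</td></tr>";
    }
    return @rows;
}

# Invented counts, for illustration only.
my @rows = report_rows(
    "my_web/a.html" => 3,
    "my_web/b.html" => 7,
    "my_web/c.html" => 3,
);
print "$_\n" for @rows;
```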
-
Note that in many cases I use the Perl function index to find a
string within a string. For a fixed substring this avoids the regular
expression engine and can be faster than the regular expression syntax:
if( $line =~ //) ...
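One way to check the speed difference on a given machine is the core Benchmark module. The sketch below is an assumption-laden illustration: the sub names has_index and has_regex and the sample log line are invented, and the numbers will vary with the Perl version and the input, so no result is shown.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Invented sample line and search string.
my $line   = '192.0.2.10 - - [05/Mar/2024:10:00:00 +0000] "GET /my_web/index.html HTTP/1.1" 200 512';
my $needle = "index.html";

# Both subs answer the same question: does $line contain $needle?
sub has_index { return index($line, $needle) >= 0 ? 1 : 0 }
# \Q...\E quotes the needle so that "." is matched literally.
sub has_regex { return $line =~ /\Q$needle\E/ ? 1 : 0 }

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, { index => \&has_index, regex => \&has_regex });
```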
-
The script can also filter out entries based on an IP list, for instance
entries that start with my router's IP at home or at work.
...
my $filter_ip_1="192.168.2.1";
my $entry_ip="";
...
- my @filter_ip_a = ();
- push(@filter_ip_a, "192.168.2.1"); #home
- push(@filter_ip_a, "82.166.32.218"); #broadcom
- ...
- #filter my views, which come from my router
- FILT_L : foreach $searchAix (@filter_ip_a) {
- $search_ix=index($entry_ip, $searchAix);
- last FILT_L if ($search_ix >= 0);
- }
- if( $search_ix < 0 ) {
...
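The same filter can be sketched with a hash lookup. Note that this swaps in a different technique from the script's index() scan: it extracts the leading address of the log line and compares it exactly, so 192.168.2.10 is not mistaken for 192.168.2.1. The helper name is_filtered and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Addresses to ignore, taken from the snippet above.
my %filter_ip = map { ($_ => 1) } ("192.168.2.1", "82.166.32.218");

# is_filtered is a hypothetical helper: true when the log line starts with
# exactly one of the listed addresses.
sub is_filtered {
    my ($line) = @_;
    my ($entry_ip) = $line =~ /^(\d+\.\d+\.\d+\.\d+)\s/;
    return 0 unless defined $entry_ip;
    return exists $filter_ip{$entry_ip} ? 1 : 0;
}

# Invented sample lines.
print is_filtered('192.168.2.1 - - [05/Mar/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 1'), "\n";
print is_filtered('192.168.2.10 - - [05/Mar/2024:10:00:01 +0000] "GET / HTTP/1.1" 200 1'), "\n";
```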
-
The script also prints an IP report. Only the most popular IPs are printed
(above 30 percent; the sample below suggests the percentage is measured
against the highest count). The most popular ones are printed in a darker
color than the others:
IP report
180.149.52.44.. 100% 50
46.117.65.202.. 088% 44
192.109.150.105 086% 43
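The report lines above can be reproduced with a short sketch. It rests on an inference from the sample, not on the script's source: each percentage is taken as the IP's count divided by the highest count (44/50 = 88%, 43/50 = 86%). The helper name ip_report and the extra low-count address are hypothetical, and the color handling is left out.

```perl
use strict;
use warnings;

# ip_report is a hypothetical helper: it returns one text line per IP whose
# count is above 30 percent of the highest count, largest count first.
sub ip_report {
    my (%ip_cnt) = @_;
    my @ips = sort { $ip_cnt{$b} <=> $ip_cnt{$a} || $a cmp $b } keys %ip_cnt;
    return () unless @ips;
    my $top = $ip_cnt{ $ips[0] };
    my @rows;
    for my $ip (@ips) {
        my $pct = int(100 * $ip_cnt{$ip} / $top);
        next if $pct <= 30;
        # Pad the address with dots to 15 characters, as in the sample report.
        my $pad = 15 - length($ip);
        $pad = 0 if $pad < 0;
        push @rows, sprintf("%s %03d%% %d", $ip . ('.' x $pad), $pct, $ip_cnt{$ip});
    }
    return @rows;
}

# Counts from the sample report, plus one invented low-count address.
my @rows = ip_report(
    "180.149.52.44"   => 50,
    "46.117.65.202"   => 44,
    "192.109.150.105" => 43,
    "203.0.113.9"     => 10,
);
print "$_\n" for @rows;
```

The invented address reaches only 20 percent of the top count, so it is dropped from the report.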