comment on

A couple of days ago i looked into GeoIP in Perl+PostgreSQL+GeoIP = Awesome!. Since then i have learned that the GeoIP lists i was using were out of support. The new public lists are in a new format. So i took the opportunity to rewrite the whole thing and do pretty much everything with Perl, not using external commands like "unzip" and "wget". This should make things a bit more portable.

I'm sorry, it isn't written "nice" and isn't really documented. I designed it as a cron job for a single private server ;-)

I'm still calling the perl interpreter from a bash script so i can set the correct environment variables and stuff. But it's a lot smaller now:

#!/usr/bin/env bash

. ~/.bashrc_activestate

cd /home/myuser/src/geoip

perl updategeoip.pl
[download]

The database tables stays exactly the same as in the last post, here again for reference:

CREATE TABLE geoip
(
  netblock cidr NOT NULL,
  country_code text NOT NULL,
  country_name text NOT NULL,
  CONSTRAINT geoip_pk PRIMARY KEY (netblock)
  USING INDEX TABLESPACE "NAMEOFINDEXTABLESPACE"
)
WITH (
  OIDS=FALSE
)
TABLESPACE "NAMEOFDATATABLESPACE";
ALTER TABLE geoip OWNER TO "mydatabaseuser";
[download]

And here is the new all-in-one script:

#!/usr/bin/env perl

use strict;
use warnings;
use diagnostics;

use LWP::Simple qw[get];
use Archive::Zip;
use Data::Dumper;
use DBI;

my $dbh = DBI->connect("dbi:Pg:dbname=Cables_DB;host=localhost", 'myda
+tabaseuser', 'secretpassword', {AutoCommit => 0, RaiseError => 0})
        or die("can't connect to DB");

my $dodownload = 1;

if($dodownload) {
    print "Downloading...\n";
    my $url = 'http://geolite.maxmind.com/download/geoip/database/GeoL
+ite2-Country-CSV.zip';
    my $zipdata = get $url;

    if(!defined($zipdata) || !length($zipdata)) {
        die("Could not get ZIP data\n");
    }

    open(my $OFH, '>', 'geoip.zip') or die($!);
    binmode $OFH;
    print $OFH $zipdata;
    close $OFH;
}

my $zip = Archive::Zip->new();
$zip->read('geoip.zip');

my @fnames = $zip->memberNames();

#print Dumper(\@fnames);

# Read english "locations" file
my ($locmembername) = $zip->membersMatching('.*\/GeoLite2\-Country\-Lo
+cations\-en\.csv$');
$zip->extractMember($locmembername, 'locations.csv');

open(my $locfh, '<', 'locations.csv') or die($!);
my %locations;
my $tmp = <$locfh>; # Ignore header;
while((my $line = <$locfh>)) {
    chomp $line;
    my ($id, undef, $continentcode, $continentname, $countrycode, $cou
+ntryname) = split/\,/, $line;
    if($countrycode eq '' && $continentcode ne '') {
        $countrycode = $continentcode;
    }
    if($countryname eq '' && $continentname ne '') {
        $countryname = $continentname;
    }
    if($id eq '' || $countrycode eq '' || $countryname eq '') {
        print("Locations read error on line: $line\n");
        next;
    }
    my %country = (
        code => $countrycode,
        name => $countryname,
    );
    $locations{$id} = \%country;
}
close $locfh;

# Extract IPv4
my ($ipv4membername) = $zip->membersMatching('.*\/GeoLite2\-Country\-B
+locks\-IPv4\.csv$');
$zip->extractMember($ipv4membername, 'ipv4.csv');

# Extract IPv6
my ($ipv6membername) = $zip->membersMatching('.*\/GeoLite2\-Country\-B
+locks\-IPv6\.csv$');
$zip->extractMember($ipv6membername, 'ipv6.csv');

# Load both files
my @netblocks;
foreach my $fname (qw[ipv4.csv ipv6.csv]) {
    open(my $ipfh, '<', $fname) or die($!);
    $tmp = <$ipfh>; # Ignore header
    while((my $line = <$ipfh>)) {
        chomp $line;
        my ($cidr, $countryid, $countryid2, $countryid3) = split/\,/, 
+$line;

        if($countryid eq '' && $countryid2 ne '') {
            $countryid = $countryid2;
        }
        if($countryid eq '' && $countryid3 ne '') {
            $countryid = $countryid3;
        }

        if($cidr eq '' || $countryid eq '') {
            print ("Unknown country in File $fname on line: $line\n");
            next;
        }
        if(!defined($locations{$countryid})) {
            die("Fail in file $fname, country code $countryid unknown 
+in line: $line\n");
        }
        my %netblock = (
            cidr => $cidr,
            code => $locations{$countryid}->{code},
            name => $locations{$countryid}->{name},
        );

        push @netblocks, \%netblock;

    }
    close $ipfh;
}

print 'Loaded ', scalar @netblocks, " entries\n";
            
unlink('locations.csv');
unlink('ipv4.csv');
unlink('ipv6.csv');


print "Truncating table...\n";
$dbh->do("TRUNCATE geoip");
my $insth = $dbh->prepare_cached("INSERT INTO geoip (netblock, country
+_code, country_name) VALUES (?,?,?)")
        or die($dbh->errstr);

my $linecount = 0;

print "Inserting data into table...\n";
foreach my $block (@netblocks) {
    $linecount++;
    if($linecount % 10000 == 0) {
        my $percent = int(($linecount / scalar @netblocks) * 10000) / 
+100;
        print "   $linecount  = $percent%...\n";
    }
    my $countrycode = sanitize($block->{code});
    my $countryname = sanitize($block->{name});
    my $cidr = $block->{cidr};
    
    if(!$insth->execute($cidr, $countrycode, $countryname)) {
        $dbh->rollback;
        print "Fail on $linecount: ", $dbh->errstr, "\n", Dumper($bloc
+k), "\n";
        exit(1);
    }
    
}
$dbh->commit;
$dbh->disconnect;

unlink('geoip.zip');
print "Done\n";
exit(0);

sub sanitize {
    my ($val) = @_;
    
    $val =~ s/^\"//;
    $val =~ s/\"$//;
    
    return $val;
}
[download]

This script uses the newish "GeoLite2" databases from MaxMind. If you use them, please make sure you comply to the open source licensing stated on their official page

perl -e 'use MIME::Base64; print decode_base64("4pmsIE5ldmVyIGdvbm5hIGdpdmUgeW91IHVwCiAgTmV2ZXIgZ29ubmEgbGV0IHlvdSBkb3duLi4uIOKZqwo=");'

In reply to GeoIP revisited by cavac

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


Do you know where your variables are?
	PerlMonks