Into the Void


GeoLocation Logging

Posted in Tech by Scott Baldwin on July 10th, 2006

GeoLocation, sometimes referred to as GeoIP, is the mapping of IP addresses to geographic information.  In other words, GeoLocation is how some Websites will know that you’re from Springfield, USA.

For a long time now it has been easy and straightforward to determine a user’s country with a high degree of accuracy, but determining the user’s city or region was a different story until very recent years.  Given this greatly improved accuracy, it’s no surprise that GeoLocation has become increasingly popular for targeting specific demographics with localized ad content.  So this morning I thought to myself, “Why doesn’t Etherice have this capability yet?!”  Well, it does now… check out the screenshot.

The solution I found is called GeoLite City, from www.maxmind.com.  GeoLite City is a free, but slightly less accurate, version of the commercial GeoIP City system from MaxMind.  Installation and integration with my custom logger were incredibly simple.  First I downloaded the (PEAR-compliant) database file from maxmind.com and copied it to a location on my Webhost.  Then I downloaded the PHP interfaces for it and added this helper function to my logger script:

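function ip2loc($ip) {
 global $gi;  // GeoIP database handle, opened elsewhere in the logger
 if ( empty($ip) ) {
  return null_or_val();
 }
 $record = geoip_record_by_addr($gi, $ip);
 if ( empty($record) ) {
  // No match in the database (e.g., a private or unknown address)
  return null_or_val();
 }
 // Collect only the fields that are present, so no dangling ", " is left over
 $parts = array();
 foreach ( array('city', 'region', 'country_name') as $field ) {
  if ( !empty($record->$field) ) {
   $parts[] = $record->$field;
  }
 }
 return null_or_val(implode(', ', $parts));
}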
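
The function above relies on a global $gi handle that has to be opened before the first lookup.  Here is a minimal sketch of that setup, assuming the MaxMind PHP interface file (geoipcity.inc) and the database file sit next to the logger script.  The paths, the GeoLiteCity.dat file name, and the open_geo_db() wrapper are my assumptions for illustration; geoip_open() and GEOIP_STANDARD come from MaxMind’s PHP interface.

```php
<?php
// Hypothetical wrapper: opens the GeoLite City database, or returns null
// when the interface file or the database file cannot be read.
function open_geo_db($incPath, $dbPath) {
    if (!is_readable($incPath) || !is_readable($dbPath)) {
        return null;
    }
    require_once $incPath;                       // expected to define geoip_open() and friends
    return geoip_open($dbPath, GEOIP_STANDARD);  // standard mode: read from the file on disk
}

// Assumed locations; adjust to wherever the files were copied.
$gi = open_geo_db(__DIR__ . '/geoipcity.inc', __DIR__ . '/GeoLiteCity.dat');
```

If $gi comes back null, the logger can skip the location column rather than failing on every row.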

Note: null_or_val() is just a helper function that substitutes a red “null” when the value passed in is empty.
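
For readers who want something to start from, here is one way such a helper might look.  This is a hypothetical sketch rather than the actual function from my logger; the inline style and the decision to return the markup (so it can be concatenated into a string) are assumptions.

```php
<?php
// Hypothetical implementation of the helper described in the note.
// Returns a red "null" for missing or blank values, otherwise the value itself.
function null_or_val($val = '') {
    if ($val === null || trim((string) $val) === '') {
        return '<span style="color: red;">null</span>';
    }
    return (string) $val;
}
```

Unlike PHP’s empty(), this sketch treats the string "0" as a real value, which seemed the safer choice for log data.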

And that’s it!  One important thing to note is that I do not perform the GeoLocation lookup on every page access.  In my opinion, it is better to do the lookup on-the-fly when the data is being presented:

echo "<span class=\"sloc\">" . ip2loc($row['ip']) . "</span>\n";

First off, this method has the advantage of not having to store the GeoLocation lookup results in my SQL database, saving space.  Second, and more importantly, it keeps the log tables free of redundancy.  If you’ve ever taken a DBMS course then you know that this allows for greater normalization.  So next month when I update the MaxMind GeoLocation database with a newer, more accurate one, older logs will benefit too.
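
One cost of looking things up at display time is that a log page with many rows from the same visitor repeats the same lookup.  A possible refinement, which is not part of my logger as described, is to remember results for the duration of a single page render.  In this sketch, cached_ip2loc() and its $lookup parameter are hypothetical names.

```php
<?php
// Hypothetical per-request cache around a lookup function such as ip2loc().
// Nothing is stored in the database; the cache only lives for one page render.
function cached_ip2loc($ip, $lookup = 'ip2loc') {
    static $cache = array();
    $key = (string) $ip;  // normalize null/empty input to a string key
    if (!array_key_exists($key, $cache)) {
        $cache[$key] = call_user_func($lookup, $ip);
    }
    return $cache[$key];
}
```

Swapping ip2loc($row['ip']) for cached_ip2loc($row['ip']) in the echo line would be the only change needed, and the log tables stay exactly as they are.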

So far I’m impressed with how accurate this free database is.  According to MaxMind it’s only 60% accurate, but it seems like “misses” are not too far off from the actual city (e.g., a suburb outside the actual city).  I also like the fact that updating the database each month will consist of overwriting a single file… in fact I’ll probably just create a cron script that does it automatically.
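
Since the update amounts to overwriting a single file, the cron job can be very small.  Here is a rough sketch, assuming the new database has already been fetched (or can be read directly from a URL when allow_url_fopen is enabled).  The replace_geo_db() name and the ".tmp" suffix are hypothetical, and if the download arrives compressed it would need to be unpacked first.

```php
<?php
// Hypothetical updater: copy the new database to a temporary file next to
// the live one, then rename it into place so lookups never see a partial file.
function replace_geo_db($source, $dest) {
    $tmp = $dest . '.tmp';
    if (!@copy($source, $tmp)) {   // $source may be a local path or a URL
        return false;
    }
    clearstatcache();              // make sure filesize() sees the fresh copy
    if (filesize($tmp) === 0 || !rename($tmp, $dest)) {
        @unlink($tmp);             // discard an empty or unmovable download
        return false;
    }
    return true;
}
```

Run once a month from cron with the PHP command-line binary, this leaves the old database in place whenever the copy fails or comes back empty.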

For now I’m simply using the GeoLocation info for traffic analysis, but I may eventually parlay it into something more useful.  When you think about it, the possibilities are near-endless.
