So, like everyone else who’s been paying any kind of attention to recent database technology trends, I’d heard about this No-SQL stuff, heard it was fast, and far more approachable to your average developer due to the JSON based formatting, but had never had the time to sit down and really investigate things. And then I heard about MongoDB, specifically about their superb geo-location functions – http://www.mongodb.org/display/DOCS/Geospatial+Indexing – and as you may know already, I’m a bit of a geo-location nut-case, so was instantly hooked when I also heard that Foursquare was using MongoDB too.
So where to start…? We first needed some geo-data, and where better to get that than from GeoNames – where you can download the allCountries.zip file (containing 7 Million+ records at over 200 MB), or simply grab the country-specific information for the country you reside.
We will need to get the data into MySQL, not only so that we can compare the speeds for proximity searches, but also so that we can learn a valuable lesson on how to export from MySQL into MongoDB, which really is both flexible and easy.
If we plan to use the GeoNames data, we MUST first create a table using the following SQL (with the field names being specifically important for the next stage):
CREATE TABLE `geonames` (
`geonameid` int(10) unsigned NOT NULL default '0',
`name` varchar(200) NOT NULL default '',
`ansiname` varchar(200) NOT NULL default '',
`alternatenames` varchar(2000) NOT NULL default '',
`latitude` double NOT NULL default '0',
`longitude` double NOT NULL default '0',
`feature_class` char(1) ,
`feature_code` varchar(10) ,
`country_code` char(2),
`cc2` varchar(60),
`admin1_code` varchar(20) default '',
`admin2_code` varchar(80) default '',
`admin3_code` varchar(20) default '',
`admin4_code` varchar(20) default '',
`population` bigint(11) default '0',
`elevation` int(11) default '0',
`gtopo30` int(11) default '0',
`timezone` varchar(40),
`modification_date` date default '0000-00-00',
PRIMARY KEY (`geonameid`)
) CHARACTER SET utf8 ;
With the geonames table created, we can import the data contained within the TXT file we just downloaded from GeoNames by using SQL similar to the following (where you will need to replace the location of the TXT file to the location you just extracted it to from the ZIP file you downloaded):
LOAD DATA INFILE 'C:/xampp/htdocs/allCountries.txt' INTO TABLE geonames(
geonameid, name, ansiname, alternatenames, latitude, longitude, feature_class, feature_code, country_code, cc2, admin1_code, admin2_code, admin3_code, admin4_code, population, elevation, gtopo30, timezone, modification_date
)
We should now have over 7 million places in our database (assuming you were brave enough to use the allCountries.txt file, which to be honest, may be too big for the next stages and cause you several problems like it did when running it on my personal PC at home). Once you have the records in your MySQL database, do not forget to add an index to the “latitude” and “longitude” fields in order to give the tests a fair chance later.
At this stage, we should probably start thinking about getting MongoDB set-up, and fortunately, the online documentation and general community-spirit for MongoDB is a jolly-fine one at that. For the purpose of this comparison, we will be using MongoDB and PHP to test the results, but in a future “How To”, I also hope to document my findings for NodeJS too!
Let’s download MongoDB – http://www.mongodb.org/downloads – then run-through the getting-started guides – http://www.mongodb.org/display/DOCS/Quickstart. I found set-up on the PC and Mac equally as easy. Simply download the package, save somewhere sensible, create the necessary “data” folder, then run the two recommended console executables “mongod.exe” and “mongo.exe” and if 3+3 equals 6 everything’s working properly! For the benefit of getting to the meat and bones of this How To, I’m going to assume that setting-up MongoDB was a breeze and that you were then capable of also copying the correct .dll (for windows) or .so (for Mac) into the correct “extension” folder as specified by your php.ini file and documented here – PHP Drivers for Mongo.
We are now ready to import the MySQL data into MongoDB, and although the method I am about to show you is not the preferred method for large data-sets (which should use the import function), showing you this way teaches you the most fundamental MongoDB functions, which includes adding new collections (tables) and objects (records), but also updating those objects and then retrieving the objects, all from PHP in syntax we as PHP developers should be very comfortable with. In fact, the following code provides everything you need in order to not only import the data from MySQL to MongoDB, but also to run comparative tests to see how fast each of the databases can run and return the same results.
Before you begin, please note that importing all 7 Million records using the methods below from MySQL to MongoDB is going to take some time, several hours, if not all day (and then some, especially using these methods), so if you are doing this as a way to get started and want some immediate feedback, you REALLY should consider importing one of the country specific .txt files from GeoNames, or using the country_code and (or) limit variables to migrate only a portion of the whole MySQL data into MongoDB.
Either way, feel free to use, improve and comment on the following code as required (as a reminder, this is not the best way to import large data-sets to MongoDB but have done it this way so that it’s easy to understand the fundamentals):
// We need this function for running stop-watch later...
function ms_microtime_float(){
list($usec, $sec) = explode(" ", microtime());
return ((float)$usec + (float)$sec);
}
// We need this function for making the most optimised MySQL proximity query available...
// Simply provide a database name, your current Lat / Lng and optional DB Limits and Maximum Distances...
function ms_get_locations_by_distance_using_sql($database_name='geonames',$lat,$lng,$limit=100,$distance=100,$country_specific=false){
if($country_specific){ $extra_where_clause = "AND country_code = 'MY'"; }else{ $extra_where_clause=''; }
if($limit>0){ $limit = 'LIMIT 0, '.$limit; }else{ $limit=NULL; }
$lng1 = $lng - $distance / abs(cos(deg2rad( $lat ) ) *69 );
$lng2 = $lng + $distance / abs(cos(deg2rad( $lat ) ) *69 );
$lat1 = $lat - ( $distance /69 );
$lat2 = $lat + ( $distance /69 );
/* THIS REALLY IS THE BEST WAY TO PERFORM PROXIMITY SEARCH IN MYSQL */
$nearest_locations_ordered_by_distance = "SELECT *,
( 6371 * acos( cos( radians( $lat ) ) * cos( radians( latitude ) ) * cos( radians( longitude ) - radians( $lng ) ) + sin( radians( $lat ) ) * sin( radians( latitude ) ) ) ) AS distance
FROM $database_name
WHERE longitude between $lng1 and $lng2
AND latitude between $lat1 and $lat2
$extra_where_clause
HAVING distance < $distance
ORDER BY distance
$limit";
return $nearest_locations_ordered_by_distance;
}
/*
* THESE ARE OUR OPTIONS
*
* In order to initially import data from MySQL to MongoDO use:
*
* $collect_mongo_results = false;
* $loop_through_sql = true; // SET TO FALSE IF ONLY INTERESTED IN MONGO RESULTS
* $add_to_mongo = true;
* $print_arrays = false;
*
* In order to compare times for collecting results in both MySQL and MongoDB use:
*
* $collect_mongo_results = true;
* $loop_through_sql = true; // SET TO FALSE IF ONLY INTERESTED IN MONGO RESULTS
* $add_to_mongo = false;
* $print_arrays = true;
*
*/
$collect_mongo_results = false;
$loop_through_sql = true;
$add_to_mongo = true;
$print_arrays = false; // Be careful, with 7 million results, this can take time
$limit = -1; // Set this to -1 if you wish to import everything from MySQL to MongoDB else add limit as number
$maximum_distance = 100; // This represents the maximum distance in KM for proximity search
$database_name = 'geonames'; // This will also be used for the MongoDB name
$table_name = 'countries'; // This will also be used as the MongoDB collection name
$mysql_host = 'localhost';
$mysql_username = 'root';
$mysql_password = '';
// These default Lat / Lng Coordinates are for Malaysia
$current_latitude = 3.190908; // Do not use a string!
$current_longitude = 101.682243; // Do not use a string!
// This country code is for Malaysia
// Change to false if wanting to run queries on all records...
$country_code = false; // This is used to perform sub-set queries on specific countries...
/* END OF OPTIONS */
// Connect to Mongo
$m = new Mongo();
// Select / Create a Database
$db = $m->$database_name;
// Select / Create a Collection (table)
$collection = $db->$table_name;
/* THIS ALLOWS SCRIPTS TO RUN WITHOUT TIMING OUT */
set_time_limit(0);
if($collect_mongo_results){
/* START TIMING MONGO */
$time_start = ms_microtime_float();
// Start building MongoDB query
/* PAY EXTRA ATTENTION TO ORDER OF LONGITUDE THEN LATITUDE */
/* THIS IS VERY IMPORTANT FOR MONGODB - OTHERWISE RESULTS WILL BE WRONG !!! */
$this_latlng = array('lng'=>$current_longitude,'lat'=>$current_latitude);
$mongo_near_query = array('geoNear'=>$table_name,'near'=>$this_latlng,'$spherical'=>true,'$maxDistance'=>$maximum_distance/6378,'num'=>$limit);
$cursor = $db->command($mongo_near_query);
/* THE SAME QUERY CAN BE RUN IN MONGO CONSOLE USING THE FOLLOWING */
//-> db.runCommand({geoNear:"countries",near:[101,3],spherical:true,maxDistance:100/6378,num:10})
if($print_arrays){
// iterate through the results
foreach ($cursor['results'] as $obj) {
echo '<br />This is MongoDB Array:<br />';
echo '<pre>';
print_r($obj);
echo '</pre>';
}
}
/* END MONGO TIMER AND DISPLAY RESULTS */
$time_end = ms_microtime_float();
$time = $time_end - $time_start;
$mongo_results = "<br />Collected and Printed using MongoDB in $time seconds<br />";
echo $mongo_results;
}
if($loop_through_sql){
/* START SQL TIMER */
$time_start = ms_microtime_float();
// Connect to MySQL
$link = mysql_connect($mysql_host, $mysql_username, $mysql_password);
if (!$link) {
die('Could not connect: ' . mysql_error());
}
// Connect to
if (!mysql_select_db($database_name)) {
die('Could not select database: ' . mysql_error());
}
/* WHICH QUERY TO RUN...? DEPENDS ON OPTIONS ABOVE */
if($add_to_mongo){
if($country_code){ $where_clause = "WHERE country_code = 'MY'"; }
$query = 'SELECT * FROM '.$table_name.$where_clause;
}else{
$query = ms_get_locations_by_distance_using_sql($table_name,$current_latitude,$current_longitude,$limit,$maximum_distance,$country_code);
}
// Perform Query
$result = mysql_query($query);
// Check result
// This shows the actual query sent to MySQL, and the error. Useful for debugging.
if (!$result) {
$message = 'Invalid query: ' . mysql_error() . "\n";
$message .= 'Whole query: ' . $query;
die($message);
}
// Use result
// Attempting to print $result won't allow access to information in the resource
// One of the mysql result functions must be used
// See also mysql_result(), mysql_fetch_array(), mysql_fetch_row(), etc.
while ($row = mysql_fetch_assoc($result)) {
if($print_arrays){
echo '<br />This is MySQL Array:<br />';
echo '<pre>';
print_r($row);
echo '</pre>';
}
/* END SQL TIMER */
$time_end = ms_microtime_float();
$time = $time_end - $time_start;
/* NOW NEED TO ADD DATA TO MONGO TOO */
if($add_to_mongo){
$obj = array();
foreach($row as $key => $value){
$obj[$key] = $value;
}
// This is the MongoDB PHP Function for inserting objects (records)
$collection->insert($obj);
}
// In order to use MongoDB Geo-Spatial Indexing we need a Lng / Lat field
// This shows us how to use the MongoDB PHP Update function
$collection->update(array("geonameid" => $row['geonameid']), array('$set' => array("latlng" => array('lng'=>(float)$row['longitude'],'lat'=>(float)$row['latitude']))));
// Create Geo-Spaital Index In Mongo Console (once this field has been added) using:
// db.countries.ensureIndex({latlng:'2d'})
}
/* THEN PRINT RESULTS */
echo "<br />Collected and Printed using MySQL in $time seconds<br />";
if($collect_mongo_results){
echo "WHERE AS ".$mongo_results;
}
// Free the resources associated with the result set
// This is done automatically at the end of the script
mysql_free_result($result);
mysql_close($link);
}
Several things to remember before running or after checking the results of the comparisons is whether you remembered to added the MySQL indexes to your latitude and longitudes, not to mention whether you added the geo-spatial index to your MongoDB, to do which, you will need to use the following command from within the Mongo Console (geeky I know – there are some GUIs, but this is much easier than it looks):
db.countries.ensureIndex({latlng:'2d'})
Some other simple yet useful commands for the console include dropping databases:
db.geonames.drop()
By default, opening the Mongo Console launches the “test” database, so to switch from that to geonames is as simple as typing:
use geonames
In conclusion, MongoDB is FAST. In searching through 65,411 records / objects (as I myself only imported the places for Malaysia in order to save time and get to the learning part), and displaying only those within a 100 KM radius, and then having those results limited to 10, ordered by distance with the closest first, and echoed out onto the page took 0.23555 seconds for MySQL, where as that very same query providing the very same results took only 0.00152 seconds with MongoDB – and on several occasions, it even came in around the 0.00095 mark! And to think that that’s with just one user accessing the query. The biggest problem with MySQL and the reason I initially sought alternatives was the fact that only one person can access the database at one time, where concurrent queries get added to a queue and need to wait their turn. With MongoDB, this is not the case, so really, truth be known, there is no comparison between them! But we will be running some concurrent tests soon, as well as running the same queries using the complete 7 Million record set…
But how do we import all 7 Million+ records and perform tests on those…?
By NOT using the methods above! I tried several times and several different ways to import all 7 million records, and the quickest most reliable method I could find was to first export the MySQL as CSV. I tried several times to use a mysqldump from console, but could never get the formatting right. Believe it or not, the most Mongo-Friendly method was to export from phpMyAdmin in CSV for MS Excel format (with the first row being field names) as this was able to then utilize the mongoimport function. However, whenever I tried to export the entire set, I ran into problems. What I ended-up doing was exporting a million records at a time as CSV.Once you have all 8 CSV files, you can then import them into MongoDB using the following console commands:
mongoimport --db geonames --collection countries --type csv --file geonames_part1.csv --headerline --upsert
Do this for each part, and remember this must be done from a standard terminal console, not from the MongoDB console, which was my first mistake! After we have all 7 Million plus records into the database, we are going to need to add in the latlng / loc field, which does not yet exist and is needed by MongoDB for the geo-spatial indexing. To do this, the quickest method I found was to use a JS query. To do that, you will first need to create a new JavaScript with contents such as:
db.countries.find().forEach(function(data) {
db.countries.update({_id:data._id},{$set:{loc:{lng:parseFloat(data.longitude),lat:parseFloat(data.latitude)}}});
})
We then need to query the MongoDB using that JS file, which is perhaps my most favourite MongoDB feature and can be performed from a regular terminal (not the MongoDB console) using syntax such as:
mongo geonames add_latlng.js
Once we have that in place, we can add the Geo-Spatial Index (from the Mongo Console) by typing:
db.countries.ensureIndex({loc:"2d"})
The results are phenomenal:
Querying 7.8 Million records using MySQL and finding the 10 nearest locations ordered by distance took an average of 165 seconds!
With MongoDB, that same query took only 0.02 seconds!
Other than that, what’s next…?
