2009/08/24
MongoDB and Symfony, Part 1: Inserts
What is it? A quick review of the MongoDB site gave me a taste of something that sounded good. I've been working with Symfony and Doctrine for almost a year now and really loving it. I also started writing a custom ORM for my current employer, and that work really brought to light some issues with relational modeling. For instance, free-form relations between objects: in an RDBMS you've got to set up a long table with key/value pairs that reference other tables and ids, and then there's the whole issue of efficiently joining on that. Trying to translate that structure into an ORM-style development tool was challenging, to say the least. I wondered if a document-oriented database (DODB) would offer a better way. Ideally both the speed of development AND the speed of the data manipulation would increase. MongoDB has a nice little DBRef class that lets you reference your way around your collections, effectively hardcoding your joins at the data level. Would that be a nice way to create relationships between objects without premeditation?
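As a quick illustration of what that might look like (a hedged sketch of my own, not code from this project): a DBRef is just a small embedded document naming a target collection and an _id, so you can bolt a relation onto any document after the fact. The field names below follow the DBRef convention; the PHP driver's MongoDBRef helper builds the same structure.

```php
// A DBRef is just an embedded document pointing at another collection.
// Hypothetical example: linking an order to a user without any schema
// planned in advance.
$userId = 12345; // normally a MongoId from the referenced document

$order = array(
    'total' => 19.99,
    'buyer' => array(
        '$ref' => 'users',  // target collection
        '$id'  => $userId,  // target document's _id
    ),
);

// With a live connection you would save it and later follow the reference:
// $db->orders->save($order);
// $buyer = MongoDBRef::get($db, $order['buyer']);

echo $order['buyer']['$ref']; // users
```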
Setup
Ok, so I wanted to set up some simple benchmarks just to see what we were looking at, as far as ease of use and speed, for basic dumb things. I set up a basic Symfony/Doctrine project and pulled down the latest version of MongoDB for 64-bit Linux.
I started my mongod server without the web interface since I won't need it for this:
mongod --dbpath=/var/www/sites/mongobjects/web/data/mongodb/ --nohttpinterface --quiet &
Sun Aug 23 04:09:56 Mongo DB : starting : pid = 11534 port = 27017 dbpath = /var/www/sites/mongobjects/web/data/mongodb/ master = 0 slave = 0 64-bit
Sun Aug 23 04:09:56 db version v0.9.9, pdfile version 4.4
Sun Aug 23 04:09:56 git version: cd0bf23e74ef8ded0684f74996ee762889b93294
Sun Aug 23 04:09:56 sys info: Linux ofc-n1.10gen.com 2.6.23.17-88.fc7 #1 SMP Thu May 15 00:02:29 EDT 2008 x86_64
Sun Aug 23 04:09:56 waiting for connections on port 27017

And we're up and running.
Filling our Databases
These tests will be run via a page load in Symfony, so all the code lives in an actions class method that is triggered by a URL and timed via Symfony's developer bar. It's a simple benchmarking tool that introduces a little fluctuation in the numbers, but it's fine for what we're doing here. It's worth noting that on this system Symfony takes ~24ms to render a page without anything else happening, so all our tests carry an extra ~24ms of Symfony execution time.
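If you want to reproduce the timings outside of the developer bar, a minimal microtime() wrapper does the job. This helper is my own sketch, not part of the original test harness (it needs PHP 5.3 for the closure syntax, which matches the setup below):

```php
// Hypothetical helper: time a chunk of benchmark code with microtime().
// Symfony's developer bar reports something similar per request.
function benchmark($label, $fn)
{
    $start = microtime(true);
    $fn();
    $elapsed = microtime(true) - $start;
    printf("%s: %.2f seconds\n", $label, $elapsed);
    return $elapsed;
}

// Usage:
$t = benchmark('noop loop', function () {
    for ($i = 0; $i < 1000; $i++) { /* work under test goes here */ }
});
```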
Mongo
I wanted a simple single-table test setup, so I decided to build a set of 500,000 users. But then I realized, toward the end of this article, that my 5 types of pets actually made all my tests insert 2.5 million records. So that's what we're using. Loading data with MongoDB was simple and quick: just select the db and collection you want, and if it's not already in existence it will be when you start writing to it.
$mongo = new Mongo;
$users = $mongo->selectDb('objects')->selectCollection('users');
$animals = array('cat','dog','sugar glider','pigeon','dove');
for ($j = 0; $j < 50; $j++) {
    for ($i = 0; $i < 10000; $i++) {
        foreach ($animals as $animal) {
            $users->save(array(
                'number' => $i,
                'pet'    => $animal,
                'name'   => 'Jason Mooberry',
                'email'  => 'jason@barkingsnake.com',
            ));
        }
    }
}
Mongo ran through this like it was bored. Occasionally it had to create a new datafile but this was only apparent in the log.
Sun Aug 23 04:23:22 newExtent: objects.users file 0 full, adding a new file
Sun Aug 23 04:23:22 allocating new datafile /var/www/sites/mongobjects/web/data/mongodb/objects.1, filling with zeroes...
Sun Aug 23 04:23:22 done allocating datafile /var/www/sites/mongobjects/web/data/mongodb/objects.1, size: 134217728, took 0.243 secs
MongoDB uses preallocated data files to store its BSON data. Data files are created in increasingly larger sizes to handle fragmentation more efficiently.
The total insert time on 2.5 million records was 55 seconds. Let's see how MySQL holds up.
MySQL
OK let's see what happens when we create a simple table in MySQL and load the same 2.5 million user set. I created a simple table in Navicat:
CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  `onumber` int(11) DEFAULT NULL,
  `pet` varchar(255) DEFAULT NULL,
  `email` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
I chose InnoDB since MySQL 5.4 is boasting some big improvements in the InnoDB engine. I realize much of that is about large scale systems with complex joins, but let's just see how it does. :)
Running an insert set of this size is an impossibility with this version of Doctrine; it quickly runs out of memory. I think this might be solved by a better garbage collection method in Doctrine 2.0, or perhaps even 1.1, but not in 1.0. So we do it the old-fashioned way:
$db = mysql_connect('localhost', 'mongobjects_user', 'm0ng0');
mysql_select_db('mongobjects', $db);
$animals = array('cat','dog','sugar glider','pigeon','dove');
for ($j = 0; $j < 50; $j++) {
    for ($i = 0; $i < 10000; $i++) {
        foreach ($animals as $animal) {
            $qs = "insert into `user` (`onumber`,`pet`,`name`,`email`)
                   values ($i,'$animal','Jason Mooberry','a@b.com')";
            mysql_query($qs, $db);
        }
    }
}
OK so I hit the url and wait... tick tock.. tick tock.. TICK TOCKKKKK... After several minutes I peek at the insert log and see that MySQL is still diligently inserting rows. So I continue to wait and wonder what my max execution time is set at..
After having a conversation with my girlfriend about stuff, MySQL finally crosses the finish line in 18.23 minutes... I realize I have the query log on and that is a bit of a hog, so let's rerun it with the log turned off. bin-log appears to be on as well and since we're not replicating here, I'll turn that off too.
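For reference, this amounts to commenting out (or removing) a couple of lines in the [mysqld] section of my.cnf; the exact directive names vary between MySQL versions and distributions, so treat this as a sketch:

```ini
[mysqld]
# log     = /var/log/mysql/mysql.log    ; general query log, now disabled
# log-bin = /var/log/mysql/mysql-bin    ; binary log; no replication here, so disabled too
```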
I restart MySQL, truncate the table, and try again.
While I wait I'd like to pontificate on the merits of a fast file system. ReiserFS is the default system on my vps but my recent experiences with XFS have led me to believe it to be a better option... [10 mins go by] ... and so that is why I think Hans Reiser is a jerk! OH um I think we've finished.
MySQL without the logs comes in at 9.96 minutes for inserting 2.5 million records. That's a big difference. Maybe it's worth checking the speed with MyISAM. I change the table to MyISAM, truncate and repeat.
Ok, MyISAM puts MySQL back in the running with an insert of 2.5 million records in 2.58 minutes. It's probably a fairer comparison with MyISAM anyway, since MongoDB isn't transactional and we're not dealing with foreign key constraints. Still, I was hoping for a much stronger showing from InnoDB.
EDIT: After rummaging through this article today about auto-commit transaction handling in InnoDB and PHP, I decided to give InnoDB another chance to prove its worth here. So I reran the insert tests inside a single transaction:
$animals = array('cat','dog','sugar glider','pigeon','dove');
$db = mysql_connect('localhost', 'mongobjects_user', 'm0ng0');
mysql_select_db('mongobjects', $db);
mysql_query('START TRANSACTION', $db);
for ($j = 0; $j < 50; $j++) {
    for ($i = 0; $i < 10000; $i++) {
        foreach ($animals as $animal) {
            $qs = "insert into `user` (`onumber`,`pet`,`name`,`email`)
                   values ($i,'$animal','Jason Mooberry','a@b.com')";
            mysql_query($qs, $db);
        }
    }
}
mysql_query('COMMIT', $db);
And InnoDB is back in this thing... sort of. 2.5 million rows inserted inside a single transaction: 2.45 minutes. I also reran this with the index described below, and it ran in 3.28 minutes. Our use case may not actually involve this type of scenario, so it's of mixed value.
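A middle ground worth knowing about (my own sketch, not something tested in this article) is committing in chunks, so the transaction stays bounded while still avoiding per-row auto-commit overhead. The mysql_query() calls are commented out here so the control flow can be read without a live server:

```php
// Hedged sketch: commit every $batchSize rows instead of once at the very end.
$batchSize = 10000;
$total     = 2500000; // 50 * 10000 * 5 records, as in the tests above
$commits   = 0;

// mysql_query('START TRANSACTION', $db);
for ($n = 1; $n <= $total; $n++) {
    // mysql_query("insert into `user` (`onumber`,`pet`,`name`,`email`) values (...)", $db);
    if ($n % $batchSize === 0) {
        // mysql_query('COMMIT', $db);
        // mysql_query('START TRANSACTION', $db);
        $commits++;
    }
}
// mysql_query('COMMIT', $db);

echo $commits; // 250 commits for the full 2.5 million rows
```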
Indexes
Our insert tests were run without an index to slow them down. The inserted records were done in 50 batches of 10,000 sets of 5 records, with a number field storing the record's number within its batch. The end result is that each number value matches 250 records. So a nice ranged query on that column will be a good way to test our search speed. So let's add an index to our number column and rerun our inserts.
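To make that concrete, here's the shape of the ranged query I have in mind (a sketch; the particular range is arbitrary). In Mongo the criteria are just a nested array, and the MySQL equivalent is a plain range in the WHERE clause:

```php
// Hypothetical ranged query on the soon-to-be-indexed number column.
// Matches numbers 100 through 199: 250 records per number, 25,000 total.
$criteria = array('number' => array('$gte' => 100, '$lt' => 200));

// With a live connection:
// $cursor = $mongo->selectDb('objects')->selectCollection('users')->find($criteria);

// MySQL equivalent:
// select * from `user` where `onumber` >= 100 and `onumber` < 200
```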
Mongo
Adding an index to a collection in MongoDB takes a single PHP command:
$mongo = new Mongo;
$users = $mongo->selectDb('objects')->selectCollection('users');
$users->ensureIndex(array('number' => 1));
Setting the key 'number' to 1 creates an ascending index.
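The same call handles other index shapes as well; a hedged sketch (these weren't run as part of these tests): -1 gives a descending index, and multiple keys give a compound index.

```php
// Index specs are just arrays of field => direction.
$descending = array('number' => -1);             // descending single-field index
$compound   = array('number' => 1, 'pet' => 1);  // compound index on number, then pet

// With a live connection:
// $users->ensureIndex($descending);
// $users->ensureIndex($compound);
```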
I've gone ahead and rm'd all the mongo db files so that we get an accurate test.
$mongo = new Mongo;
$users = $mongo->selectDb('objects')->selectCollection('users');
$users->ensureIndex(array('number' => 1));
$animals = array('cat','dog','sugar glider','pigeon','dove');
for ($j = 0; $j < 50; $j++) {
    for ($i = 0; $i < 10000; $i++) {
        foreach ($animals as $animal) {
            $users->save(array(
                'number' => $i,
                'pet'    => $animal,
                'name'   => 'Jason Mooberry',
                'email'  => 'a@b.com',
            ));
        }
    }
}
With the index on the number column Mongo inserts 2.5 million records in 1.32 minutes. Not bad I guess. Let's see how MySQL does.
MySQL (MyISAM)
I'm sticking with MyISAM for the rest of these tests, since I don't care to wait all day for my tests to run and the feature sets match a little better between MongoDB and MyISAM. So I'm just gonna truncate the table, add an index, and rerun.
CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
  `onumber` int(11) DEFAULT NULL,
  `pet` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
  `email` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `user_number_idx` (`onumber`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8

The PHP did not change for this test. MySQL handles the insert with roughly the same increase, at 3.02 minutes.
Insert Test Breakdown
Here's the breakdown for our insert tests with the various flavors of MySQL and MongoDB. It's a pretty impressive difference on massive inserts.

| Type | Time |
|---|---|
| MongoDB without index | 55 seconds |
| MongoDB with single index | 1.32 minutes |
| MySQL InnoDB with logging | 18.23 minutes |
| MySQL InnoDB | 9.96 minutes |
| MySQL InnoDB single transaction | 2.45 minutes |
| MySQL InnoDB single transaction, with single index | 3.28 minutes |
| MySQL MyISAM without index | 2.58 minutes |
| MySQL MyISAM with single index | 3.02 minutes |
System Setup:
- Some VPS setup with 1and1.com with a gig of dedicated RAM
- Ubuntu 8.04.1 64-bit Hardy
- Symfony 1.2.9 dev updated repository
- MongoDB Linux 64 0.9.9 (mongodb-linux-x86_64-0.9.9.tgz)
- Doctrine 1.0.10 (bundled with Symfony)
- Apache 2.2.8
- PHP 5.3.0 with mysqlnd
- MySQL 5.4 with the my.cnf for medium systems
- PECL MongoDB package 0.9.4