2009/08/24

MongoDB and Symfony, Part 1: Inserts

MongoDB

What is it? A quick review of their site gave me a taste of something that sounded good. I've been working with Symfony and Doctrine for almost a year now and really loving it. Writing a custom ORM for my current employer brought to light some issues with relational modeling, for instance free-form relations between objects. In an RDBMS you've got to set up a long table of key/value pairs that reference other tables and ids, and then there's the whole issue of joining on that efficiently. Trying to translate that structure into an ORM-style development tool was challenging to say the least. I wondered if a document-oriented database (DODB) would offer a better way; ideally both the speed of development AND the speed of data manipulation would increase. MongoDB has a nice little DBRef class that lets you reference your way around your collections, effectively hardcoding your joins at the data level. Would it be a nice way to create relationships between objects without premeditation?
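To make that concrete, here's a minimal sketch of how DBRefs look with the PECL driver. This is hedged: the pet document, field names, and 'pets' collection are made up for illustration, and I'm assuming the MongoDBRef helper is available in a driver this old.

$mongo = new Mongo;
$db = $mongo->selectDb('objects');

// Save a pet document; the driver adds the generated _id to the array
$pet = array('species' => 'sugar glider', 'name' => 'Gizmo');
$db->selectCollection('pets')->save($pet);

// Store a reference to that pet inside a user document
$db->selectCollection('users')->save(array(
    'name' => 'Jason Mooberry',
    'pet'  => MongoDBRef::create('pets', $pet['_id']),
));

// Later, follow the reference back to the full pet document
$user = $db->selectCollection('users')->findOne(array('name' => 'Jason Mooberry'));
$petDoc = MongoDBRef::get($db, $user['pet']);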

Setup

Ok so I wanted to set up some simple benchmarks just to see what we were looking at as far as ease of use and speed for basic dumb things. I set up a basic Symfony/Doctrine project and pulled down the latest version of MongoDB for Linux 64.
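For anyone following along, the bootstrap was the usual Symfony 1.2 routine plus unpacking the Mongo tarball; the symfony path here is illustrative:

mkdir mongobjects && cd mongobjects
php /path/to/symfony/data/bin/symfony generate:project mongobjects
tar xzf mongodb-linux-x86_64-0.9.9.tgz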

I started my mongod server without the web interface since I won't need it for this:

mongod --dbpath=/var/www/sites/mongobjects/web/data/mongodb/ --nohttpinterface --quiet &
Sun Aug 23 04:09:56 Mongo DB : starting : pid = 11534 port = 27017 dbpath = /var/www/sites/mongobjects/web/data/mongodb/ master = 0 slave = 0  64-bit
Sun Aug 23 04:09:56 db version v0.9.9, pdfile version 4.4
Sun Aug 23 04:09:56 git version: cd0bf23e74ef8ded0684f74996ee762889b93294
Sun Aug 23 04:09:56 sys info: Linux ofc-n1.10gen.com 2.6.23.17-88.fc7 #1 SMP Thu May 15 00:02:29 EDT 2008 x86_64
Sun Aug 23 04:09:56 waiting for connections on port 27017
And we're up and running.

Filling our Databases

These tests will be run via a page load in Symfony, so all the code lives in an actions class method that is triggered by a URL and timed via Symfony's developer bar. It's a crude benchmarking tool that produces a little fluctuation in the numbers, but it's fine for what we're doing here. It's worth noting that on this system Symfony takes ~24ms to render a page with nothing else happening, so all our tests carry an extra ~24ms of Symfony execution time.
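If you'd rather not trust the dev bar, the same numbers can be approximated by hand inside the action. A rough sketch (the action name is made up):

public function executeInsertTest(sfWebRequest $request)
{
    $start = microtime(true);

    // ...insert loop goes here...

    $elapsed = microtime(true) - $start;
    $this->logMessage(sprintf('Insert run took %.2f seconds', $elapsed));

    return sfView::NONE;
}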

Mongo

I wanted a simple single-table test setup, so I decided to build a set of 500,000 users. But I realized toward the end of this article that my 5 types of pets actually made all my tests insert 2.5 million records, so that's what we're using. Loading data with MongoDB was simple and quick: just select the db and collection you want, and if they don't already exist they will the moment you start writing to them.

// Connect and select the collection -- Mongo creates both the
// database and the collection on first write.
$mongo = new Mongo;
$users = $mongo->selectDb('objects')->selectCollection('users');
$animals = array('cat','dog','sugar glider','pigeon','dove');

// 50 passes x 10,000 numbers x 5 pets = 2.5 million documents
for ($j=0; $j < 50; $j++) {
    for ($i=0; $i < 10000; $i++) {
        foreach ($animals as $animal) {
            $users->save(array(
                'number' => $i,
                'pet' => $animal,
                'name' => 'Jason Mooberry',
                'email' => 'jason@barkingsnake.com',
            ));
        }
    }
}

Mongo ran through this like it was bored. Occasionally it had to create a new datafile, but this was only apparent in the log.

Sun Aug 23 04:23:22 newExtent: objects.users file 0 full, adding a new file
Sun Aug 23 04:23:22 allocating new datafile /var/www/sites/mongobjects/web/data/mongodb/objects.1, filling with zeroes...
Sun Aug 23 04:23:22 done allocating datafile /var/www/sites/mongobjects/web/data/mongodb/objects.1, size: 134217728, took 0.243 secs

MongoDB uses preallocated data files to store its BSON data. Data files are created in increasingly large sizes to handle fragmentation more efficiently.

The total insert time on 2.5 million records was 55 seconds. Let's see how MySQL holds up.

MySQL

OK, let's see what happens when we create a simple table in MySQL and load the same 2.5 million record set. I created the table in Navicat:

CREATE TABLE `user` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `name` varchar(255) DEFAULT NULL,
    `onumber` int(11) DEFAULT NULL,
    `pet` varchar(255) DEFAULT NULL,
    `email` varchar(255) DEFAULT NULL,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

I chose InnoDB since MySQL 5.4 is boasting some big improvements in the InnoDB engine. I realize much of that is about large-scale systems with complex joins, but let's just see how it does. :)

Running an insert set of this size is impossible with this version of Doctrine; it quickly runs out of memory. I think this might be solved by a better garbage collection method in Doctrine 2.0, or perhaps even 1.1, but not in 1.0.
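For reference, the Doctrine version of the loop would have looked roughly like this. A hedged sketch: it assumes a User model with name/onumber/pet/email columns, and per the above, even free()-ing each record doesn't keep 1.0's memory use in check.

$animals = array('cat','dog','sugar glider','pigeon','dove');
for ($j = 0; $j < 50; $j++) {
    for ($i = 0; $i < 10000; $i++) {
        foreach ($animals as $animal) {
            $user = new User();
            $user->onumber = $i;
            $user->pet     = $animal;
            $user->name    = 'Jason Mooberry';
            $user->email   = 'a@b.com';
            $user->save();
            $user->free(); // releases the record, but bookkeeping still accumulates in 1.0
        }
    }
}

So instead we do it the old-fashioned way: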

// Connect with the plain mysql extension -- Doctrine can't survive
// an insert run of this size.
$db = mysql_connect('localhost', 'mongobjects_user', 'm0ng0');
mysql_select_db('mongobjects', $db);
$animals = array('cat','dog','sugar glider','pigeon','dove');

// Same 2.5 million record loop as the Mongo test
for ($j=0; $j < 50; $j++) {
    for ($i=0; $i < 10000; $i++) {
        foreach ($animals as $animal) {
            $qs = "
                insert into `user`
                (`onumber`,`pet`,`name`,`email`)
                values
                ($i,'$animal','Jason Mooberry','a@b.com')
            ";
            mysql_query($qs, $db);
        }
    }
}

OK so I hit the URL and wait... tick tock... tick tock... TICK TOCKKKKK... After several minutes I peek at the insert log and see that MySQL is still diligently inserting rows. So I continue to wait and wonder what my max execution time is set to...

After having a conversation with my girlfriend about stuff, MySQL finally crosses the finish line in 18.23 minutes... I realize I have the query log on, and that is a bit of a hog, so let's rerun it with the log turned off. The bin-log appears to be on as well, and since we're not replicating here I'll turn that off too.
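In my.cnf that just means commenting out the logging directives; the names below match the stock 5.x-era sample configs, so your file may differ:

#log      = /var/log/mysql/mysql.log
#log-bin  = mysql-bin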

I restart MySQL, truncate the table, and try again.

While I wait I'd like to pontificate on the merits of a fast file system. ReiserFS is the default file system on my VPS, but my recent experiences with XFS have led me to believe it to be a better option... [10 mins go by] ... and so that is why I think Hans Reiser is a jerk! OH um I think we've finished.

MySQL without the logs comes in at 9.96 minutes for inserting 2.5 million records. That's a big difference. Maybe it's worth checking the speed with MyISAM. I change the table to MyISAM, truncate and repeat.
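For anyone doing it by hand, the engine swap and truncate are just:

ALTER TABLE `user` ENGINE=MyISAM;
TRUNCATE TABLE `user`;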

Ok MyISAM puts MySQL back in the running with an insert of 2.5 million records in 2.58 minutes. It's probably a fairer comparison with MyISAM anyway, since MongoDB isn't transactional and we're not dealing with foreign key constraints. Still, I was hoping for a much stronger showing from InnoDB...

EDIT: After rummaging through this article today about auto-commit transaction handling in InnoDB and PHP, I decided to give InnoDB another chance to prove its worth here. So I reran the insert tests inside a single transaction:

$animals = array('cat','dog','sugar glider','pigeon','dove');
$db = mysql_connect('localhost', 'mongobjects_user', 'm0ng0');
mysql_select_db('mongobjects',$db);
// Wrap the whole run in a single transaction so InnoDB commits once
mysql_query('START TRANSACTION',$db);
for ($j=0; $j < 50; $j++) {
    for ($i=0; $i < 10000; $i++) {
        foreach ($animals as $animal) {
            $qs = "
                insert into `user`
                (`onumber`,`pet`,`name`,`email`)
                values
                ($i,'$animal','Jason Mooberry','a@b.com')
            ";
            mysql_query($qs,$db);
        }
    }
}
mysql_query('COMMIT',$db);

And InnoDB is back in this thing... sort of. 2.5 million rows inserted inside a single transaction: 2.45 minutes. I also reran this with the index specified below and it ran in 3.28 minutes. Our real use case may not actually involve this kind of bulk scenario, so it's of mixed value.

Indexes

Our insert tests were run without an index to slow them down. The records were inserted in 50 passes over 10,000 numbers, with 5 records (one per pet) per number, and a number field storing the record's number within its pass. The end result is that each value of number matches 250 records. A nice ranged query on that column will be a good way to test our search speed. So let's add an index to our number column and rerun our inserts.
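As a preview, that ranged query will look something like this in the PHP driver ($users being the collection handle from earlier; the SQL side is just a BETWEEN on onumber, and the actual query benchmarks come in a later part):

// Every number in [1000, 1100] matches 250 records apiece
$cursor = $users->find(array('number' => array('$gte' => 1000, '$lte' => 1100)));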

Mongo

Adding an index to a collection in MongoDB takes a single PHP command:

$mongo = new Mongo;
$users = $mongo->selectDb('objects')->selectCollection('users');
$users->ensureIndex(array('number'=>1));

Setting the key 'number' to 1 creates an ascending index; -1 would create a descending one.

I've gone ahead and rm'd all the mongo db files so that we get an accurate test.

$mongo = new Mongo;
$users = $mongo->selectDb('objects')->selectCollection('users');
// Create the ascending index up front so it's maintained during the load
$users->ensureIndex(array('number'=>1));
$animals = array('cat','dog','sugar glider','pigeon','dove');
for ($j=0; $j < 50; $j++) {
    for ($i=0; $i < 10000; $i++) {
        foreach ($animals as $animal) {
            $users->save(array(
                'number' => $i,
                'pet' => $animal,
                'name' => 'Jason Mooberry',
                'email' => 'a@b.com',
            ));
        }
    }
}

With the index on the number column, Mongo inserts 2.5 million records in 1.32 minutes. Not bad, I guess. Let's see how MySQL does.

MySQL (MyISAM)

I'm sticking with MyISAM for the rest of these tests, since I don't care to wait all day for my tests to run and the feature sets match a little better between MongoDB and MyISAM. So I'm just gonna truncate the table, add an index, and rerun.

CREATE TABLE `user` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `name` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
    `onumber` int(11) DEFAULT NULL,
    `pet` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
    `email` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
    PRIMARY KEY (`id`),
    KEY `user_number_idx` (`onumber`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8

The PHP did not change for this test. MySQL handles the indexed insert with roughly the same proportional slowdown, coming in at 3.02 minutes.

Insert Test Breakdown

Here's the breakdown for our insert tests with the various flavors of MySQL and MongoDB. It's a pretty impressive difference on massive inserts.
Type                                              Time
MongoDB without index                             55 seconds
MongoDB with single index                         1.32 minutes
MySQL InnoDB with logging                         18.23 minutes
MySQL InnoDB                                      9.96 minutes
MySQL InnoDB single transaction                   2.45 minutes
MySQL InnoDB single transaction, with single index 3.28 minutes
MySQL MyISAM without index                        2.58 minutes
MySQL MyISAM with single index                    3.02 minutes
The winner is clear here on inserts. But let's see how MongoDB holds up with Simple Queries and Relations...

System Setup

A VPS at 1and1.com with 1 GB of dedicated RAM
Ubuntu 8.04.1 64bit Hardy
Symfony 1.2.9-DEV (updated from the repository)
MongoDB Linux 64 0.9.9 (mongodb-linux-x86_64-0.9.9.tgz)
Doctrine 1.0.10 (bundled with Symfony)
Apache 2.2.8
PHP 5.3.0 with mysqlnd
MySQL 5.4 with the my.cnf for medium systems
PECL MongoDB package 0.9.4