Next week starts the annual MySQL conference, from which I'll be blogging, so I decided to spice up discussion a bit by asking Kristina and Mike about one of 10gen's pet topics, "from MySQL to MongoDB." This appeals to people who find MySQL too slow or a hassle to manage (even though it's fast and easy to manage compared to most relational databases)--people who want to move an existing project to MongoDB or just start a new one while shedding their old relational thinking.
First a bit of overview: MongoDB is a document store in the (not very hoary) tradition of CouchDB. Even among the category of projects loosely grouped together under the NoSQL umbrella, MongoDB is a fairly young entrant.
MongoDB is growing quickly in popularity because it offers a relatively rich range of features, while (according to its supporters) maintaining impressive speed. The features include built-in indexes (and secondary indexes), range queries, support for replication, and auto-sharding. A Map/Reduce function allows you to add to the aggregate functions natively supported and do large-scale jobs like nightly reports.
The main relational features missing from MongoDB are joins, foreign key constraints, and multi-row transactions.
Because of the particular combination of features supported by MongoDB, the advice in this post might not apply to other NoSQL solutions.
Kristina and Mike said the migration of an existing project from a relational database goes through four overarching steps. Which do you think is the step that requires the most time and thinking?
- Get to know MongoDB. Download it, read the tutorials, try some toy projects.
- Think about how to represent your model in its document store.
- Migrate the data from the database to MongoDB, probably simply by writing a bunch of SELECT * FROM statements against the database and then loading the data into your MongoDB model using the language of your choice.
- Rewrite your application code to query MongoDB through statements such as insert() or find().
OK, so which step do you think takes the longest? And the answer is...step 2. Design is critical, and there are trade-offs that provide no simple answers but require a careful understanding of your application. Migrating the data and rewriting the application are straightforward by comparison.
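To make the migration step (step 3) concrete, here is a minimal sketch in Python. It is my own illustration, not code from Kristina or Mike. An in-memory SQLite database stands in for MySQL so the example runs on its own, and the `users` table, its columns, and the `rows_to_documents` helper are all hypothetical names.

```python
import sqlite3

# Stand-in relational source; a real migration would connect to MySQL instead.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets each row be converted to a dict by column name
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    INSERT INTO users VALUES (1, 'alice', 'alice@example.com');
    INSERT INTO users VALUES (2, 'bob', 'bob@example.com');
""")


def rows_to_documents(connection, table):
    """Run SELECT * against one table and return each row as a plain dict."""
    # The table name is interpolated directly, so only pass trusted names.
    cursor = connection.execute(f"SELECT * FROM {table}")
    return [dict(row) for row in cursor]


user_docs = rows_to_documents(conn, "users")

# With a driver such as pymongo, the load step would then be roughly:
#   db.users.insert_many(user_docs)
```

Each dict is already shaped like a document, which is why this step is mostly mechanical once the design from step 2 is settled.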
Although MongoDB supports arbitrarily large and complex data structures (basically JSON, but in a binary format called BSON), Kristina and Mike say you'd do best to create separate collections for different types of data, just as you'd put them in different tables if you were using a relational database. For instance, in a classic social networking application, you would probably put all information about your users in one collection and all your information about their postings in another.
MongoDB documents aren't divided up quite as much as the tables of a relational database in third normal form. If you are likely to use a data item in conjunction with a more major item--not on its own--you should probably embed the minor item with the major one. For instance, a relational database for a social networking application would probably have a separate table of tags, which would be tied to the table of postings through foreign keys. But in MongoDB, you'd just embed an array of tags with each posting. Yes, that's redundant. Your budget can handle it.
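Here is a rough sketch of that embedding, again in Python with SQLite standing in for the relational side. The schema is an assumption on my part: I've used a `posting_tags` join table to carry the foreign keys, and every table, column, and function name is hypothetical.

```python
import sqlite3

# Hypothetical normalized schema: postings, tags, and a join table between them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE postings (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posting_tags (posting_id INTEGER, tag_id INTEGER);
    INSERT INTO postings VALUES (1, 10, 'Trying MongoDB');
    INSERT INTO postings VALUES (2, 11, 'Still on MySQL');
    INSERT INTO tags VALUES (1, 'mongodb');
    INSERT INTO tags VALUES (2, 'nosql');
    INSERT INTO tags VALUES (3, 'mysql');
    INSERT INTO posting_tags VALUES (1, 1);
    INSERT INTO posting_tags VALUES (1, 2);
    INSERT INTO posting_tags VALUES (2, 3);
""")


def build_posting_documents(connection):
    """Fold the tag rows into a 'tags' array inside each posting document."""
    docs = {}
    for pid, user_id, body in connection.execute(
        "SELECT id, user_id, body FROM postings ORDER BY id"
    ):
        docs[pid] = {"_id": pid, "user_id": user_id, "body": body, "tags": []}

    # Resolve the foreign keys once, at migration time, instead of on every read.
    tag_query = """
        SELECT pt.posting_id, t.name
        FROM posting_tags pt JOIN tags t ON t.id = pt.tag_id
        ORDER BY pt.posting_id, t.name
    """
    for pid, tag_name in connection.execute(tag_query):
        docs[pid]["tags"].append(tag_name)
    return list(docs.values())


posting_docs = build_posting_documents(conn)
```

The tag names are now repeated in every posting that uses them, which is exactly the redundancy mentioned above. In exchange, reading a posting no longer needs a join.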
And querying by tag is still quick and easy. MongoDB has multi-key indexes, so you can index an array of tags and quickly look for all postings containing a particular tag.
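To show the idea behind a multi-key index, here is a toy model in plain Python. This is not how MongoDB implements its indexes, just a sketch of the principle that each element of the array gets its own index entry. The sample postings and the function names are made up.

```python
from collections import defaultdict

# Hypothetical posting documents, each with an embedded array of tags.
postings = [
    {"_id": 1, "body": "Trying MongoDB", "tags": ["mongodb", "nosql"]},
    {"_id": 2, "body": "Still on MySQL", "tags": ["mysql"]},
    {"_id": 3, "body": "Comparing both", "tags": ["mongodb", "mysql"]},
]


def build_tag_index(documents):
    """Map each tag to the _ids of the documents whose tags array contains it."""
    index = defaultdict(list)
    for doc in documents:
        for tag in set(doc["tags"]):  # one index entry per distinct array element
            index[tag].append(doc["_id"])
    return index


tag_index = build_tag_index(postings)


def find_by_tag(tag):
    """Look up matching _ids in the index instead of scanning every document."""
    return tag_index.get(tag, [])


matches = find_by_tag("mongodb")
```

With a driver such as pymongo, the real thing would be roughly `db.postings.create_index("tags")` followed by `db.postings.find({"tags": "mongodb"})`, where `db.postings` is an assumed collection name.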
Organizing documents by key concepts (user, posting) is relatively intuitive. It is not, however, quite like an object database. MongoDB users don't normally map documents tightly to objects in the application code.
So that's a little help from MongoDB experts for making a move from a relational database to MongoDB. Now I should talk to a MySQL or Drizzle expert about how to extract data from MongoDB into a relational database when you discover you need to do some heavy data mining using joins.