Applying indexes in Mongo DB

For the past year I’ve been consulting at a client where we’ve been using the document oriented NoSQL database Mongo DB in production (currently v2.0.5). Primarily we store PDF documents together with some arbitrary metadata, but in some use cases we also store a lot of completely dynamic documents, where there might be no similar columns shared between documents in the same collection.

For one of our collections, which holds PDF documents in a GridFS structure (~ 1 TB of data/node), we sometimes need to query for documents based on a couple of certain keys. If these keys are not indexed, queries can take a very long time to execute. How indexes are handled are very well explained in the Mongo documentation. Per default, GridFS provides us indexes for the _id, filename + uploadDate fields. To view indexes on the current collection, execute the following command from the Mongo shell:

db.myCollection.getIndexes();

Ideally, you want your indexes to reside completely within RAM. The following command returns the size of the current index:

db.myCollection.totalIndexSize();

To apply a new index for keyX and keyY, execute the following command:

db.myCollection.ensureIndex({"keyX":1, "keyY":1});

Applying the index took roughly about a minute per node in our environment. The index should be displayed when executing db.myCollection.getIndexes();

{
        "v" : 1,
        "key" : {
                "keyX" : 1,
                "keyY" : 1
        },
        "ns" : "myDatabase.myCollection",
        "name" : "keyX_1_keyY_1"
}

After applying the index, assure that the total size of the index is still managable (less than avaiable memory). Now, executing a query based on the indexed fields should yield its result much faster:

db.myCollection.find({"keyX":9281,"keyY":3270});
Tommy Tynjä
@tommysdk