MongoDB

explain() method: MongoDB

We’ve seen the working of explain() method briefly in previous video tutorials – in this video tutorial lets dive in to learn more about this useful method, which helps us analyze and optimize our MongoDB commands.

temp: database name
name: collection name

Inserting 1000 Documents

MongoDB shell version: 2.6.1
connecting to: test
 
> use temp
switched to db temp
 
> for(i = 1; i < = 1000; i++) db.name.insert({a: i, b: i, c: i});
WriteResult({ "nInserted" : 1 })

Here we inserted 1000 documents using for loop.

Basic Cursor

> db.name.find({a: 5})
{ "_id" : ObjectId("53dcd070340ac74e061d7215"), "a" : 5, "b" : 5, "c" : 5 }
 
> db.name.find({a: 5}).explain()
{
        "cursor" : "BasicCursor",
        "isMultiKey" : false,
        "n" : 1,
        "nscannedObjects" : 1000,
        "nscanned" : 1000,
        "nscannedObjectsAllPlans" : 1000,
        "nscannedAllPlans" : 1000,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 7,
        "nChunkSkips" : 0,
        "millis" : 1,
        "server" : "Satish-PC:27017",
        "filterSet" : false
}

Here MongoDB server is returning a basic cursor, as we do not have index on field “a”. Also note that, it’s scanning 1000 documents in the “name” collection and scanning 1000 index keys in “system.indexes” collection, where the default key is “_id”.

explain() method: MongoDB

[youtube https://www.youtube.com/watch?v=q59nF4ziW-w]

YouTube Link: https://www.youtube.com/watch?v=q59nF4ziW-w [Watch the Video In Full Screen.]

Creating Index on fields “a” and “b”

> db.name.ensureIndex({a: 1, b: 1});
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}
 
> db.name.find({a: 5}).explain()
{
        "cursor" : "BtreeCursor a_1_b_1",
        "isMultiKey" : false,
        "n" : 1,
        "nscannedObjects" : 1,
        "nscanned" : 1,
        "nscannedObjectsAllPlans" : 1,
        "nscannedAllPlans" : 1,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 0,
        "indexBounds" : {
                "a" : [
                        [
                                5,
                                5
                        ]
                ],
                "b" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ]
        },
        "server" : "Satish-PC:27017",
        "filterSet" : false
}

Here we create index on key “a” and “b” – it’s called compound key – after which the explain() method shows that MongoDB server is returning a Btree cursor, meaning it’s making use of index key to execute the query/command.

Also not the number of objects scanned in the “name” collection as well as in the “system.indexes” collection.

indexOnly: true

> db.name.find({a: 5})
{ "_id" : ObjectId("53dcd070340ac74e061d7215"), "a" : 5, "b" : 5, "c" : 5 }
 
> db.name.find({a: 5}, {a: 1, b: 1, _id: 0})
{ "a" : 5, "b" : 5 }
 
> db.name.find({a: 5}, {a: 1, b: 1, _id: 0}).explain()
{
        "cursor" : "BtreeCursor a_1_b_1",
        "isMultiKey" : false,
        "n" : 1,
        "nscannedObjects" : 0,
        "nscanned" : 1,
        "nscannedObjectsAllPlans" : 0,
        "nscannedAllPlans" : 1,
        "scanAndOrder" : false,
        "indexOnly" : true,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 0,
        "indexBounds" : {
                "a" : [
                        [
                                5,
                                5
                        ]
                ],
                "b" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ]
        },
        "server" : "Satish-PC:27017",
        "filterSet" : false
}

Covered Index or indexOnly
If our command is requesting for values present inside the index, mongo engine will fetch that information from the “system.indexes” collection itself and does not go to the collection where the entire document is present – in our case “name” collection. Since index is on keys “a” and “b” in our case, if we query for “a” and “b” values only, it fetches those values directly from the index information and will not go to “name” collection. Hence the speed of execution of the command is faster, and this type of commands are very efficient in creating optimized web applications.

Note from above explain() result, the number of objects scanned in the “name” collection is 0, and the number of scans made on indexes in only 1.

Index Creation for Production Server: MongoDB

We know the uses of index, how to create indexes, how to delete indexes – now its time to learn the right way of creating index/key in MongoDB for its optimum usage.

temp: database name
stu: collection name

Inserting 10 Million Documents

use temp
switched to db temp
 
for(i=1; i < = 10000000; i++)
db.no.insert({"student_id": i});

This inserts 10 Million documents inside “stu” collection. It takes atleast a minute or so to complete the insertion – patience is key!

index creation in the foreground

> db.stu.ensureIndex({student_id: 1});
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}

Since there are 10 Million documents inside “stu” collection, it takes some time to create index on “student_id” key.

Things to note:
1. By default, index creation in MongoDB takes place in the foreground.
2. It’s faster than background index creation.
3. It blocks other write operation
4. Ideal to use when in development system – as you’re the only one writing to the server(database). Also ideal to use when you’ve replica sets – where you’ll have multiple set of mongod with same data set to operate on, in which case you can have index information on a separate replica set which helps in faster and efficient index creation.

Background Index Creation: MongoDB

[youtube https://www.youtube.com/watch?v=PIyBA3LvVDs]

YouTube Link: https://www.youtube.com/watch?v=PIyBA3LvVDs [Watch the Video In Full Screen.]

index creation in the background

> db.stu.ensureIndex({student_id: 1}, {background: true});
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}

Simply pass in the option in second parameter to ensureIndex() method – {background: true}

It takes time to complete the process, as there are 10 Million documents in the collection.

Things to note:
1. It’s slower than foreground index creation method.
2. It does not block any other write operation.
3. Ideal in production server/system. As there’ll be other people who’ll be performing read and write operation on the same database server.
4. Ideal when you’re not making use of replica sets yet.

Sparse Index: MongoDB

All sparse indexes are unique index, but not all unique indexes are sparse! There are situations where we want to create unique index on key/field which is not present in all documents. Wherever the key is not present, it’s value will be treated as NULL. If more than 1 document has NULL value to the key we want to make as unique, then it violates unique key rule. In such situations we can make use of Sparse index and create unique key on only those documents which has the key in it.

example: database name
company: collection name

Insert documents

MongoDB shell version: 2.6.1
connecting to: test
> use example
switched to db example
 
> db.company.insert({name: "Apple", product: "iPhone"});
WriteResult({ "nInserted" : 1 })
 
> db.company.insert({name: "Motorola", product: "Smart Watch"});
WriteResult({ "nInserted" : 1 })
 
> db.company.insert({name: "Technotip"});
WriteResult({ "nInserted" : 1 })
 
> db.company.insert({name: "Google"});
WriteResult({ "nInserted" : 1 })
 
 
> db.company.find().pretty()
{
        "_id" : ObjectId("53d9f5125d1942042b4e092b"),
        "name" : "Apple",
        "product" : "iPhone"
}
{
        "_id" : ObjectId("53d9f52d5d1942042b4e092c"),
        "name" : "Motorola",
        "product" : "Smart Watch"
}
{ "_id" : ObjectId("53d9f5385d1942042b4e092d"), "name" : "Technotip" }
{ "_id" : ObjectId("53d9f53f5d1942042b4e092e"), "name" : "Google" }

We have 4 documents in “company” collection. First 2 documents has “name” and “product” keys. Last 2 documents has only “name” key.

Sparse Index – Unique Key: MongoDB

[youtube https://www.youtube.com/watch?v=6pxI5j6TZRU]

YouTube Link: https://www.youtube.com/watch?v=6pxI5j6TZRU [Watch the Video In Full Screen.]

Duplicate key error

> db.company.ensureIndex({product: 1}, {unique: true});
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "ok" : 0,
        "errmsg" : "E11000 duplicate key error index: example.company.$product_1
  dup key: { : null }",
        "code" : 11000
}

Since last 2 documents do not have “product” key, its value will be treated as “NULL”. Since both these documents have value of “product” as “NULL”, trying to create unique key on “product” throws duplicate key error.

Sparse Index

> db.company.ensureIndex({product: 1}, {unique: true, sparse: true});
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}

This creates a sparse index on “product” key/field.

system.indexes collection

> db.system.indexes.find().pretty()
{
        "v" : 1,
        "key" : {
                "_id" : 1
        },
        "name" : "_id_",
        "ns" : "example.company"
}
{
        "v" : 1,
        "unique" : true,
        "key" : {
                "product" : 1
        },
        "name" : "product_1",
        "ns" : "example.company",
        "sparse" : true
}

The “system.indexes” collection shows that the key “product” is a unique key and is a sparse index.

sort() method

> db.company.find().sort({product: 1}).pretty();
{ "_id" : ObjectId("53d9f5385d1942042b4e092d"), "name" : "Technotip" }
{ "_id" : ObjectId("53d9f53f5d1942042b4e092e"), "name" : "Google" }
{
        "_id" : ObjectId("53d9f52d5d1942042b4e092c"),
        "name" : "Motorola",
        "product" : "Smart Watch"
}
{
        "_id" : ObjectId("53d9f5125d1942042b4e092b"),
        "name" : "Apple",
        "product" : "iPhone"
}

This command sorts all the documents in lexicographical order. The output includes all the documents. Those documents which do not have “product” key in them are listed first.

Basic Cursor

> db.company.find().sort({product: 1}).explain()
{
        "cursor" : "BasicCursor",
        "isMultiKey" : false,
        "n" : 4,
        "nscannedObjects" : 4,
        "nscanned" : 4,
        "nscannedObjectsAllPlans" : 4,
        "nscannedAllPlans" : 4,
        "scanAndOrder" : true,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 0,
        "server" : "Satish-PC:27017",
        "filterSet" : false
}

Since the command is even listing/sorting the documents which do not have “product” key in them, it’s simply making use of Basic Cursor – which do not help in optimizing the command/query performance.

hint() method

> db.company.find().sort({product: 1}).hint({product: 1}).pretty();
{
        "_id" : ObjectId("53d9f52d5d1942042b4e092c"),
        "name" : "Motorola",
        "product" : "Smart Watch"
}
{
        "_id" : ObjectId("53d9f5125d1942042b4e092b"),
        "name" : "Apple",
        "product" : "iPhone"
}

hint() method tells the mongoDB server to operate only on the sparse key.. hence only retrieving and operating on the documents which has the sparse key in them.

Btree Cursor

> db.company.find().sort({product: 1}).hint({product: 1}).explain()
{
        "cursor" : "BtreeCursor product_1",
        "isMultiKey" : false,
        "n" : 2,
        "nscannedObjects" : 2,
        "nscanned" : 2,
        "nscannedObjectsAllPlans" : 2,
        "nscannedAllPlans" : 2,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 0,
        "indexBounds" : {
                "product" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ]
        },
        "server" : "Satish-PC:27017",
        "filterSet" : false
}

After using hint() method, mongoDB server is only looking for documents which has the sparse index key in them, so it can directly look for the sparse index key inside “system.indexes” collection, hence it’s using Btree Cursor – and is efficient.

Remove Duplicate Documents: MongoDB

We learnt how to create unique key/index using {unique: true} option with ensureIndex() method. Now lets see how we can create unique key when there are duplicate entries/documents already present inside the collection.

Insert documents

> db.system.indexes.find()
{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "foo.test" }
 
 
> db.test.insert({name: "Satish", age: 27});
WriteResult({ "nInserted" : 1 })
 
> db.test.insert({name: "Kiran", age: 28});
WriteResult({ "nInserted" : 1 })
 
> db.test.insert({name: "Satish", age: 27});
WriteResult({ "nInserted" : 1 })

Here we have 3 documents. First and the last document has same value for “name” and “age” fields.

dropDups() To Remove Duplicate Documents: MongoDB

[youtube https://www.youtube.com/watch?v=aQXdtDWKBiU]

YouTube Link: https://www.youtube.com/watch?v=aQXdtDWKBiU [Watch the Video In Full Screen.]

Creating unique key on field “name”

> db.test.ensureIndex({name: 1}, {unique: true});
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "ok" : 0,
        "errmsg" : "E11000 duplicate key error index: foo.test.$name_1  dup key:
 { : \"Satish\" }",
        "code" : 11000
}

This creates error, as the collection “test” already has duplicate entries/documents.

Create Unique Key by dropping random duplicate entries

> db.test.ensureIndex({name: 1}, {unique: true, dropDups: true});
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}
 
> db.test.find();
{ "_id" : ObjectId("53d8f1268019dce2ce61eb86"), "name" : "Satish", "age" : 27 }
{ "_id" : ObjectId("53d8f12f8019dce2ce61eb87"), "name" : "Kiran", "age" : 28 }

dropDups() method retains only 1 document randomly and deletes/removes/drops all other duplicate entries/documents permanently.

Note: Since the documents are deleted randomly and can not be restored, you need to be very careful while making use of dropDup() method.

Creating Unique Key/index: MongoDB

We have learnt how to create a key/index so far – today lets learn how to create unique key/index in MongoDB.

foo: database name
name: collection name

Primary Key in MongoDB: _id

> db.name.insert({_id: 1, a: 1});
WriteResult({ "nInserted" : 1 })
 
> db.system.indexes.find()
{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "foo.name" }
 
> db.name.insert({_id: 1, a: 2});
WriteResult({
        "nInserted" : 0,
        "writeError" : {
                "code" : 11000,
                "errmsg" : "insertDocument :: caused by :: 11000 E11000 
                            duplicate key error index: foo.name.$_id_  dup key: { : 1.0 }"
        }
})

Since “_id” is treated as primary key in mongoDB, we can’t insert duplicate values to it. In above case, we are trying to insert value of “_id” as 1 twice – the second time around it threw an error stating the entered value as duplicate.

Creating Unique Key/index: MongoDB

[youtube https://www.youtube.com/watch?v=QEy1IctH99w]

YouTube Link: https://www.youtube.com/watch?v=QEy1IctH99w [Watch the Video In Full Screen.]

Creating Key/Index

> db.name.insert({a: 1});
WriteResult({ "nInserted" : 1 })
 
> db.name.find()
{ "_id" : ObjectId("53d8cadbbbfe6d81d0bcc364"), "a" : 1 }
{ "_id" : 1, "a" : 1 }
 
> db.name.ensureIndex({a: 1});
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}
> db.system.indexes.find()
{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "foo.name" }
{ "v" : 1, "key" : { "a" : 1 }, "name" : "a_1", "ns" : "foo.name" }

Here we create index on field “a”.

Inserting duplicate values into key field

> db.name.insert({a: 1});
WriteResult({ "nInserted" : 1 })
 
> db.name.find()
{ "_id" : ObjectId("53d8cadbbbfe6d81d0bcc364"), "a" : 1 }
{ "_id" : 1, "a" : 1 }
{ "_id" : ObjectId("53d8cb4dbbfe6d81d0bcc365"), "a" : 1 }

insert operation simply inserts the duplicate value to field “a” even though its made as a key/index.

Removing documents and Key/Index on field “a”

> db.name.find()
{ "_id" : ObjectId("53d8cadbbbfe6d81d0bcc364"), "a" : 1 }
{ "_id" : 1, "a" : 1 }
{ "_id" : ObjectId("53d8cb4dbbfe6d81d0bcc365"), "a" : 1 }
 
> db.name.remove({a: 1});
WriteResult({ "nRemoved" : 3 })
 
> db.name.find()
 
> db.name.dropIndex({a: 1});
{ "nIndexesWas" : 2, "ok" : 1 }
 
> db.system.indexes.find()
{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "foo.name" }

Before implementing unique key on field “a” we need to first remove the duplicate entries present inside our collection orelse it’ll through errors. Here we also remove the index/key on “a”, so that we can create unique key/index on “a”.

Creating Unique key/index on field “a”

> db.name.ensureIndex({a: 1}, {unique: true});
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}

To create unique key/index, we need to make use of ensureIndex() method – first parameter being the field name to be made as unique key along with it’s value 1 or -1. 1 signifies ascending order, -1 signifies descending order. The second parameter {unique: true}, specifies that the key/index must be unique key/index, like that of “_id”.

Duplicate key error on our Unique Key!

> db.name.find()
{ "_id" : ObjectId("53d8cb85bbfe6d81d0bcc366"), "a" : 1 }
 
> db.name.insert({a: 1});
WriteResult({
        "nInserted" : 0,
        "writeError" : {
                "code" : 11000,
                "errmsg" : "insertDocument :: caused by :: 11000 E11000 
                            duplicate key error index: foo.name.$a_1  dup key: { : 1.0 }"
        }
})

Now if we try to insert duplicate values into field “a” it throws duplicate key error.