Sparse Index: MongoDB

All sparse indexes are unique index, but not all unique indexes are sparse! There are situations where we want to create unique index on key/field which is not present in all documents. Wherever the key is not present, it’s value will be treated as NULL. If more than 1 document has NULL value to the key we want to make as unique, then it violates unique key rule. In such situations we can make use of Sparse index and create unique key on only those documents which has the key in it.

sparse-index-unique-key-mongodb

example: database name
company: collection name

Insert documents

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
MongoDB shell version: 2.6.1
connecting to: test
> use example
switched to db example
 
> db.company.insert({name: "Apple", product: "iPhone"});
WriteResult({ "nInserted" : 1 })
 
> db.company.insert({name: "Motorola", product: "Smart Watch"});
WriteResult({ "nInserted" : 1 })
 
> db.company.insert({name: "Technotip"});
WriteResult({ "nInserted" : 1 })
 
> db.company.insert({name: "Google"});
WriteResult({ "nInserted" : 1 })
 
 
> db.company.find().pretty()
{
        "_id" : ObjectId("53d9f5125d1942042b4e092b"),
        "name" : "Apple",
        "product" : "iPhone"
}
{
        "_id" : ObjectId("53d9f52d5d1942042b4e092c"),
        "name" : "Motorola",
        "product" : "Smart Watch"
}
{ "_id" : ObjectId("53d9f5385d1942042b4e092d"), "name" : "Technotip" }
{ "_id" : ObjectId("53d9f53f5d1942042b4e092e"), "name" : "Google" }

We have 4 documents in “company” collection. First 2 documents has “name” and “product” keys. Last 2 documents has only “name” key.

Sparse Index – Unique Key: MongoDB


[youtube https://www.youtube.com/watch?v=6pxI5j6TZRU]

YouTube Link: https://www.youtube.com/watch?v=6pxI5j6TZRU [Watch the Video In Full Screen.]



Duplicate key error

1
2
3
4
5
6
7
8
9
> db.company.ensureIndex({product: 1}, {unique: true});
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "ok" : 0,
        "errmsg" : "E11000 duplicate key error index: example.company.$product_1
  dup key: { : null }",
        "code" : 11000
}

Since last 2 documents do not have “product” key, its value will be treated as “NULL”. Since both these documents have value of “product” as “NULL”, trying to create unique key on “product” throws duplicate key error.

Sparse Index

1
2
3
4
5
6
7
> db.company.ensureIndex({product: 1}, {unique: true, sparse: true});
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}

This creates a sparse index on “product” key/field.

system.indexes collection

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
> db.system.indexes.find().pretty()
{
        "v" : 1,
        "key" : {
                "_id" : 1
        },
        "name" : "_id_",
        "ns" : "example.company"
}
{
        "v" : 1,
        "unique" : true,
        "key" : {
                "product" : 1
        },
        "name" : "product_1",
        "ns" : "example.company",
        "sparse" : true
}

The “system.indexes” collection shows that the key “product” is a unique key and is a sparse index.

sort() method

1
2
3
4
5
6
7
8
9
10
11
12
13
> db.company.find().sort({product: 1}).pretty();
{ "_id" : ObjectId("53d9f5385d1942042b4e092d"), "name" : "Technotip" }
{ "_id" : ObjectId("53d9f53f5d1942042b4e092e"), "name" : "Google" }
{
        "_id" : ObjectId("53d9f52d5d1942042b4e092c"),
        "name" : "Motorola",
        "product" : "Smart Watch"
}
{
        "_id" : ObjectId("53d9f5125d1942042b4e092b"),
        "name" : "Apple",
        "product" : "iPhone"
}

This command sorts all the documents in lexicographical order. The output includes all the documents. Those documents which do not have “product” key in them are listed first.

Basic Cursor

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
> db.company.find().sort({product: 1}).explain()
{
        "cursor" : "BasicCursor",
        "isMultiKey" : false,
        "n" : 4,
        "nscannedObjects" : 4,
        "nscanned" : 4,
        "nscannedObjectsAllPlans" : 4,
        "nscannedAllPlans" : 4,
        "scanAndOrder" : true,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 0,
        "server" : "Satish-PC:27017",
        "filterSet" : false
}

Since the command is even listing/sorting the documents which do not have “product” key in them, it’s simply making use of Basic Cursor – which do not help in optimizing the command/query performance.

hint() method

1
2
3
4
5
6
7
8
9
10
11
> db.company.find().sort({product: 1}).hint({product: 1}).pretty();
{
        "_id" : ObjectId("53d9f52d5d1942042b4e092c"),
        "name" : "Motorola",
        "product" : "Smart Watch"
}
{
        "_id" : ObjectId("53d9f5125d1942042b4e092b"),
        "name" : "Apple",
        "product" : "iPhone"
}

hint() method tells the mongoDB server to operate only on the sparse key.. hence only retrieving and operating on the documents which has the sparse key in them.

Btree Cursor

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
> db.company.find().sort({product: 1}).hint({product: 1}).explain()
{
        "cursor" : "BtreeCursor product_1",
        "isMultiKey" : false,
        "n" : 2,
        "nscannedObjects" : 2,
        "nscanned" : 2,
        "nscannedObjectsAllPlans" : 2,
        "nscannedAllPlans" : 2,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 0,
        "indexBounds" : {
                "product" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ]
        },
        "server" : "Satish-PC:27017",
        "filterSet" : false
}

After using hint() method, mongoDB server is only looking for documents which has the sparse index key in them, so it can directly look for the sparse index key inside “system.indexes” collection, hence it’s using Btree Cursor – and is efficient.

Cursor Object: MongoDB

Lets have a deeper look into the MongoDB cursor object.

cursor-object-mongodb

Documents in our collection
test database, names collection.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
> use test
switched to db test
> show collections
names
system.indexes
> db.names.find();
{ "_id" : ObjectId("53be5d4604cc1cb0a7bfc3c0"), "name" : "Alia" }
{ "_id" : ObjectId("53be5d5204cc1cb0a7bfc3c1"), "name" : "Bebo" }
{ "_id" : ObjectId("53be5d5904cc1cb0a7bfc3c2"), "name" : "Chameli" }
{ "_id" : ObjectId("53be5d6104cc1cb0a7bfc3c3"), "name" : "Dev D" }
{ "_id" : ObjectId("53be5d6804cc1cb0a7bfc3c4"), "name" : "Emli" }
{ "_id" : ObjectId("53be5d8604cc1cb0a7bfc3c5"), "name" : "Farhan" }
{ "_id" : ObjectId("53be5d9204cc1cb0a7bfc3c6"), "name" : "Gangs" }
{ "_id" : ObjectId("53be5d9904cc1cb0a7bfc3c7"), "name" : "Hum" }
{ "_id" : ObjectId("53be5e3704cc1cb0a7bfc3c8"), "name" : 25 }

We have 8 documents with name in alphabetical order, and 1 document with name as 25. So in total we have 9 documents in our names collection.

Establishing Cursor

1
2
3
4
5
6
7
8
9
10
11
12
> var cur = db.names.find();
 
> cur
{ "_id" : ObjectId("53be5d4604cc1cb0a7bfc3c0"), "name" : "Alia" }
{ "_id" : ObjectId("53be5d5204cc1cb0a7bfc3c1"), "name" : "Bebo" }
{ "_id" : ObjectId("53be5d5904cc1cb0a7bfc3c2"), "name" : "Chameli" }
{ "_id" : ObjectId("53be5d6104cc1cb0a7bfc3c3"), "name" : "Dev D" }
{ "_id" : ObjectId("53be5d6804cc1cb0a7bfc3c4"), "name" : "Emli" }
{ "_id" : ObjectId("53be5d8604cc1cb0a7bfc3c5"), "name" : "Farhan" }
{ "_id" : ObjectId("53be5d9204cc1cb0a7bfc3c6"), "name" : "Gangs" }
{ "_id" : ObjectId("53be5d9904cc1cb0a7bfc3c7"), "name" : "Hum" }
{ "_id" : ObjectId("53be5e3704cc1cb0a7bfc3c8"), "name" : 25 }

Look at the contents of our cursor object cur.

hasNext() and next() methods on Cursor

1
2
3
4
5
6
7
8
9
> var cur = db.names.find();
> cur.hasNext();
true
> cur.next()
{ "_id" : ObjectId("53be5d4604cc1cb0a7bfc3c0"), "name" : "Alia" }
> cur.next()
{ "_id" : ObjectId("53be5d5204cc1cb0a7bfc3c1"), "name" : "Bebo" }
> cur.next()
{ "_id" : ObjectId("53be5d5904cc1cb0a7bfc3c2"), "name" : "Chameli" }

If there are any documents to iterate inside cursor object, then hasNext() will return true orelse it’ll return false. If hasNext() returns true, then we can iterate through the documents using next() method on the cursor object.

sort() method on Cursor Object

1
2
3
4
5
6
7
8
9
10
11
12
> var cur = db.names.find();
 
> cur.sort({"name": -1});
{ "_id" : ObjectId("53be5d9904cc1cb0a7bfc3c7"), "name" : "Hum" }
{ "_id" : ObjectId("53be5d9204cc1cb0a7bfc3c6"), "name" : "Gangs" }
{ "_id" : ObjectId("53be5d8604cc1cb0a7bfc3c5"), "name" : "Farhan" }
{ "_id" : ObjectId("53be5d6804cc1cb0a7bfc3c4"), "name" : "Emli" }
{ "_id" : ObjectId("53be5d6104cc1cb0a7bfc3c3"), "name" : "Dev D" }
{ "_id" : ObjectId("53be5d5904cc1cb0a7bfc3c2"), "name" : "Chameli" }
{ "_id" : ObjectId("53be5d5204cc1cb0a7bfc3c1"), "name" : "Bebo" }
{ "_id" : ObjectId("53be5d4604cc1cb0a7bfc3c0"), "name" : "Alia" }
{ "_id" : ObjectId("53be5e3704cc1cb0a7bfc3c8"), "name" : 25 }

We could modify the cursor object using methods like sort(), limit() and skip(). In above example, we are modifying cursor object using sort() method, and we are sorting it in reverse lexicographical order on the name field.

limit() method on Cursor Object

1
2
3
4
5
> var cur = db.names.find();
> cur.limit(3);
{ "_id" : ObjectId("53be5d4604cc1cb0a7bfc3c0"), "name" : "Alia" }
{ "_id" : ObjectId("53be5d5204cc1cb0a7bfc3c1"), "name" : "Bebo" }
{ "_id" : ObjectId("53be5d5904cc1cb0a7bfc3c2"), "name" : "Chameli" }

We could limit the output/result using limit() method.

Chaining method on Cursor Object

1
2
3
4
5
6
7
8
> var cur = db.names.find();
 
> cur.sort({"name": -1}).limit(5).skip(2);
{ "_id" : ObjectId("53be5d8604cc1cb0a7bfc3c5"), "name" : "Farhan" }
{ "_id" : ObjectId("53be5d6804cc1cb0a7bfc3c4"), "name" : "Emli" }
{ "_id" : ObjectId("53be5d6104cc1cb0a7bfc3c3"), "name" : "Dev D" }
{ "_id" : ObjectId("53be5d5904cc1cb0a7bfc3c2"), "name" : "Chameli" }
{ "_id" : ObjectId("53be5d5204cc1cb0a7bfc3c1"), "name" : "Bebo" }

Here we chain the methods sort(), limit() and skip(). We are sorting in reverse lexicographical order on the name field, then skipping the first 2 documents and then limiting the result/output to 5 documents.

The order in which these 3 methods are applied are: First sort, then skip and then limit.
Also note that, these methods modify cursor object at the server side and not on client site.

Cursor Object: MongoDB


[youtube https://www.youtube.com/watch?v=_LexQW0vSlQ]

YouTube Link: https://www.youtube.com/watch?v=_LexQW0vSlQ [Watch the Video In Full Screen.]



explain() method on Cursor Object

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
> db.names.find().explain();
{
        "cursor" : "BasicCursor",
        "isMultiKey" : false,
        "n" : 9,
        "nscannedObjects" : 9,
        "nscanned" : 9,
        "nscannedObjectsAllPlans" : 9,
        "nscannedAllPlans" : 9,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 0,
        "server" : "PC:27017",
        "filterSet" : false
}

explain() method shows that find() returns a basic cursor. More on explain() method in coming videos.

Programmatic way of printing Cursor Object content

1
2
3
4
5
6
7
8
9
10
11
12
> var cur = db.names.find();
 
> while(cur.hasNext()) printjson(cur.next());
{ "_id" : ObjectId("53be5d4604cc1cb0a7bfc3c0"), "name" : "Alia" }
{ "_id" : ObjectId("53be5d5204cc1cb0a7bfc3c1"), "name" : "Bebo" }
{ "_id" : ObjectId("53be5d5904cc1cb0a7bfc3c2"), "name" : "Chameli" }
{ "_id" : ObjectId("53be5d6104cc1cb0a7bfc3c3"), "name" : "Dev D" }
{ "_id" : ObjectId("53be5d6804cc1cb0a7bfc3c4"), "name" : "Emli" }
{ "_id" : ObjectId("53be5d8604cc1cb0a7bfc3c5"), "name" : "Farhan" }
{ "_id" : ObjectId("53be5d9204cc1cb0a7bfc3c6"), "name" : "Gangs" }
{ "_id" : ObjectId("53be5d9904cc1cb0a7bfc3c7"), "name" : "Hum" }
{ "_id" : ObjectId("53be5e3704cc1cb0a7bfc3c8"), "name" : 25 }

While loop executes until cur.hasNext() returns true. Until cur.hasNext() is true, cur.next() keeps printing next document in the cursor cur.