Sparse Index: MongoDB


All sparse indexes are unique index, but not all unique indexes are sparse! There are situations where we want to create unique index on key/field which is not present in all documents. Wherever the key is not present, it’s value will be treated as NULL. If more than 1 document has NULL value to the key we want to make as unique, then it violates unique key rule. In such situations we can make use of Sparse index and create unique key on only those documents which has the key in it.

sparse-index-unique-key-mongodb

example: database name
company: collection name

Insert documents

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
MongoDB shell version: 2.6.1
connecting to: test
> use example
switched to db example
 
> db.company.insert({name: "Apple", product: "iPhone"});
WriteResult({ "nInserted" : 1 })
 
> db.company.insert({name: "Motorola", product: "Smart Watch"});
WriteResult({ "nInserted" : 1 })
 
> db.company.insert({name: "Technotip"});
WriteResult({ "nInserted" : 1 })
 
> db.company.insert({name: "Google"});
WriteResult({ "nInserted" : 1 })
 
 
> db.company.find().pretty()
{
        "_id" : ObjectId("53d9f5125d1942042b4e092b"),
        "name" : "Apple",
        "product" : "iPhone"
}
{
        "_id" : ObjectId("53d9f52d5d1942042b4e092c"),
        "name" : "Motorola",
        "product" : "Smart Watch"
}
{ "_id" : ObjectId("53d9f5385d1942042b4e092d"), "name" : "Technotip" }
{ "_id" : ObjectId("53d9f53f5d1942042b4e092e"), "name" : "Google" }

We have 4 documents in “company” collection. First 2 documents has “name” and “product” keys. Last 2 documents has only “name” key.

Sparse Index – Unique Key: MongoDB



YouTube Link: https://www.youtube.com/watch?v=6pxI5j6TZRU [Watch the Video In Full Screen.]



Duplicate key error

1
2
3
4
5
6
7
8
9
> db.company.ensureIndex({product: 1}, {unique: true});
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "ok" : 0,
        "errmsg" : "E11000 duplicate key error index: example.company.$product_1
  dup key: { : null }",
        "code" : 11000
}

Since last 2 documents do not have “product” key, its value will be treated as “NULL”. Since both these documents have value of “product” as “NULL”, trying to create unique key on “product” throws duplicate key error.

Sparse Index

1
2
3
4
5
6
7
> db.company.ensureIndex({product: 1}, {unique: true, sparse: true});
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}

This creates a sparse index on “product” key/field.

system.indexes collection

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
> db.system.indexes.find().pretty()
{
        "v" : 1,
        "key" : {
                "_id" : 1
        },
        "name" : "_id_",
        "ns" : "example.company"
}
{
        "v" : 1,
        "unique" : true,
        "key" : {
                "product" : 1
        },
        "name" : "product_1",
        "ns" : "example.company",
        "sparse" : true
}

The “system.indexes” collection shows that the key “product” is a unique key and is a sparse index.

sort() method

1
2
3
4
5
6
7
8
9
10
11
12
13
> db.company.find().sort({product: 1}).pretty();
{ "_id" : ObjectId("53d9f5385d1942042b4e092d"), "name" : "Technotip" }
{ "_id" : ObjectId("53d9f53f5d1942042b4e092e"), "name" : "Google" }
{
        "_id" : ObjectId("53d9f52d5d1942042b4e092c"),
        "name" : "Motorola",
        "product" : "Smart Watch"
}
{
        "_id" : ObjectId("53d9f5125d1942042b4e092b"),
        "name" : "Apple",
        "product" : "iPhone"
}

This command sorts all the documents in lexicographical order. The output includes all the documents. Those documents which do not have “product” key in them are listed first.

Basic Cursor

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
> db.company.find().sort({product: 1}).explain()
{
        "cursor" : "BasicCursor",
        "isMultiKey" : false,
        "n" : 4,
        "nscannedObjects" : 4,
        "nscanned" : 4,
        "nscannedObjectsAllPlans" : 4,
        "nscannedAllPlans" : 4,
        "scanAndOrder" : true,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 0,
        "server" : "Satish-PC:27017",
        "filterSet" : false
}

Since the command is even listing/sorting the documents which do not have “product” key in them, it’s simply making use of Basic Cursor – which do not help in optimizing the command/query performance.

hint() method

1
2
3
4
5
6
7
8
9
10
11
> db.company.find().sort({product: 1}).hint({product: 1}).pretty();
{
        "_id" : ObjectId("53d9f52d5d1942042b4e092c"),
        "name" : "Motorola",
        "product" : "Smart Watch"
}
{
        "_id" : ObjectId("53d9f5125d1942042b4e092b"),
        "name" : "Apple",
        "product" : "iPhone"
}

hint() method tells the mongoDB server to operate only on the sparse key.. hence only retrieving and operating on the documents which has the sparse key in them.

Btree Cursor

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
> db.company.find().sort({product: 1}).hint({product: 1}).explain()
{
        "cursor" : "BtreeCursor product_1",
        "isMultiKey" : false,
        "n" : 2,
        "nscannedObjects" : 2,
        "nscanned" : 2,
        "nscannedObjectsAllPlans" : 2,
        "nscannedAllPlans" : 2,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 0,
        "indexBounds" : {
                "product" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ]
        },
        "server" : "Satish-PC:27017",
        "filterSet" : false
}

After using hint() method, mongoDB server is only looking for documents which has the sparse index key in them, so it can directly look for the sparse index key inside “system.indexes” collection, hence it’s using Btree Cursor – and is efficient.

Leave a Reply

Your email address will not be published. Required fields are marked *