Remove Duplicate Documents: MongoDB

We learnt how to create unique key/index using {unique: true} option with ensureIndex() method. Now lets see how we can create unique key when there are duplicate entries/documents already present inside the collection.

Insert documents

> db.system.indexes.find()
{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "foo.test" }
 
 
> db.test.insert({name: "Satish", age: 27});
WriteResult({ "nInserted" : 1 })
 
> db.test.insert({name: "Kiran", age: 28});
WriteResult({ "nInserted" : 1 })
 
> db.test.insert({name: "Satish", age: 27});
WriteResult({ "nInserted" : 1 })

Here we have 3 documents. First and the last document has same value for “name” and “age” fields.

dropDups() To Remove Duplicate Documents: MongoDB

YouTube Link: https://www.youtube.com/watch?v=aQXdtDWKBiU [Watch the Video In Full Screen.]

Creating unique key on field “name”

> db.test.ensureIndex({name: 1}, {unique: true});
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "ok" : 0,
        "errmsg" : "E11000 duplicate key error index: foo.test.$name_1  dup key:
 { : \"Satish\" }",
        "code" : 11000
}

This creates error, as the collection “test” already has duplicate entries/documents.

Create Unique Key by dropping random duplicate entries

> db.test.ensureIndex({name: 1}, {unique: true, dropDups: true});
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}
 
> db.test.find();
{ "_id" : ObjectId("53d8f1268019dce2ce61eb86"), "name" : "Satish", "age" : 27 }
{ "_id" : ObjectId("53d8f12f8019dce2ce61eb87"), "name" : "Kiran", "age" : 28 }

dropDups() method retains only 1 document randomly and deletes/removes/drops all other duplicate entries/documents permanently.

Note: Since the documents are deleted randomly and can not be restored, you need to be very careful while making use of dropDup() method.

Leave a Reply Cancel reply