I have some data which is aggregated in chunks and later pushed into another collection.
The problem is that pushing the aggregated data to the other collection is very slow, taking a few hours, and I can't figure out how to optimize it because I can't find the bottleneck.
MongoDB is running on a single node with 24 GB of RAM.
Basically the whole process looks like this:
// Aggregated results, about 17k entries
var cursor = db.collection.runCommand("aggregate", { ... });
Then I iterate over the results:
cursor.result.forEach(function(data, index) {
 // Try to increment the counters of an existing array entry
 var inc = db.runCommand({ findAndModify: 'another_collection', query: query, update: { $inc: update } });
 if (inc.value == null) {
 // No array entry matched, so push a new entry onto the document
 var up = db.another_collection.update(query, { $push: push });
 if (up.nMatched == 0) {
 // The document does not exist yet, so insert a fresh one
 db.another_collection.insert({ /* new document */ });
 }
 }
});
In the code above I'm just pushing values into an array or incrementing them. The final result in the second collection looks like this:
{
 "_id" : {
 "id_1" : "1",
 "id_2" : "2"
 },
 "value" : {
 "2_0" : [
 {
 "date" : ISODate("2014-10-01T00:00:00.000Z"),
 "val_1" : 1,
 "val_2" : 0,
 "count" : 1
 },
 {
 "date" : ISODate("2014-10-02T00:00:00.000Z"),
 "val_1" : 6,
 "val_2" : 2,
 "count" : 6
 }
 ],
 "2_1" : [
 {
 "date" : ISODate("2014-10-06T00:00:00.000Z"),
 "val_1" : 1,
 "val_2" : 0,
 "count" : 1
 },
 {
 "date" : ISODate("2014-10-07T00:00:00.000Z"),
 "val_1" : 6,
 "val_2" : 2,
 "count" : 6
 }
 ]
 }
}
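For illustration, the placeholder variables in the loop above might look roughly like this for one entry (hypothetical values derived from the result document; the real query, update and push are elided in the question):

// Hypothetical shapes for the placeholder variables used in the loop above.
// Step 1: try to $inc an existing array entry (the positional $ operator
// requires the matching date to be part of the query)
var query = { "_id" : { "id_1" : "1", "id_2" : "2" }, "value.2_0.date" : ISODate("2014-10-01T00:00:00.000Z") };
var update = { "value.2_0.$.val_1" : 1, "value.2_0.$.val_2" : 0, "value.2_0.$.count" : 1 };
// Step 2: if no entry matched, $push a new entry (matching on _id only)
var push = { "value.2_0" : { "date" : ISODate("2014-10-01T00:00:00.000Z"), "val_1" : 1, "val_2" : 0, "count" : 1 } };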
The aggregation performance itself is great, but the bottleneck is inserting the aggregated data into the other collection, and I can't figure out why it is so slow.
The document count in another_collection is about 1,558,995 and growing.
I have done some research with iostat, mongostat and mongotop:
iostat:
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvda 0.00 0.00 2.00 45.00 0.01 0.09 4.09 0.07 1.45 6.00 1.24 1.19 5.60
xvdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 2.00 18.00 0.01 0.09 9.60 0.06 3.00 6.00 2.67 2.80 5.60
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
mongostat:
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn time
*0 1 39 *0 1 160|0 0 198g 403g 14.5g 1 db:28.9% 0 1|0 1|0 207k 231k 67 15:58:02
*0 *0 38 *0 0 178|0 0 198g 403g 14.6g 1 db:22.4% 0 2|0 1|0 235k 172k 67 15:58:03
*0 *0 34 *0 0 198|0 0 198g 403g 14.7g 0 db:68.1% 0 2|0 0|1 232k 186k 67 15:58:04
*0 *0 42 *0 0 270|0 0 198g 403g 14.6g 0 db:132.6% 0 0|0 0|1 298k 218k 67 15:58:05
*0 6 49 *0 5 183|0 0 198g 403g 14.7g 2 db:141.0% 0 0|0 0|0 292k 4m 67 15:58:06
*0 4 60 *0 4 495|0 0 198g 403g 14.7g 7 db:55.1% 0 0|0 0|1 411k 599k 67 15:58:07
*0 11 44 *0 10 296|0 0 198g 403g 14.5g 1 db:40.4% 0 0|0 1|0 299k 906k 67 15:58:08
*0 7 37 *0 6 184|0 0 198g 403g 14.5g 3 db:20.4% 0 3|1 0|1 221k 589k 67 15:58:09
*0 8 26 *0 9 139|0 0 198g 403g 14.5g 2 db:41.4% 0 0|0 1|2 157k 775k 67 15:58:10
*0 7 44 *0 7 245|0 0 198g 403g 14.5g 1 db:25.2% 0 1|0 1|0 285k 684k 67 15:58:11
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn time
*0 2 50 *0 2 222|0 0 198g 403g 14.5g 2 db:25.8% 0 2|0 1|0 249k 341k 67 15:58:12
*0 11 38 *0 9 151|0 0 198g 403g 14.6g 6 db:22.1% 0 2|0 1|0 224k 774k 67 15:58:13
*0 3 31 *0 3 147|0 0 198g 403g 14.7g 1 db:15.0% 0 2|0 0|1 215k 369k 67 15:58:14
1 5 30 *0 5 134|0 0 198g 403g 14.5g 1 db:30.9% 0 0|0 2|3 200k 495k 67 15:58:15
*0 1 14 *0 1 64|0 0 198g 403g 14.5g 0 db:24.3% 0 3|0 0|1 83k 197k 67 15:58:16
mongotop:
ns total read write 2014-11-23T14:00:34
db.system.namespaces 2751ms 2751ms 0ms
db.another_collection 233ms 0ms 233ms

ns total read write 2014-11-23T14:00:35
db.system.namespaces 2858ms 2858ms 0ms
db.another_collection 285ms 0ms 285ms
Any help would be greatly appreciated!
UPDATE
Well, I have figured out that the findAndModify queries weren't using any index, but when I added the indexes, I/O %util increased up to 100%.
The next step, I guess, should be to try to implement Bulk operations ( http://docs.mongodb.org/manual/reference/method/js-bulk/ ) or to upgrade the disk drives to SSDs.
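For reference, a minimal sketch of the index I mean, assuming the findAndModify query matches on the compound _id sub-fields shown above (an assumption; the real query is elided):

// Hypothetical: queries on embedded _id sub-fields like "_id.id_1" cannot
// use the default _id index, so they need a dedicated one.
db.another_collection.ensureIndex({ "_id.id_1" : 1, "_id.id_2" : 1 });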
1 Answer
You want to use bulk write operations instead of individual updates and inserts.
// BEFORE your iteration loop
var bulk = db.another_collection.initializeUnorderedBulkOp();
// INSIDE your iteration loop
// instead of db.another_collection.update(...)
bulk.find(query).update(yourUpdateDocumentHere);
// instead of db.another_collection.insert(...)
bulk.insert(yourDocumentToInsertHere);
// AFTER your iteration loop
bulk.execute();
This should speed up the write operations dramatically.
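Note that a bulk builder cannot be reused after execute(), so for a loop of ~17k entries you may want to flush in batches and reinitialize. A minimal sketch:

var bulk = db.another_collection.initializeUnorderedBulkOp();
var counter = 0;
cursor.result.forEach(function(data) {
 bulk.find(query).update(yourUpdateDocumentHere);
 if (++counter % 1000 === 0) {
 // Send the buffered batch and start a fresh builder
 bulk.execute();
 bulk = db.another_collection.initializeUnorderedBulkOp();
 }
});
if (counter % 1000 !== 0) bulk.execute(); // flush the remainder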
EDIT: As for the bottleneck: extending arrays is a costly operation, because it frequently forces MongoDB to relocate documents once their padding is exceeded. As a rule of thumb: if you have an array which you have to extend often, there is something wrong with your data model, and more often than not it is worth revising it.
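A minimal sketch of what such a revision might look like, keeping one document per key and date instead of ever-growing arrays (the field split is an assumption based on the result document in the question):

// Hypothetical flatter model: one document per (id, key, date), so every
// write is an in-place $inc or an upsert and documents never outgrow
// their padding.
db.another_collection.update(
 { "id_1" : "1", "id_2" : "2", "key" : "2_0", "date" : ISODate("2014-10-01T00:00:00.000Z") },
 { $inc : { "val_1" : 1, "val_2" : 0, "count" : 1 } },
 { upsert : true }
);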
-
Thank you for the suggestions! But as you can see, I'm using conditional statements to determine whether I should insert, increment or update. I'm a little bit confused about how I can combine these actions using bulk write operations. – deividaspetraitis, Nov 23, 2014 at 16:26
-
As written: you simply use bulk.find(...).update() each time you used db.another_collection.update(), and bulk.insert() each time you used db.another_collection.insert(). Your logic stays the same... I don't get what additional information you need. Please clarify. – Markus W Mahlberg, Nov 23, 2014 at 16:37
-
Okay, but is it going to work if I need the result of one query to decide whether or not to execute another? I.e. var res = bulk.find(...).update() and then if (res <..>) bulk.insert(yourDocumentToInsertHere);? – deividaspetraitis, Nov 23, 2014 at 18:02
What is your query on findAndModify? Can you run an explain for the findAndModify? – Antonios, Nov 23, 2014 at 18:17
-
@qutwala: You could do a bulk upsert instead. An upsert either updates a document (if already present) or creates one using the update document of the query. – Markus W Mahlberg, Nov 24, 2014 at 11:49
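A minimal sketch of that suggestion, using the flattened one-document-per-date model from the answer's EDIT (an assumption, since with the original array model the positional $ operator cannot be combined with an upsert; the field names for each aggregated entry are hypothetical):

// One buffered upsert per entry: increment the counters if the document
// exists, otherwise create it from the query plus the update document.
var bulk = db.another_collection.initializeUnorderedBulkOp();
cursor.result.forEach(function(data) {
 bulk.find({ "id_1" : data._id.id_1, "id_2" : data._id.id_2, "key" : "2_0", "date" : data.date })
 .upsert()
 .updateOne({ $inc : { "val_1" : data.val_1, "val_2" : data.val_2, "count" : data.count } });
});
bulk.execute();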