I have some data which is aggregated in chunks and later pushed into another collection.
The problem is that pushing the aggregated data to the other collection is very slow, taking a few hours, and I can't figure out how to optimize it because I can't find the bottleneck.
MongoDB is running on a single node with 24 GB of RAM.
Basically the whole process looks like this:
// Aggregated results, about 17k entries
var cursor = db.collection.runCommand("aggregate", { ... });
Then I iterate over the results:
cursor.result.forEach(function(data, index) {
 // Try to increment the counters of an existing array entry
 var inc = db.runCommand({ findAndModify: 'another_collection', query: query, update: { $inc: update } });
 if (inc.value == null) {
 // No array entry matched, so push a new entry onto the document
 var up = db.another_collection.update(query, { $push: push });
 if (up.nMatched == 0) {
 // The document does not exist yet, so insert a fresh one
 db.another_collection.insert({ /* new document */ });
 }
 }
});
In the code above I'm just pushing values into an array or incrementing them. The final result in the second collection looks like this:
{
 "_id" : {
 "id_1" : "1",
 "id_2" : "2"
 },
 "value" : {
 "2_0" : [
 {
 "date" : ISODate("2014-10-01T00:00:00.000Z"),
 "val_1" : 1,
 "val_2" : 0,
 "count" : 1
 },
 {
 "date" : ISODate("2014-10-02T00:00:00.000Z"),
 "val_1" : 6,
 "val_2" : 2,
 "count" : 6
 }
 ],
 "2_1" : [
 {
 "date" : ISODate("2014-10-06T00:00:00.000Z"),
 "val_1" : 1,
 "val_2" : 0,
 "count" : 1
 },
 {
 "date" : ISODate("2014-10-07T00:00:00.000Z"),
 "val_1" : 6,
 "val_2" : 2,
 "count" : 6
 }
 ]
 }
}
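For illustration, the placeholder variables in the loop above might look roughly like this for one entry (hypothetical values derived from the result document; the real query, update and push are elided in the question):

// Hypothetical shapes for the placeholder variables used in the loop above.
// Step 1: try to $inc an existing array entry (the positional $ operator
// requires the matching date to be part of the query)
var query = { "_id" : { "id_1" : "1", "id_2" : "2" }, "value.2_0.date" : ISODate("2014-10-01T00:00:00.000Z") };
var update = { "value.2_0.$.val_1" : 1, "value.2_0.$.val_2" : 0, "value.2_0.$.count" : 1 };
// Step 2: if no entry matched, $push a new entry (matching on _id only)
var push = { "value.2_0" : { "date" : ISODate("2014-10-01T00:00:00.000Z"), "val_1" : 1, "val_2" : 0, "count" : 1 } };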
The aggregation performance itself is great, but the bottleneck is inserting the aggregated data into the other collection, and I can't figure out why it is so slow.
The document count in another_collection is about 1,558,995 and growing.
I have done some research with iostat, mongostat and mongotop:
iostat:
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvda 0.00 0.00 2.00 45.00 0.01 0.09 4.09 0.07 1.45 6.00 1.24 1.19 5.60
xvdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 2.00 18.00 0.01 0.09 9.60 0.06 3.00 6.00 2.67 2.80 5.60
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
mongostat:
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn time
*0 1 39 *0 1 160|0 0 198g 403g 14.5g 1 db:28.9% 0 1|0 1|0 207k 231k 67 15:58:02
*0 *0 38 *0 0 178|0 0 198g 403g 14.6g 1 db:22.4% 0 2|0 1|0 235k 172k 67 15:58:03
*0 *0 34 *0 0 198|0 0 198g 403g 14.7g 0 db:68.1% 0 2|0 0|1 232k 186k 67 15:58:04
*0 *0 42 *0 0 270|0 0 198g 403g 14.6g 0 db:132.6% 0 0|0 0|1 298k 218k 67 15:58:05
*0 6 49 *0 5 183|0 0 198g 403g 14.7g 2 db:141.0% 0 0|0 0|0 292k 4m 67 15:58:06
*0 4 60 *0 4 495|0 0 198g 403g 14.7g 7 db:55.1% 0 0|0 0|1 411k 599k 67 15:58:07
*0 11 44 *0 10 296|0 0 198g 403g 14.5g 1 db:40.4% 0 0|0 1|0 299k 906k 67 15:58:08
*0 7 37 *0 6 184|0 0 198g 403g 14.5g 3 db:20.4% 0 3|1 0|1 221k 589k 67 15:58:09
*0 8 26 *0 9 139|0 0 198g 403g 14.5g 2 db:41.4% 0 0|0 1|2 157k 775k 67 15:58:10
*0 7 44 *0 7 245|0 0 198g 403g 14.5g 1 db:25.2% 0 1|0 1|0 285k 684k 67 15:58:11
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn time
*0 2 50 *0 2 222|0 0 198g 403g 14.5g 2 db:25.8% 0 2|0 1|0 249k 341k 67 15:58:12
*0 11 38 *0 9 151|0 0 198g 403g 14.6g 6 db:22.1% 0 2|0 1|0 224k 774k 67 15:58:13
*0 3 31 *0 3 147|0 0 198g 403g 14.7g 1 db:15.0% 0 2|0 0|1 215k 369k 67 15:58:14
1 5 30 *0 5 134|0 0 198g 403g 14.5g 1 db:30.9% 0 0|0 2|3 200k 495k 67 15:58:15
*0 1 14 *0 1 64|0 0 198g 403g 14.5g 0 db:24.3% 0 3|0 0|1 83k 197k 67 15:58:16
mongotop:
ns total read write 2014-11-23T14:00:34
db.system.namespaces 2751ms 2751ms 0ms
db.another_collection 233ms 0ms 233ms

ns total read write 2014-11-23T14:00:35
db.system.namespaces 2858ms 2858ms 0ms
db.another_collection 285ms 0ms 285ms
Any help would be greatly appreciated!
UPDATE
Well, I have figured out that the findAndModify queries weren't using any index, but when I added the indexes, I/O %util increased up to 100%.
The next step, I guess, should be to try to implement Bulk operations ( http://docs.mongodb.org/manual/reference/method/js-bulk/ ) or to upgrade the disk drives to SSDs.
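For reference, a minimal sketch of the index I mean, assuming the findAndModify query matches on the compound _id sub-fields shown above (an assumption; the real query is elided):

// Hypothetical: queries on embedded _id sub-fields like "_id.id_1" cannot
// use the default _id index, so they need a dedicated one.
db.another_collection.ensureIndex({ "_id.id_1" : 1, "_id.id_2" : 1 });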
1 Answer
You want to use bulk write operations instead of individual updates and inserts.
// BEFORE your iteration loop
var bulk = db.another_collection.initializeUnorderedBulkOp();
// INSIDE your iteration loop
// instead of db.another_collection.update(...)
bulk.find(query).update(yourUpdateDocumentHere);
// instead of db.another_collection.insert(...)
bulk.insert(yourDocumentToInsertHere);
// AFTER your iteration loop
bulk.execute();
This should speed up the write operations dramatically.
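Note that a bulk builder cannot be reused after execute(), so for a loop of ~17k entries you may want to flush in batches and reinitialize. A minimal sketch:

var bulk = db.another_collection.initializeUnorderedBulkOp();
var counter = 0;
cursor.result.forEach(function(data) {
 bulk.find(query).update(yourUpdateDocumentHere);
 if (++counter % 1000 === 0) {
 // Send the buffered batch and start a fresh builder
 bulk.execute();
 bulk = db.another_collection.initializeUnorderedBulkOp();
 }
});
if (counter % 1000 !== 0) bulk.execute(); // flush the remainder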
EDIT: As for the bottleneck: extending arrays is a costly operation, because it frequently forces MongoDB to relocate documents once their padding is exceeded. As a rule of thumb: if you have an array which you have to extend often, there is something wrong with your data model, and more often than not it is worth revising it.
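A minimal sketch of what such a revision might look like, keeping one document per key and date instead of ever-growing arrays (the field split is an assumption based on the result document in the question):

// Hypothetical flatter model: one document per (id, key, date), so every
// write is an in-place $inc or an upsert and documents never outgrow
// their padding.
db.another_collection.update(
 { "id_1" : "1", "id_2" : "2", "key" : "2_0", "date" : ISODate("2014-10-01T00:00:00.000Z") },
 { $inc : { "val_1" : 1, "val_2" : 0, "count" : 1 } },
 { upsert : true }
);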
-
Thank you for the suggestions! But as you can see, I'm using conditional statements to determine whether I should insert, increment or update. I'm a little bit confused about how I can combine these actions using bulk write operations. – deividaspetraitis, Nov 23, 2014 at 16:26
-
As written: you simply use bulk.find(...).update() each time you used db.another_collection.update(), and bulk.insert() each time you used db.another_collection.insert(). Your logic stays the same... I don't get what additional information you need. Please clarify. – Markus W Mahlberg, Nov 23, 2014 at 16:37
-
Okay, but is it going to work if I need the result of one query to decide whether or not to execute another? I.e. var res = bulk.find(...).update() and then if (res <..>) bulk.insert(yourDocumentToInsertHere);? – deividaspetraitis, Nov 23, 2014 at 18:02
What is your query on findAndModify? Can you run an explain for the findAndModify? – Antonios, Nov 23, 2014 at 18:17
-
@qutwala: You could do a bulk upsert instead. An upsert either updates a document (if already present) or creates one using the update document of the query. – Markus W Mahlberg, Nov 24, 2014 at 11:49
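A minimal sketch of that suggestion, using the flattened one-document-per-date model from the answer's EDIT (an assumption, since with the original array model the positional $ operator cannot be combined with an upsert; the field names for each aggregated entry are hypothetical):

// One buffered upsert per entry: increment the counters if the document
// exists, otherwise create it from the query plus the update document.
var bulk = db.another_collection.initializeUnorderedBulkOp();
cursor.result.forEach(function(data) {
 bulk.find({ "id_1" : data._id.id_1, "id_2" : data._id.id_2, "key" : "2_0", "date" : data.date })
 .upsert()
 .updateOne({ $inc : { "val_1" : data.val_1, "val_2" : data.val_2, "count" : data.count } });
});
bulk.execute();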