I am using MongoDB to persist a very large file (90 GB) that contains nearly 40,000,000 items. I read and parse this file and insert all the items into MongoDB (I use Perl with batch_insert, and I map each item to one MongoDB document).

Before inserting, I pre-created the indexes (about 10 index keys). I find that the insert speed cannot meet my needs (200 to 400 items per second). I know that too many index keys will slow down inserts, especially once the collection becomes quite large. So I wonder whether I can create the indexes after I have loaded all the data into the database.

Can anyone tell me whether this approach works, and whether it will actually save time?

Mat
asked Dec 19, 2013 at 4:33
  • Why don't you just try it out? Commented Dec 22, 2013 at 10:05
  • Well, it is 90 GB of data. I have to research this thoroughly before I load it into the database. A trial run is not feasible on data this large! Commented Dec 23, 2013 at 8:40
  • Try with a subset. You'll never know how long it takes on your hardware if you don't test anyway. Commented Dec 23, 2013 at 8:42

2 Answers

Yes, you can create the indexes after the import (until then, the collection will have only the default _id index). This is also recommended, because the resulting indexes will be more compact and more efficient (for similar reasons, foreground indexing is preferred over background indexing if you can afford it). The builds will take some time to complete, though, especially with 10 indexes.

To build the indexes after the import, simply do not define any until the import is complete, then use the ensureIndex() command (named createIndex() in MongoDB 3.0 and later) to create the required indexes, with the usual caveat that index creation is resource intensive. For more information:

http://docs.mongodb.org/manual/core/index-creation/
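
The load-first, index-afterwards order described above can be sketched roughly as follows. This is a minimal Python illustration, not the asker's Perl code: the helper names `batched` and `load_then_index`, the batch size, and the index specs are all assumptions made for the example. It only relies on the `collection` object offering `insert_many` and `create_index`, as a pymongo `Collection` does.

```python
from itertools import islice


def batched(items, size):
    """Yield lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def load_then_index(collection, documents, index_specs, batch_size=1000):
    """Insert every document first, then build the secondary indexes.

    `collection` is assumed to behave like a pymongo Collection.
    `index_specs` is a list of key lists such as [("field_a", 1)];
    the field names are hypothetical.
    """
    inserted = 0
    for batch in batched(documents, batch_size):
        # ordered=False asks the server to attempt every insert in the
        # batch instead of stopping at the first failing document.
        collection.insert_many(batch, ordered=False)
        inserted += len(batch)

    # Up to this point only the default _id index exists, so the inserts
    # above did not have to maintain any secondary index.
    names = [collection.create_index(keys) for keys in index_specs]
    return inserted, names
```

With a real deployment, `collection` would be something like `pymongo.MongoClient()["mydb"]["items"]` (database and collection names are placeholders), and `documents` could be a generator that parses the input file lazily so the whole 90 GB never has to sit in memory.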

answered Dec 31, 2013 at 19:14

I had the same problem with a big collection. After more than two days of heavy data importing, this is what I suggest in order to get the collection up and running as soon as possible:

  1. Create the collection empty, without indexes, and import just the data.
  2. Create the single-field indexes.
  3. Create the multi-field indexes, if you have any. (This was especially slow for me for indexes with more than 5 fields, but indexes with one or two fields are pretty fast.)
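
Steps 2 and 3 above could be sketched like this, assuming the import has already finished. The function name `build_indexes_simple_first` and the spec format are made up for illustration, and the timing is only there to show which builds dominate on your own hardware. `collection` is again assumed to expose `create_index` the way a pymongo `Collection` does.

```python
import time


def build_indexes_simple_first(collection, index_specs):
    """Create indexes in order of field count, timing each build.

    Each spec is a list of (field, direction) pairs; the field names
    used by a caller are hypothetical.
    """
    timings = []
    # sorted() is stable, so specs with the same number of fields keep
    # the order in which they were listed.
    for keys in sorted(index_specs, key=len):
        started = time.perf_counter()
        name = collection.create_index(keys)
        timings.append((name, time.perf_counter() - started))
    return timings
```

Building the single-field indexes first means the collection becomes usable for simple queries while the slower multi-field builds are still pending.
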
answered Jul 29, 2016 at 13:29
