3
\$\begingroup\$

I need to query MongoDB and check if an item exists or not:

class MongoDBPipeline(object):
    def __init__(self):
        connection = pymongo.MongoClient(
            settings['MONGODB_SERVER'],
            settings['MONGODB_PORT']
        )
        db = connection[settings['MONGODB_DB']]
        self.collection = db[settings['MONGODB_COLLECTION']]

    def process_item(self, item, spider):
        dup_check = self.collection.find({'image':item['image']}).count()
        if dup_check == 0 :
            self.collection.insert(dict(item))
            log.msg("Image added to MongoDB database!",
                    level=log.DEBUG, spider=spider)
            print "image Added!"
        else:
            print "Image Exist"
        return item

For now it works, but I want to know if I am doing it the right way or if there is a better solution.

SuperBiasedMan
asked Sep 30, 2015 at 20:46
\$\endgroup\$

2 Answers

2
\$\begingroup\$

Method

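The fastest way to make sure an item is only inserted into MongoDB if it is unique is to create a unique index on the relevant field(s) and catch the duplicate-key error at insertion time. This also avoids the gap between the existence check and the insert, during which another writer could add the same item.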

Read more here.

MongoDB - Features

This may be off-topic, but I think it is worth knowing:

MongoDB is inconsistent by default. The documentation claims "strong consistency," but the default implementation considers an operation "complete" as soon as it is queued in the send buffer of a client, even before it has been seen by any node. There is a big discrepancy between the discussion in Mongo blogs and the default implementation in the code. You have to take extra measures to ensure that an update has propagated to all replicas.

I'll list the advantages and disadvantages to give you an idea. For the record, I am not a big fan of MongoDB.


Advantages

  • Lightning fast.

  • You can perform rich queries and create indexes on the fly with a single command.

  • Replication is very easy.

Disadvantages

  • Indexes take up a lot of RAM. They are B-tree indexes and if you have many, you can run out of system resources really fast.

  • Very unreliable. No single-server durability. If something crashes while it is updating 'table-contents', you lose all your data. Repair takes a lot of time and, if you are unlucky, usually ends in 50-90% data loss. So the only way to be fully safe is to have 2 replicas in different datacentres.


Of course, there are alternatives to MongoDB.

Jamal
answered Nov 14, 2015 at 12:34
\$\endgroup\$
1
\$\begingroup\$

You can create a unique index on that field:

db.collection.createIndex({"field": 1}, {unique: true})

MongoDB will raise a duplicate key error if a document with an already existing value for that field is inserted.

Jamal
answered Oct 15, 2015 at 10:40
\$\endgroup\$
2
  • \$\begingroup\$ Please write some code to help user better. \$\endgroup\$ Commented Oct 15, 2015 at 12:32
  • \$\begingroup\$ I have written a shell query \$\endgroup\$ Commented Oct 15, 2015 at 14:01
