\$\begingroup\$

Most of my slow queries on my server are due to a single "find" query. I have a collection that represents website pages. I want to be able to look up a page in the collection by URL. The caveat is that a website might have URLs that redirect to it.

I've modeled my schema (with Node.js/Mongoose) like this:

{
  'url': { type: String, index: { unique: true, dropDups: true } },
  'redirects': { type: Array, index: true },
  // ... other stuff
}

In other words, one entry might have the URL http://codereview.stackexchange.com/ but also have an array of 5 other URLs (such as bit.ly URLs) that redirect to it.

Later, I want to query the collection to see if we have an entry which matches an arbitrary set of URLs. I don't know if the URLs are redirected URLs or what -- I just want to find all matches in my collection that represent these URLs.

So I do this:

// urls is an array of URL strings we want to find...
model.find({$or: [{'url': {$in: urls}}, {'redirects': {$in: urls}}]}).lean().exec( ... );

Unfortunately, I might be looking for around 200 URLs at once, so this query sometimes takes more than 1 second.

Is there further optimization I can do, or would it be better to split the query into multiple queries, capping the search-size each time?
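For concreteness, the split-query option could be sketched as below. This is only an illustration under assumptions: `chunk` and `findInBatches` are hypothetical helper names, the batch size of 50 is arbitrary, and `model` is assumed to be a Mongoose-style model whose `find(...).lean().exec()` returns a promise. Whether batching actually beats a single query would have to be measured.

```javascript
// Hypothetical helper: split an array into slices of at most `size` items.
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Run the same $or query once per batch and merge the results.
// Results are de-duplicated by `url`, since one page can match several batches
// (for example via different entries in its `redirects` array).
async function findInBatches(model, urls, batchSize = 50) {
  const byUrl = new Map();
  for (const batch of chunk(urls, batchSize)) {
    const docs = await model
      .find({ $or: [{ url: { $in: batch } }, { redirects: { $in: batch } }] })
      .lean()
      .exec();
    for (const doc of docs) {
      byUrl.set(doc.url, doc);
    }
  }
  return [...byUrl.values()];
}
```

The batches run sequentially here to keep the sketch simple; running them concurrently is another option, at the cost of more simultaneous load on the server.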

Jamal
asked Mar 6, 2013 at 15:49
\$\endgroup\$

1 Answer

\$\begingroup\$

I haven't used MongoDB, but I have a few suggestions based on experiences with optimization and other databases.

  1. Index hashes of the URLs and search for those instead. Using a simple MD5 hash would probably speed up searching, at the cost of dealing with false positives from hash collisions (unlikely but possible).

  2. Store every URL as a top-level object and add a redirectsTo attribute. This removes the need to index or search the redirects arrays. You now need to perform two queries, one for the original URLs and another for the redirect targets, but this could end up being faster if only a small percentage of URLs are redirects.

    Here's some pseudocode to clarify what I mean:

    all = []
    found = model.find({'url': {$in: urls}})...
    redirects = []
    for each item in found
        if item.redirectsTo
            redirects += item.redirectsTo
        else
            all += item
    redirected = model.find({'url': {$in: redirects}})...
    for each item in redirected
        all += item
    
  3. Update your question with the higher-level problem you're trying to solve. I often find that when I'm trying to optimize slow task T to solve problem P, the better approach is to rethink P.
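The two-step lookup from suggestion 2 could look roughly like the following in Node.js. This is a minimal sketch, not tested against a real MongoDB instance: it assumes every URL (canonical or redirect) is stored as its own document with a unique index on `url`, that redirect documents carry a `redirectsTo` string, and that `model` is a Mongoose-style model whose `find(...).lean().exec()` returns a promise. The name `findPagesByUrls` is hypothetical, and only a single redirect hop is followed.

```javascript
// Sketch of the two-step lookup. Assumed (hypothetical) schema:
//   { url: String (unique index), redirectsTo: String (only on redirect docs), ... }
async function findPagesByUrls(model, urls) {
  // First pass: one indexed lookup on `url` only.
  const found = await model.find({ url: { $in: urls } }).lean().exec();

  const pages = new Map();   // canonical URL -> page document
  const targets = new Set(); // canonical URLs referenced by redirect documents

  for (const doc of found) {
    if (doc.redirectsTo) {
      targets.add(doc.redirectsTo);
    } else {
      pages.set(doc.url, doc);
    }
  }

  // Only fetch redirect targets that the first pass did not already return.
  const missing = [...targets].filter((url) => !pages.has(url));

  if (missing.length > 0) {
    // Second pass: resolve the redirect targets (single hop, chains are not followed).
    const redirected = await model.find({ url: { $in: missing } }).lean().exec();
    for (const doc of redirected) {
      pages.set(doc.url, doc);
    }
  }

  return [...pages.values()];
}
```

Keying the results by canonical URL means a page is returned once even when it is requested both directly and through one of its redirects.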

answered Mar 6, 2013 at 16:24
\$\endgroup\$
