Most of my slow queries on my server are due to a single "find" query. I have a collection that represents website pages. I want to be able to look up a page in the collection by URL. The caveat is that a website might have URLs that redirect to it.
I've modeled my schema (with Node.js/Mongoose) like this:
{
  'url': { type: String, index: {unique: true, dropDups: true} },
  'redirects': { type: Array, index: true },
  // ... other stuff
}
In other words, one entry might have the URL http://codereview.stackexchange.com/ but also have an array of 5 other URLs (such as bit.ly URLs) that redirect to it.
Later, I want to query the collection to see if we have an entry which matches an arbitrary set of URLs. I don't know if the URLs are redirected URLs or what -- I just want to find all matches in my collection that represent these URLs.
So I do this:
// urls is an array of URL strings we want to find...
model.find({$or: [{'url': {$in: urls}}, {'redirects': {$in: urls}}]}).lean().exec( ... );
Unfortunately, I might be looking for around 200 URLs at once, so this query sometimes takes more than 1 second.
Is there further optimization I can do, or would it be better to split the query into multiple queries, capping the search-size each time?
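For reference, here is a minimal sketch of what the split-query variant could look like. It is untested against a real database, and whether it is actually faster than the single query is exactly the open question. `findFn` is a hypothetical wrapper around the `model.find(...)` call above, and the batch size is an arbitrary assumption.

```javascript
// Sketch only: `findFn` is a hypothetical wrapper around the query above, e.g.
//   (batch) => model.find({$or: [{url: {$in: batch}},
//                                {redirects: {$in: batch}}]}).lean().exec()

// Split an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Run one query per batch and merge the results.
// A page can match in two batches (its url in one, a redirect in another),
// so results are de-duplicated on the unique `url` field.
async function findInBatches(findFn, urls, batchSize) {
  const batches = chunk(urls, batchSize);
  const results = await Promise.all(batches.map((batch) => findFn(batch)));
  const byUrl = new Map();
  for (const page of results.flat()) {
    byUrl.set(page.url, page);
  }
  return Array.from(byUrl.values());
}
```

Something like `findInBatches(findFn, urls, 50)` would then issue four queries for 200 URLs.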
1 Answer
I haven't used MongoDB, but I have a few suggestions based on experiences with optimization and other databases.
Index hashes of the URLs and search for those instead. Using a simple MD5 hash would probably speed up searching, at the cost of dealing with false positives from hash collisions (unlikely but possible).
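A rough sketch of the hashing idea in Node.js, using the built-in `crypto` module. The field names `urlHash` and `redirectHashes` are hypothetical (they would have to be added to the schema and indexed), and the last helper re-checks the real URLs to weed out any collision.

```javascript
const crypto = require('crypto');

// Hex-encoded MD5 digest of a URL string (always 32 characters).
function hashUrl(url) {
  return crypto.createHash('md5').update(url, 'utf8').digest('hex');
}

// Build the query filter against the hashed fields.
// `urlHash` and `redirectHashes` are hypothetical field names.
function buildHashQuery(urls) {
  const hashes = urls.map(hashUrl);
  return {
    $or: [{ urlHash: { $in: hashes } }, { redirectHashes: { $in: hashes } }],
  };
}

// Guard against hash collisions: keep only documents whose real URLs match.
function dropFalsePositives(docs, urls) {
  const wanted = new Set(urls);
  return docs.filter(
    (doc) => wanted.has(doc.url) || (doc.redirects || []).some((r) => wanted.has(r))
  );
}
```

The filter from `buildHashQuery(urls)` would be passed to `model.find(...)`, and the returned documents run through `dropFalsePositives(docs, urls)`.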
Store every URL as a top-level object and add a redirectsTo attribute. This removes the need to index or search the redirects arrays. You now need to perform two queries (one for the original URLs and another for the redirected URLs), but this could end up being faster if only a small percentage of URLs are redirects. Here's some pseudocode to clarify what I mean:
all = []
found = model.find({'url': {$in: urls}})...
redirects = []
for each item in found
    if item.redirectsTo
        redirects += item.redirectsTo
    else
        all += item
redirected = model.find({'url': {$in: redirects}})...
for each item in redirected
    all += item
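In JavaScript, the same two-step lookup might look roughly like the sketch below. I'm assuming a hypothetical `findByUrls` wrapper around the Mongoose call and a single `redirectsTo` string per redirect entry. It follows only one hop, so a redirect that points at another redirect is not resolved further.

```javascript
// Sketch only: `findByUrls` is a hypothetical wrapper, e.g.
//   (list) => model.find({url: {$in: list}}).lean().exec()
// and `redirectsTo` is the attribute proposed above.
async function findPages(findByUrls, urls) {
  const found = await findByUrls(urls);
  const pages = new Map();   // final pages, keyed by url to avoid duplicates
  const targets = new Set(); // URLs that redirect entries point at

  for (const item of found) {
    if (item.redirectsTo) {
      targets.add(item.redirectsTo);
    } else {
      pages.set(item.url, item);
    }
  }

  // Second query: only fetch targets that the first query did not return.
  const missing = Array.from(targets).filter((url) => !pages.has(url));
  if (missing.length > 0) {
    for (const item of await findByUrls(missing)) {
      pages.set(item.url, item);
    }
  }
  return Array.from(pages.values());
}
```

When none of the requested URLs are redirects, or every target was already matched directly, the second query is skipped entirely.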
Update your question with the higher-level problem you're trying to solve. I often find that when I'm trying to optimize slow task T to solve problem P, the better approach is to rethink P.