I've got a database table which will grow by about 5,000 rows an hour. For any given key that I query by, the result set grows by about 1 row every hour. I would like a web page to show the latest rows for a key, 50 at a time (this is configurable). I would like to try implementing memcache to keep database read activity low.
If I run a query and create a cache entry for each page of 50 results, that works until a new entry is added. At that point the page of latest results gains a new row and its oldest row drops off. This cascades down the list of cached pages, forcing me to update every cached page. It seems like a poor design.
I could build the cache pages backwards, then for each page requested I should get the latest 2 pages and truncate to the proper length of 50. I'm not sure if this is good or bad?
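One way to make the "build the cache pages backwards" idea concrete (a sketch in Python; the bucket scheme and function names are illustrative, not part of the question): number the pages from the *oldest* row, so that bucket boundaries never move and only the newest, possibly-partial bucket ever changes. Every older bucket's cache entry then stays valid forever.

```python
def fixed_pages(total_rows, page_size=50):
    """Split rows into fixed buckets counted from the OLDEST row.

    Returns a list of (start_offset, length) buckets, oldest first.
    Because boundaries never move, inserting a new row only affects
    the last (possibly partial) bucket.
    """
    full, remainder = divmod(total_rows, page_size)
    buckets = [(i * page_size, page_size) for i in range(full)]
    if remainder:
        buckets.append((full * page_size, remainder))
    return buckets


def newest_page(total_rows, page_size=50):
    """Buckets needed to serve the 'latest 50' page: the newest
    partial bucket plus the previous full one, truncated by the caller."""
    buckets = fixed_pages(total_rows, page_size)
    newest = buckets[-1]
    if newest[1] == page_size or len(buckets) == 1:
        return [newest]
    return [buckets[-2], newest]  # fetch two cached buckets, truncate to page_size
```

With this layout, serving the latest page means fetching at most two cached buckets and trimming, which matches the "get the latest 2 pages and truncate" idea above.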
Ideally, the mechanism I use to insert a new row would also know how to invalidate the proper cache results.
Has someone already solved this problem in a widely accepted way? What's the best method of doing this?
EDIT:
If my understanding of the MySQL query cache is correct, it has table-level granularity for invalidation. Given that roughly 5,000 rows are inserted into the table before a query on any given key actually needs new results, it seems the database query cache would rarely help here. MS SQL caches execution plans and frequently accessed data pages, so it may do better in this scenario.
My query is not against a single table with TOP N. One version has joins to several tables and another has sub-selects.
Also, since I want to cache the generated HTML table, I'm wondering if a cache at the web server level would be appropriate? Is there really no benefit to any type of caching? Is the best advice really to let every page request go through all the layers and hit the database?
4 Answers
Unless I'm misunderstanding the question, I don't think that this is an appropriate scenario for caching.
Cached data normally has at least one of the following attributes (usually all of them):
- Expensive to retrieve or compute;
- Highly static - may change occasionally but very rarely;
- Non-critical - OK if the requester sees stale data.
It doesn't sound like any of these apply to your situation.
- The query is a simple `SELECT`, probably with a `TOP N`, just an index seek;
- It changes very frequently;
- Your requirements indicate that immediate updates are required.
So why are you caching? Caching isn't a panacea; oftentimes it can actually make performance worse, if the cache memory could be better used for some other purpose.
Databases do their own caching. As long as the DB server has plenty of memory then it may cache the entire table in memory if it's frequently queried; the performance of that will be just as good as your cache if not better.
Some further thoughts/suggestions:
If stale data is OK, then the simplest solution would be to use a fixed interval (i.e. expiration). This method is used very effectively in hundreds of thousands of sites and systems. You can either force an update on expiration or just wait until it's requested again.
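A fixed-interval expiration is simple to sketch. The class below is a self-contained stand-in in Python; with a real memcached client you would normally just pass an expiry time to `set`, but the behaviour is the same:

```python
import time


class TTLCache:
    """Minimal fixed-interval (expiration-based) cache sketch."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # stale: drop it and force a reload
            return None
        return value

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)
```

For example, caching the rendered "latest 50" HTML with a 60-second TTL means readers see data at most one minute stale, and the database is hit at most once a minute per page regardless of traffic.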
If you're concerned about conflicts between reads and writes, then (a) don't be, until you've profiled it, and (b) if it really is an issue, then instead of trying to cache it, just use a redundant table or a `NOLOCK` hint.
If you need to invalidate the cache every single time a row is added/changed then you have completely defeated the purpose of an application cache, and are now trying to implement an in-memory database. Please don't do this unless you have an extremely good reason for it.
The volume of rows you are dealing with is very low - less than 10,000 per year.
Implementing a caching mechanism for this would overly complicate something the database can do very quickly and easily, especially with the right indexes in place.
Is there a specific reason you are trying to implement memcache for this scenario?
MySQL's built-in query cache will actually work well here. Since your table doesn't change very often, it will cache your results nicely and reduce the database read activity anyway.
Note: I have assumed MySQL since that's where I usually see memcache implemented :)
Edit: Based on the updated details, I would still suggest going with a straight database solution. The load on the database for reading only really becomes an issue in high volume environments. Good indexing and query optimisation will usually provide good performance in many environments.
If you do need to take the memcache path, I would suggest that you don't try to micro-manage the pages in the cache.
Each insert can check memcache: if the insert introduces a new entry for the key (which you have said happens about once an hour), it should invalidate the entire cache relating to that table.
When someone requests any page of the results, check memcache. If the results for that page are already there, use them. If not, run a query to fetch just that page, cache the page in memcache, and return the results.
With this approach, the cache management is simple and you only regenerate the cache page the first time it is requested after an invalidation. Following requests will use the cache until the next invalidation. This approach will also mean only caching data for pages which are actually requested.
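The flow above can be sketched as cache-aside code (Python, with a plain dict standing in for memcache; `fetch_page`, the key format, and the `_db_hits` counter are illustrative):

```python
class PageCache:
    """Cache-aside sketch: reads fill the cache one page at a time;
    an insert that adds a row for a key invalidates every cached
    page for that key (roughly once an hour per key)."""

    def __init__(self, fetch_page):
        self._fetch_page = fetch_page  # fetch_page(key, page) -> rows from the DB
        self._cache = {}               # stands in for memcache
        self._db_hits = 0              # just for demonstration

    def get_page(self, key, page):
        cache_key = f"rows:{key}:page:{page}"
        rows = self._cache.get(cache_key)
        if rows is None:                       # miss: query just this page
            rows = self._fetch_page(key, page)
            self._db_hits += 1
            self._cache[cache_key] = rows
        return rows

    def on_insert(self, key):
        """Called from the insert path: drop all cached pages for this key."""
        prefix = f"rows:{key}:page:"
        for k in [k for k in self._cache if k.startswith(prefix)]:
            del self._cache[k]
```

One caveat if you port this to real memcached: it has no prefix or wildcard delete, so a common workaround is to embed a per-key generation number in the cache key and bump that number on insert, which makes all old pages unreachable at once.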
- I incorrectly specified the volume and edited the question to reflect that. 1 per hour is for a specific key; the table grows much faster. – McLeopold, May 23, 2011
Use the database cache
It'll handle this low volume with ease. Just use the following query:
select top 50 * from your_view
- Just use a paged query as normal. You don't need caching at all for your scenario. – Gary, May 24, 2011
Another trick that works in many databases is a composite index. Composite indexes usually perform well when the query filters in the same order as the columns in the index. In your case, the index would be on the key column followed by the row's ordering column.
This way, the index is built like (key1, row1) (key1, row2) (key2, row3) (key2, row5) (key3, row6) and so on. Lookups by the key prefix are then very cheap, and fetching the top/latest rows is lightning quick. If you go down this route, verify that your DB actually picks this composite index for the top-N query.
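A quick way to check that the database really uses such an index (a sketch using SQLite in place of MySQL/MS SQL; the table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (key_id INTEGER, created_at INTEGER, payload TEXT)"
)
# composite index: the key column first, then the ordering column
conn.execute(
    "CREATE INDEX ix_events_key_created ON events (key_id, created_at)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(k, t, f"row-{k}-{t}") for k in range(3) for t in range(100)],
)

# latest 50 rows for one key: walks the index backwards, no sort step
rows = conn.execute(
    "SELECT payload FROM events WHERE key_id = ? "
    "ORDER BY created_at DESC LIMIT 50",
    (1,),
).fetchall()

# confirm the plan actually seeks on the composite index
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT payload FROM events "
    "WHERE key_id = ? ORDER BY created_at DESC LIMIT 50",
    (1,),
).fetchall()
```

The same check exists in other engines (`EXPLAIN` in MySQL, the execution plan in SQL Server); what you want to see is an index seek on the composite index with no separate sort.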
- I wouldn't exactly call this a "trick"; it's more like a fundamental concept of indexing. And most of the time, the order of predicates in the query doesn't matter at all, since the optimizer will rearrange them as needed. – Aaronaught, May 24, 2011
- By order of querying I meant that your first and second filtering parameters appear in the same order as the columns in the composite index. – Sub S, May 24, 2011