Just for the example:
I have a system which needs to fetch social media timeline on a specific time interval based on user preferences.
Let's say we have user A, B, C, and so on.
Let's say again we have time interval options from every 5 minutes to every 60 minutes, a user can choose the interval as they want.
- User A wants it to fetch every 5 minutes.
- User B wants it to fetch every 18 minutes.
- User B wants it to fetch every 36 minutes.
etc.
Current approach
First I defined the social media
table like this.
- id
- userId
- twitterUsername
- interval // int, store interval in minutes
- lastRunAt
- createdAt
- updatedAt
Interval
here is to store the user preference and lastRunAt
is used to flag when's the row last run.
So, to run this I create a cron job which will run every minutes to execute the function to fetch social media data. Let say fetchSocialMedia()
.
Inside fetchSocialMedia()
, I do a select query to fetch data from social media
table, then loop it and inside loop I check the interval
+ lastRunAt
to compare it with current time. If the interval
+ lastRunAt
has exceeded the current time, so it's time for the row to fetch the social media timeline, otherwise just skip it.
fetchSocialMedia()
fetchSocialMedia() {
const socialMedias = SocialMedia.findAll();
for(let i=0; i<socialMedias.length; i++) {
const item = socialMedias[i]
// Ignore the syntax, just focus on the approach :)
const compare = (item.lastRunAt + item.interval) < time now
if (compare) {
// do fetch the timeline here
}
}
}
Add to cronjob to run every minutes.
cron.schedule('* * * * *', () => {
fetchSocialMedia()
});
If you have a chance to make it better or If you want to create it from scratch, what would you do?
My concern is about the performance of this system, especially if we have thousands or more of social media data in the table.
1 Answer 1
First (this wasn't part of your question but is really important to get a clear understanding) think about naming.
social media
is a misnomer, this is actually a kind of schedule or repeating task table. Choose an appropriate name.Use the database's services properly. Instead of reading all entries and testing each whether it should be run again, use a where clause to select only those that should be run now. This would be simplified if you don't store
lastRunAt
butnextTimeToRun
.You might want to run a daemon process instead of triggering processing every minute. The daemon can select the minimum
nextTimeToRun
after running tasks, and use that to sleep appropriately before processing the next entry. If you have many entries, though, this probably would lead to increased activity, so you might want it to sleep for at least 30 seconds or so to ensure it does not run too often.This should actually be the very first point but I mention it last so it sticks: Concern yourself with performance when and only when you have performance issues. Then start with measuring, analyzing measurements, comparing various alternative approaches before settling for a solution. Premature optimization rarely works well.