Designing a schedule-based system using Node.js

Question 1

Just for the example:

I have a system that needs to fetch social media timelines at a specific interval, based on each user's preference.

Let's say we have user A, B, C, and so on.

The interval options range from every 5 minutes to every 60 minutes, and each user can choose whichever interval they want.

User A wants it to fetch every 5 minutes.
User B wants it to fetch every 18 minutes.
User C wants it to fetch every 36 minutes.

etc.

Current approach

First I defined the social media table like this.

- id
- userId
- twitterUsername
- interval // int, store interval in minutes
- lastRunAt
- createdAt
- updatedAt

interval stores the user's preference, and lastRunAt records when the row was last run.

To run this, I create a cron job that runs every minute and executes a function to fetch the social media data. Let's say fetchSocialMedia().

Inside fetchSocialMedia(), I run a select query to fetch the rows from the social media table, then loop over them. Inside the loop I compare lastRunAt + interval with the current time. If the current time has reached or passed lastRunAt + interval, it's time for that row to fetch the social media timeline; otherwise I skip it.

fetchSocialMedia()

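async function fetchSocialMedia() {
  // SocialMedia is the model for the table above; findAll() is assumed to return a promise
  const socialMedias = await SocialMedia.findAll();
  const now = Date.now();

  for (const item of socialMedias) {
    // Ignore the syntax, just focus on the approach :)

    // lastRunAt is assumed to be a Date; interval is stored in minutes
    const nextRunAt = item.lastRunAt.getTime() + item.interval * 60 * 1000;
    if (nextRunAt <= now) {
      // do fetch the timeline here
    }
  }
}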

Add it to a cron job that runs every minute.

const cron = require('node-cron'); // assuming the node-cron package

cron.schedule('* * * * *', () => {
  fetchSocialMedia();
});

If you had the chance to improve this, or to build it from scratch, what would you do?

My concern is the performance of this system, especially if we have thousands of rows or more in the social media table.

score 3 · Answer 1 · 2022-05-16 10:29:08Z

First (this wasn't part of your question, but it is really important for a clear understanding): think about naming. "Social media" is a misnomer; this is actually a kind of schedule or repeating-task table. Choose an appropriate name.
Use the database's services properly. Instead of reading all entries and testing whether each one should be run again, use a where clause to select only those that should be run now. This would be simpler if you stored nextTimeToRun instead of lastRunAt.
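
A minimal sketch of that idea, under some assumptions: the table is renamed to a hypothetical fetch_schedule with an intervalMinutes column and an indexed nextTimeToRun column, `query(sql, params)` stands in for whatever database client you use (the `?` placeholders follow MySQL/SQLite style), and `fetchTimeline(row)` is your existing per-user fetch. None of these names come from the question.

```javascript
// Hypothetical schema: fetch_schedule(id, twitterUsername, intervalMinutes, nextTimeToRun)
const SELECT_DUE =
  'SELECT id, twitterUsername, intervalMinutes FROM fetch_schedule ' +
  'WHERE nextTimeToRun <= ? ORDER BY nextTimeToRun';

const UPDATE_NEXT = 'UPDATE fetch_schedule SET nextTimeToRun = ? WHERE id = ?';

// When a row that ran at `ranAt` becomes due again.
function nextTimeToRun(ranAt, intervalMinutes) {
  return new Date(ranAt.getTime() + intervalMinutes * 60 * 1000);
}

// `query` and `fetchTimeline` are assumed helpers supplied by the caller.
// Only rows that are already due are read; each one is rescheduled after it runs.
async function runDueTasks(query, fetchTimeline, now = new Date()) {
  const due = await query(SELECT_DUE, [now]);
  for (const row of due) {
    await fetchTimeline(row);
    await query(UPDATE_NEXT, [nextTimeToRun(now, row.intervalMinutes), row.id]);
  }
  return due.length;
}
```

With this shape the per-minute job never touches rows that are not due, and the database can use an index on nextTimeToRun to find the due ones.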
You might want to run a daemon process instead of triggering processing every minute. The daemon can select the minimum nextTimeToRun after running tasks, and use that to sleep appropriately before processing the next entry. If you have many entries, though, this probably would lead to increased activity, so you might want it to sleep for at least 30 seconds or so to ensure it does not run too often.
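
The daemon loop could look roughly like this. It is only a sketch: `runDueTasks` and `getEarliestNextTimeToRun` are hypothetical callbacks you would implement against your own table (the latter being something like a SELECT MIN(nextTimeToRun) query that resolves to a Date, or null when the table is empty), and the 30-second floor is just the figure suggested above.

```javascript
const MIN_SLEEP_MS = 30 * 1000; // never wake up more often than this

// How long to sleep before the next pass, given the earliest upcoming run time.
// Falls back to the minimum when nothing is scheduled or something is overdue.
function sleepDuration(nextDue, now, minSleepMs = MIN_SLEEP_MS) {
  if (nextDue === null) return minSleepMs;
  return Math.max(minSleepMs, nextDue.getTime() - now.getTime());
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Both callbacks are assumptions, not part of any library:
//   runDueTasks(now)            runs everything whose nextTimeToRun has passed
//   getEarliestNextTimeToRun()  resolves to a Date or null
async function runDaemon(runDueTasks, getEarliestNextTimeToRun, shouldStop = () => false) {
  while (!shouldStop()) {
    await runDueTasks(new Date());
    const nextDue = await getEarliestNextTimeToRun();
    await sleep(sleepDuration(nextDue, new Date()));
  }
}
```

Keeping the sleep calculation in its own function makes the wake-up policy easy to test without starting the loop.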
This should actually be the very first point but I mention it last so it sticks: Concern yourself with performance when and only when you have performance issues. Then start with measuring, analyzing measurements, comparing various alternative approaches before settling for a solution. Premature optimization rarely works well.
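
As a starting point for measuring, a small timing wrapper is often enough before reaching for a profiler. The helper name `timed` is made up for this sketch; `performance.now()` comes from Node's built-in perf_hooks module.

```javascript
const { performance } = require('perf_hooks');

// Runs `fn` (sync or async) and reports how long it took in milliseconds.
async function timed(label, fn) {
  const start = performance.now();
  const result = await fn();
  const durationMs = performance.now() - start;
  console.log(`${label}: ${durationMs.toFixed(1)} ms`);
  return { result, durationMs };
}
```

For example, wrapping the existing job as `timed('fetchSocialMedia', fetchSocialMedia)` inside the cron callback would show how each run's duration grows with the number of rows.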
