I have been working on a script that automatically pulls the batch IDs that were processed 4 days ago and need to be processed on the server tomorrow. The batch IDs from 3 days ago need to be processed the day after tomorrow, and so on.
The script only needs to output the processing dates for all batch IDs from the last 4 days and send the output via email in the format below.
Please have a look and let me know of any improvements we can make, especially to performance.
#!/bin/bash
matchdate () { date +%Y%m%d --date "1ドル" }
touch $result
dbhost='192.168.0.1'
dbuser='test_db'
dbpass='temp#100'
dbschema='demo'
batchdate=$(matchdate "4 days ago")
current_date=$(matchdate "today")
while [[ $batchdate != $current_date ]]; do
batchdate=$(matchdate "$batchdate + 1 day")
batchwupos=$(date -d "$batchdate" +%Y-%m-%d)
processdate=$(date +%d-%m-%Y -d "$matchdate + 1 day")
#echo "$batchdate and $batchwupos"
recordpull=`mysql -h $dbhost -u $dbuser -p $dbpass -e "use $dbschema;SELECT il.BatchID FROM tbl_batch_table il WHERE il.batch_name like '%$batchdate%' and il.batch_name like '%$batchwupos%' and il.status='SUCCESS'"`
batch_result="`cat $recordpull | tr -s ' ' | cut -d " " -f2- | tr '\n' ',' | sed 's/,$//' | sed 's/........//' $recordpull`"
echo -e "\n Below are the expected batches which we will processing on server on specified date." >> $result
echo -e "\n $processdate : $batch_result" >> $result
done
Here are the expected batches which will be processed on the server on the specified dates:
- 12-09-2016: (10642, 10643, 10644, 10646, 10647)
- 13-09-2016: (10648, 10649, 10654, 10655, 10656, 10659)
- 14-09-2016: (10657, 10658, 10661, 10665, 10666)
- 15-09-2016: (10668, 10669, 10670, 10671)
2 Answers
Start with the query and work backwards. A poorly-written, poorly-performing query is not a good foundation to build on.
Here is your query (re-formatted to be more readable):
SELECT il.BatchID
FROM tbl_batch_table AS il
WHERE
    il.batch_name LIKE '%$batchdate%'
    AND il.batch_name LIKE '%$batchwupos%'
    AND il.status = 'SUCCESS'
This query will not be able to leverage an index on batch_name, because you are using a LIKE condition that starts with a wildcard character.
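If you want to see the difference for yourself, EXPLAIN makes it visible. A minimal sketch, assuming an index on batch_name exists (otherwise neither form can use one) and reusing the connection variables from your script; the literal date is only an example:

# Sketch: compare the query plans. With an index on batch_name, only the
# anchored pattern can use it; the leading '%' forces a full table scan.
# (Note: mysql's -p takes the password attached, with no space.)
mysql -h "$dbhost" -u "$dbuser" -p"$dbpass" "$dbschema" -e "
    EXPLAIN SELECT BatchID FROM tbl_batch_table WHERE batch_name LIKE '%20160912%';
    EXPLAIN SELECT BatchID FROM tbl_batch_table WHERE batch_name LIKE '20160912%';
"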
Can you split the batch date information in your DB table out into its own column? That would give you the ability to do things like the following (a possible migration is sketched after this list):
- Query on a date range AND use index to do so.
- Perform date-based math (adding/subtracting days)
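For illustration only, a one-time migration could look like the sketch below. The column name batch_date, the index name idx_batch_date, and the assumption that batch_name starts with the date as YYYYMMDD are placeholders to adapt, not details from your post:

# Hypothetical migration: add a real DATE column plus an index, then
# backfill it from batch_name. Adjust the SUBSTRING() to wherever the
# date actually sits inside batch_name.
mysql -h "$dbhost" -u "$dbuser" -p"$dbpass" "$dbschema" <<'SQL'
ALTER TABLE tbl_batch_table
    ADD COLUMN batch_date DATE,
    ADD INDEX idx_batch_date (batch_date);
UPDATE tbl_batch_table
    SET batch_date = STR_TO_DATE(SUBSTRING(batch_name, 1, 8), '%Y%m%d');
SQL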
Imagine this sort of query:
SELECT
    il.batch_date,
    /* Create calculated field representing processing date four days
       after batch date */
    DATE_ADD(il.batch_date, INTERVAL 4 DAY) AS processing_date,
    il.BatchID
FROM tbl_batch_table AS il
WHERE
    /* Filter for records batched in last 4 days */
    il.batch_date >= DATE_SUB(CURDATE(), INTERVAL 4 DAY)
    AND il.status = 'SUCCESS'
/* Order your results */
ORDER BY il.batch_date ASC, il.BatchID ASC
Here you can leverage an index on batch_date for both the WHERE clause and the ordering. You can also eliminate the multiple queries entirely, since you are getting records for the entire range of dates at once.
If you wanted to go with the suggestion from the other answer and aggregate the information for each date into concatenated results, you could add GROUP BY and GROUP_CONCAT to this query as follows:
SELECT
    /* Create calculated field representing processing date four days
       after batch date */
    DATE_ADD(il.batch_date, INTERVAL 4 DAY) AS processing_date,
    GROUP_CONCAT(il.BatchID ORDER BY il.BatchID) AS batch_ids
FROM tbl_batch_table AS il
WHERE
    /* Filter for records batched in last 4 days */
    il.batch_date >= DATE_SUB(CURDATE(), INTERVAL 4 DAY)
    AND il.status = 'SUCCESS'
GROUP BY processing_date
/* Order your results */
ORDER BY processing_date ASC
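With that query doing all of the work, the shell wrapper shrinks to a single mysql call plus the mail step. A minimal sketch, assuming the batch_date column discussed above exists, that a mail command is available on the server, and that recipient@example.com stands in for the real recipient; the 4-day offset follows the query above and may need adjusting to the real schedule:

#!/bin/bash
# Sketch: one query, one line per processing date ("DD-MM-YYYY : (id, id, ...)"),
# then mailed. -N suppresses column headers, -B produces tab-separated output.
dbhost='192.168.0.1'
dbuser='test_db'
dbpass='temp#100'
dbschema='demo'

report=$(mysql -N -B -h "$dbhost" -u "$dbuser" -p"$dbpass" "$dbschema" -e "
    SELECT DATE_FORMAT(DATE_ADD(il.batch_date, INTERVAL 4 DAY), '%d-%m-%Y'),
           GROUP_CONCAT(il.BatchID ORDER BY il.BatchID SEPARATOR ', ')
    FROM tbl_batch_table AS il
    WHERE il.batch_date >= DATE_SUB(CURDATE(), INTERVAL 4 DAY)
      AND il.status = 'SUCCESS'
    GROUP BY il.batch_date
    ORDER BY il.batch_date" |
    awk -F'\t' '{ printf "%s : (%s)\n", 1,ドル 2ドル }')

printf 'Below are the expected batches which will be processed on the server on the specified dates.\n\n%s\n' "$report" |
    mail -s "Batch processing schedule" recipient@example.com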
Performance
For each day, the script executes a MySQL query,
and a long pipeline of commands to format the query results.
The more processes (commands) you execute per iteration,
the slower the program.
3 calls to date, mysql, and a pipeline of 6 commands.
Yes, this is going to be slow.
Improvement ideas:
- Don't parse dates using date. A few calls would be fine, but when you need to make many calls, it's better to look for other alternatives. In your case there's a very good alternative: do all date computations inside MySQL. The result will be faster, because it will be done in a single process, not in multiple process calls. It will be better, because you won't depend on GNU date, making the script more portable. Change the loop to count from -4 to 0, and rewrite the date calculations in MySQL (see the sketch after this list).
- Don't use shell scripting to concatenate rows of a MySQL query result. See the GROUP_CONCAT function. Also, you can suppress the column headers in the output using the -N flag.
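As a rough illustration of the first point, the shell can loop over plain integer offsets while MySQL does every date calculation and the GROUP_CONCAT aggregation, with -N dropping the column headers. This is only a sketch: the offset range follows the "-4 to 0" suggestion, the "+ 4" processing-day shift mirrors the other answer, and the batch_name pattern assumes the date is embedded as YYYYMMDD, so all three need checking against the real data:

# Sketch: integer offsets in the shell, all date math and formatting in MySQL.
for offset in -4 -3 -2 -1 0; do
    mysql -N -h "$dbhost" -u "$dbuser" -p"$dbpass" "$dbschema" -e "
        SELECT DATE_FORMAT(DATE_ADD(CURDATE(), INTERVAL $((offset + 4)) DAY), '%d-%m-%Y'),
               GROUP_CONCAT(il.BatchID ORDER BY il.BatchID SEPARATOR ', ')
        FROM tbl_batch_table AS il
        WHERE il.batch_name LIKE CONCAT('%', DATE_FORMAT(DATE_ADD(CURDATE(), INTERVAL $offset DAY), '%Y%m%d'), '%')
          AND il.status = 'SUCCESS'"
done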
The fastest solution will be to rewrite the entire shell script as a MySQL query, if possible.
Portability
Originally the question was tagged with bash, sh, unix, shell. Some of these were inappropriate; here's the reasoning:
- The script is using features not available in sh (the [[ command), so I dropped that tag.
- The script relies heavily on GNU date features that are typically not available in popular unix flavors such as BSD and Solaris, but typically available in Linux systems. So I replaced unix with linux.
Coding style
The bad indentation immediately jumps out, and it's the easiest thing to fix.
Consistently indent the statements in a while loop.
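For instance, the body of the loop from the question, re-indented (content unchanged, only whitespace):

while [[ $batchdate != $current_date ]]; do
    batchdate=$(matchdate "$batchdate + 1 day")
    batchwupos=$(date -d "$batchdate" +%Y-%m-%d)
    processdate=$(date +%d-%m-%Y -d "$matchdate + 1 day")
    # ... remaining statements, indented the same way ...
done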
These statements use different logic in the ordering of parameters:
batchwupos=$(date -d "$batchdate" +%Y-%m-%d)
processdate=$(date +%d-%m-%Y -d "$matchdate + 1 day")
By using consistent ordering, these lines become easier to read:
batchwupos=$(date +%Y-%m-%d -d "$batchdate")
processdate=$(date +%d-%m-%Y -d "$matchdate + 1 day")
The `...` style of command substitution is obsolete; use $(...) instead.
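Applied to the script, the recordpull assignment becomes (only the command substitution syntax changes here; the query string is untouched):

recordpull=$(mysql -h $dbhost -u $dbuser -p $dbpass -e "use $dbschema;SELECT il.BatchID FROM tbl_batch_table il WHERE il.batch_name like '%$batchdate%' and il.batch_name like '%$batchwupos%' and il.status='SUCCESS'")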
- Thanks Janos for the suggestion regarding GROUP_CONCAT, I have to look into it. – Amit Alone, Sep 16, 2016 at 13:51
- … while loop, otherwise it is a bit confusing.