I have been working on a script that automatically pulls the batch IDs that were processed 4 days ago and need to be processed on the server tomorrow. The batch IDs from 3 days ago need to be processed the day after tomorrow, and so on.
The script only needs to output the processing dates for all batch IDs from the last 4 days and send the output via email in the format below.
Please have a look and let me know of any improvements we can make, especially to performance.
#!/bin/bash
matchdate () { date +%Y%m%d --date "1ドル" }
touch $result
dbhost='192.168.0.1'
dbuser='test_db'
dbpass='temp#100'
dbschema='demo'
batchdate=$(matchdate "4 days ago")
current_date=$(matchdate "today")
while [[ $batchdate != $current_date ]]; do
batchdate=$(matchdate "$batchdate + 1 day")
batchwupos=$(date -d "$batchdate" +%Y-%m-%d)
processdate=$(date +%d-%m-%Y -d "$matchdate + 1 day")
#echo "$batchdate and $batchwupos"
recordpull=`mysql -h $dbhost -u $dbuser -p $dbpass -e "use $dbschema;SELECT il.BatchID FROM tbl_batch_table il WHERE il.batch_name like '%$batchdate%' and il.batch_name like '%$batchwupos%' and il.status='SUCCESS'"`
batch_result="`cat $recordpull | tr -s ' ' | cut -d " " -f2- | tr '\n' ',' | sed 's/,$//' | sed 's/........//' $recordpull`"
echo -e "\n Below are the expected batches which we will processing on server on specified date." >> $result
echo -e "\n $processdate : $batch_result" >> $result
done
Here are the expected batches which will be processed on the server on the specified dates:
- 12-09-2016: (10642, 10643, 10644, 10646, 10647)
- 13-09-2016: (10648, 10649, 10654, 10655, 10656, 10659)
- 14-09-2016: (10657, 10658, 10661, 10665, 10666)
- 15-09-2016: (10668, 10669, 10670, 10671)
2 Answers
Start with the query and work backwards. A poorly-written, poorly-performing query is not a good foundation to build on.
Here is your query (re-formatted to be more readable):
SELECT il.BatchID
FROM tbl_batch_table AS il
WHERE
    il.batch_name LIKE '%$batchdate%'
    AND il.batch_name LIKE '%$batchwupos%'
    AND il.status = 'SUCCESS'
This query will not be able to leverage an index on batch_name, because you are using a LIKE condition that starts with a wildcard character.
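If you want to see the difference for yourself, EXPLAIN makes it visible. A minimal sketch, assuming an index on batch_name exists (otherwise neither form can use one) and reusing the connection variables from your script; the literal date is only an example:

# Sketch: compare the query plans. With an index on batch_name, only the
# anchored pattern can use it; the leading '%' forces a full table scan.
# (Note: mysql's -p takes the password attached, with no space.)
mysql -h "$dbhost" -u "$dbuser" -p"$dbpass" "$dbschema" -e "
    EXPLAIN SELECT BatchID FROM tbl_batch_table WHERE batch_name LIKE '%20160912%';
    EXPLAIN SELECT BatchID FROM tbl_batch_table WHERE batch_name LIKE '20160912%';
"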
Can you split the batch date information in your DB table out into its own column? That would give you the ability to do things like the following (a possible migration is sketched after this list):
- Query on a date range AND use index to do so.
- Perform date-based math (adding/subtracting days)
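For illustration only, a one-time migration could look like the sketch below. The column name batch_date, the index name idx_batch_date, and the assumption that batch_name starts with the date as YYYYMMDD are placeholders to adapt, not details from your post:

# Hypothetical migration: add a real DATE column plus an index, then
# backfill it from batch_name. Adjust the SUBSTRING() to wherever the
# date actually sits inside batch_name.
mysql -h "$dbhost" -u "$dbuser" -p"$dbpass" "$dbschema" <<'SQL'
ALTER TABLE tbl_batch_table
    ADD COLUMN batch_date DATE,
    ADD INDEX idx_batch_date (batch_date);
UPDATE tbl_batch_table
    SET batch_date = STR_TO_DATE(SUBSTRING(batch_name, 1, 8), '%Y%m%d');
SQL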
Imagine this sort of query:
SELECT
    il.batch_date,
    /* Create calculated field representing processing date four days
       after batch date */
    DATE_ADD(il.batch_date, INTERVAL 4 DAY) AS processing_date,
    il.BatchID
FROM tbl_batch_table AS il
WHERE
    /* Filter for records batched in last 4 days */
    il.batch_date >= DATE_SUB(CURDATE(), INTERVAL 4 DAY)
    AND il.status = 'SUCCESS'
/* Order your results */
ORDER BY il.batch_date ASC, il.BatchID ASC
Here you can leverage an index on batch_date for both the WHERE clause and the ordering. You can also eliminate the multiple queries entirely, since you are getting records for the entire range of dates at once.
If you wanted to go with the suggestion from the other answer and aggregate the information for each date into concatenated results, you could add GROUP BY and GROUP_CONCAT to this query as follows:
SELECT
    /* Create calculated field representing processing date four days
       after batch date */
    DATE_ADD(il.batch_date, INTERVAL 4 DAY) AS processing_date,
    GROUP_CONCAT(il.BatchID ORDER BY il.BatchID) AS batch_ids
FROM tbl_batch_table AS il
WHERE
    /* Filter for records batched in last 4 days */
    il.batch_date >= DATE_SUB(CURDATE(), INTERVAL 4 DAY)
    AND il.status = 'SUCCESS'
GROUP BY processing_date
/* Order your results */
ORDER BY processing_date ASC
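With that query doing all of the work, the shell wrapper shrinks to a single mysql call plus the mail step. A minimal sketch, assuming the batch_date column discussed above exists, that a mail command is available on the server, and that recipient@example.com stands in for the real recipient; the 4-day offset follows the query above and may need adjusting to the real schedule:

#!/bin/bash
# Sketch: one query, one line per processing date ("DD-MM-YYYY : (id, id, ...)"),
# then mailed. -N suppresses column headers, -B produces tab-separated output.
dbhost='192.168.0.1'
dbuser='test_db'
dbpass='temp#100'
dbschema='demo'

report=$(mysql -N -B -h "$dbhost" -u "$dbuser" -p"$dbpass" "$dbschema" -e "
    SELECT DATE_FORMAT(DATE_ADD(il.batch_date, INTERVAL 4 DAY), '%d-%m-%Y'),
           GROUP_CONCAT(il.BatchID ORDER BY il.BatchID SEPARATOR ', ')
    FROM tbl_batch_table AS il
    WHERE il.batch_date >= DATE_SUB(CURDATE(), INTERVAL 4 DAY)
      AND il.status = 'SUCCESS'
    GROUP BY il.batch_date
    ORDER BY il.batch_date" |
    awk -F'\t' '{ printf "%s : (%s)\n", 1,ドル 2ドル }')

printf 'Below are the expected batches which will be processed on the server on the specified dates.\n\n%s\n' "$report" |
    mail -s "Batch processing schedule" recipient@example.com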
Performance
For each day, the script executes a MySQL query,
and a long pipeline of commands to format the query results.
The more processes (commands) you execute per iteration,
the slower the program.
3 calls to date, mysql, and a pipeline of 6 commands.
Yes, this is going to be slow.
Improvement ideas:
- Don't parse dates using date. A few calls would be fine, but when you need to make many calls, it's better to look for other alternatives. In your case there's a very good alternative: do all date computations inside MySQL. The result will be faster, because it will be done in a single process, not in multiple process calls. It will be better, because you won't depend on GNU date, making the script more portable. Change the loop to count from -4 to 0, and rewrite the date calculations in MySQL (see the sketch after this list).
- Don't use shell scripting to concatenate rows of a MySQL query result. See the GROUP_CONCAT function. Also, you can suppress the column headers in the output using the -N flag.
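As a rough illustration of the first point, the shell can loop over plain integer offsets while MySQL does every date calculation and the GROUP_CONCAT aggregation, with -N dropping the column headers. This is only a sketch: the offset range follows the "-4 to 0" suggestion, the "+ 4" processing-day shift mirrors the other answer, and the batch_name pattern assumes the date is embedded as YYYYMMDD, so all three need checking against the real data:

# Sketch: integer offsets in the shell, all date math and formatting in MySQL.
for offset in -4 -3 -2 -1 0; do
    mysql -N -h "$dbhost" -u "$dbuser" -p"$dbpass" "$dbschema" -e "
        SELECT DATE_FORMAT(DATE_ADD(CURDATE(), INTERVAL $((offset + 4)) DAY), '%d-%m-%Y'),
               GROUP_CONCAT(il.BatchID ORDER BY il.BatchID SEPARATOR ', ')
        FROM tbl_batch_table AS il
        WHERE il.batch_name LIKE CONCAT('%', DATE_FORMAT(DATE_ADD(CURDATE(), INTERVAL $offset DAY), '%Y%m%d'), '%')
          AND il.status = 'SUCCESS'"
done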
The fastest solution will be to rewrite the entire shell script as a MySQL query, if possible.
Portability
Originally the question was tagged with bash, sh, unix, shell. Some of these were inappropriate; here's the reasoning:
- The script is using features not available in sh (the [[ command), so I dropped that tag.
- The script relies heavily on GNU date features that are typically not available in popular unix flavors such as BSD and Solaris, but typically available in Linux systems. So I replaced unix with linux.
Coding style
The bad indentation immediately jumps out, and it's the easiest thing to fix.
Consistently indent the statements in a while loop.
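For instance, the body of the loop from the question, re-indented (content unchanged, only whitespace):

while [[ $batchdate != $current_date ]]; do
    batchdate=$(matchdate "$batchdate + 1 day")
    batchwupos=$(date -d "$batchdate" +%Y-%m-%d)
    processdate=$(date +%d-%m-%Y -d "$matchdate + 1 day")
    # ... remaining statements, indented the same way ...
done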
These statements use different logic in the ordering of parameters:
batchwupos=$(date -d "$batchdate" +%Y-%m-%d)
processdate=$(date +%d-%m-%Y -d "$matchdate + 1 day")
By using consistent ordering, these lines become easier to read:
batchwupos=$(date +%Y-%m-%d -d "$batchdate")
processdate=$(date +%d-%m-%Y -d "$matchdate + 1 day")
The `...` style of command substitution is obsolete; use $(...) instead.
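Applied to the script, the recordpull assignment becomes (only the command substitution syntax changes here; the query string is untouched):

recordpull=$(mysql -h $dbhost -u $dbuser -p $dbpass -e "use $dbschema;SELECT il.BatchID FROM tbl_batch_table il WHERE il.batch_name like '%$batchdate%' and il.batch_name like '%$batchwupos%' and il.status='SUCCESS'")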
- Thanks Janos for the suggestion regarding GROUP_CONCAT, I have to look into it. – Amit Alone, Sep 16, 2016 at 13:51
- … while loop, otherwise it is a bit confusing.