I have written a shell script to manage tcpdump pcap logs and syslog files on my Linux board, so as to keep disk usage at a maximum of 70%.
The script checks disk usage every minute, and as soon as usage rises above 70%, it deletes the oldest files that are not in use until usage drops back to 70%.
#!/bin/ash

# Gives the file that is currently in use by the process
logmgr_get_current_file_name() {
    PROC_NAME=1ドル
    instance=2ドル
    pid=$(ps | grep $PROC_NAME | grep -v grep | awk 'NR=='$instance'{print 1ドル}')
    fds=$(ls /proc/$pid/fd)
    for fd in $fds;
    do
        last=$fd
    done
    current_file=$(ls /proc/$pid/fd/$last -l | awk 'BEGIN {FS="->"} {printf 2ドル}')
}

# checks whether the file we are deleting is in use by the process
logmgr_is_file_in_use() {
    num_of_instances=1
    logmgr_get_current_file_name 1ドル $num_of_instances
    if [ $current_file = 2ドル ]; then
        return 1
    else
        return 0
    fi
}

# iterate through all possible instances of tcpdump
# for each instance .. get the file in use
# check if [ $file_in_use == $to_delete ] ; then
#     return 1
# else
#     return 0
logmgr_get_files_in_use() {
    i=1
    is_file_inuse=0
    num_of_instances=$(ps | grep tcpdump | grep -v grep | wc -l)
    while [ $i -le $num_of_instances ];
    do
        logmgr_get_current_file_name 1ドル $i
        if [ "$current_file" = "2ドル" ]; then
            is_file_inuse=1
            break
        fi
        i=`expr $i + 1`
    done
    return $is_file_inuse
}

logmgr_delete_oldest() {
    oldest=$(ls -lht 2ドル | awk 'END{print 9ドル}')
    to_delete=2ドル$oldest
    if [ "1ドル" != "tcpdump" ] ; then
        logmgr_is_file_in_use 1ドル $to_delete
        ret_val=$?
    else
        logmgr_get_files_in_use 1ドル $to_delete "tcpdump"
        ret_val=$?
    fi
    # if the file is not in use, then delete it
    if [ $ret_val -eq 0 ]; then
        echo "$to_delete deleted"
        rm -fr $to_delete
        sleep $cpu_relaxation_secs
    fi
}

logmgr_get_disk_usage() {
    diskusage=$(df -h 1ドル | awk 'END {print 5ドル}' | awk 'BEGIN {FS="%"} {print 1ドル}')
    echo $diskusage
}

## requires repeated execution ##
logmgr_run() {
    sleep_secs=0
    while [ 1 ] ;
    do
        syslog_dir_is_ok=0
        tcpdump_dir_is_ok=0
        syslog_disk_usage=`logmgr_get_disk_usage $syslog_dir`
        tcpdump_disk_usage=`logmgr_get_disk_usage $tcpdump_dir`
        if [ $syslog_disk_usage -le $NORMAL_DISK_USAGE ] ; then
            syslog_dir_is_ok=1
        fi
        if [ $tcpdump_disk_usage -le $NORMAL_DISK_USAGE ] ; then
            tcpdump_dir_is_ok=1
        fi
        if [ $syslog_dir_is_ok -eq 0 ] && [ $tcpdump_dir_is_ok -eq 0 ] ; then
            break
        fi
        if [ $syslog_dir_is_ok -eq 0 ] ; then
            num_files=`ls -lht $syslog_dir | wc -l`
            logmgr_delete_oldest "syslogd" $syslog_dir $num_files
        fi
        if [ $tcpdump_dir_is_ok -eq 0 ] ; then
            num_files=`ls -lht $tcpdump_dir | wc -l`
            logmgr_delete_oldest "tcpdump" $tcpdump_dir $num_files
        fi
    done
}

## Default values ##
max_file_safe=100
cpu_relaxation_secs=1
timer_secs=60 # one minute
KEEP_CHECKING=1
NORMAL_DISK_USAGE=20

# read the configuration file
syslog_dir=`uci get syslog.slog.directory`
syslog_dir=`dirname $syslog_dir`"/"`basename $syslog_dir`"/"
tcpdump_dir=`uci get tcpdump.tcpdump.log_dir`
tcpdump_dir=`dirname $tcpdump_dir`"/"`basename $tcpdump_dir`"/"

#while [ $KEEP_CHECKING -eq 1 ]
while :
do
    logmgr_run
    sleep $timer_secs
done
The script is sequential and works fine, but CPU utilisation rises to about 75% while the script is running. I have tried to remove all the redundancy in the script, but still have not achieved the desired result.
I am sure there are ways to do this, but I am not sure how. Please suggest some ways to optimise this script and reduce the CPU utilisation.
2 Answers
If your CPU usage is rising to 75% and staying there (not bouncing up and down), it suggests that your script is spinning and never reaching any of your sleep statements. There are a few things that may be worth looking into.
- The while loop in logmgr_run keeps going until enough files have been cleared out. If this doesn't happen, it will never reach the 60-second sleep. logmgr_delete_oldest will only sleep if it deletes a file, which only happens if the file isn't in use.
I haven't tested it, and it has been a while since I did script programming, but it looks to me like your algorithm breaks down if the oldest file in either folder is being used at the same time that you breach the usage threshold. It can't delete the oldest file, so it just keeps trying over and over again without succeeding or sleeping.
Rather than having a loop that keeps trying to remove the oldest file whenever the threshold is breached, it might be better to calculate how many files need to be removed to come back under the threshold and remove them in one go, or to skip files that are in use and move on to the next file, instead of stopping and retrying the same file next time round.
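A minimal sketch of that batch-delete-and-skip idea, assuming filenames without whitespace; is_in_use is a hypothetical stand-in for the question's own in-use check:

```shell
#!/bin/sh
# Sketch: delete up to 2ドル of the oldest files in directory 1,ドル
# skipping any file that is_in_use (a hypothetical stand-in for the
# question's check) reports as open, instead of retrying one file.
# Assumes filenames contain no whitespace.
delete_oldest_batch() {
    dir=1ドル
    batch=2ドル
    deleted=0
    # ls -t lists newest first, so tail gives the oldest entries
    for f in $(ls -t "$dir" | tail -n "$batch"); do
        if ! is_in_use "$dir/$f"; then
            rm -f "$dir/$f" && deleted=$((deleted + 1))
        fi
    done
    echo "$deleted"
}
```

The caller can compare the returned count against the number it asked for and go back to sleep either way, rather than spinning on a single file it cannot delete.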
It may also be worthwhile extending the cleanup to clear a bit more than you need, so that you don't immediately trigger another cleanup next time you go through the loop. So, for example if you breach 70% usage, clear down the files to 65% or 60% so that there is some growth space.
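That is classic hysteresis; a sketch with two watermarks (the names and helper functions are illustrative, not from the original script):

```shell
#!/bin/sh
# Hysteresis sketch: trigger cleanup only above HIGH, and once
# triggered, keep deleting until usage falls to LOW or below,
# leaving growth headroom before the next trigger.
HIGH=70
LOW=60

over_high() { [ "1ドル" -gt "$HIGH" ]; }        # should a cleanup start?
at_or_below_low() { [ "1ドル" -le "$LOW" ]; }   # may a cleanup stop?

# The cleanup loop would then look like this, where get_usage and
# delete_one_file are hypothetical stand-ins for the script's helpers:
#   if over_high "$(get_usage)"; then
#       while ! at_or_below_low "$(get_usage)"; do delete_one_file; done
#   fi
```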
Your script also declares an unused variable, which you should probably either use or remove:
max_file_safe=100
- Thanks for your valuable suggestions. Yes, I had plenty of files to be deleted, so it was always looping around deleting files and never got to the 60-second sleep. I now use a counter so that after a defined number of files it goes back to sleep and restarts logmgr_run again after some time. I also use bulk deletion when there are many files, and skip the files currently in use. – vibhutiD, Sep 23, 2016 at 10:58
- Forgot to mention that the CPU usage is reduced by 10% with this. – vibhutiD, Sep 23, 2016 at 11:01
The script can and should be simplified.
Use the exit code of commands
The exit code of a function is the exit code of its last statement. With that in mind, this function can be written much more simply:
logmgr_is_file_in_use() {
    num_of_instances=1
    logmgr_get_current_file_name 1ドル $num_of_instances
    if [ $current_file = 2ドル ]; then
        return 1
    else
        return 0
    fi
}
Like this:
logmgr_is_file_in_use() {
    num_of_instances=1
    logmgr_get_current_file_name 1ドル $num_of_instances
    ! [ "$current_file" = "2ドル" ]
}
Simplify grep -> grep -> awk
awk can do much of what grep can. When you use awk at the end of a pipeline, consider using it more to reduce the number of processes in the pipeline.
Take for example this:
pid=$(ps | grep $PROC_NAME | grep -v grep | awk 'NR=='$instance'{print 1ドル}')
This can be written as follows (excluding awk's own ps entry from the matches, just as grep -v grep excluded grep, and counting matches so that inst indexes matching lines rather than all lines of output):
pid=$(ps | awk -v proc="$PROC_NAME" -v inst="$instance" '0ドル ~ proc && !/awk/ && ++matched == inst {print 1ドル}')
Avoid ls when you can use globs
This code, and the rest that depends on it, can be rewritten without an ls:
fds=$(ls /proc/$pid/fd)
for fd in $fds;
do
    last=$fd
done
Like this:
last=
for fd in /proc/$pid/fd/*; do
    last=$fd
done
[ -e "$last" ] || last=
The last line is in case there is no matching file; otherwise the value of last might look like /proc/123/fd/*, which is not a file. Notice that I emptied last before the loop, to avoid potential bugs due to a pre-existing value.
Never ever parse the output of ls
This is a very clumsy way to find the target of a symbolic link.
current_file=$(ls /proc/$pid/fd/$last -l | awk 'BEGIN {FS="->"} {printf 2ドル}')
It seems you're using the ash shell, which should have a readlink command that can do this much more cleanly:
current_file=$(readlink /proc/$pid/fd/$last)
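A quick self-contained check of that behaviour, using a temporary symlink rather than /proc:

```shell
#!/bin/sh
# readlink prints a symlink's target directly; no ls parsing needed.
tmp=$(mktemp -d)
touch "$tmp/target"
ln -s "$tmp/target" "$tmp/link"
readlink "$tmp/link"   # prints the path to $tmp/target
rm -r "$tmp"
```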
Use consistent techniques
I can see multiple equivalent techniques used to accomplish the same thing:

- Execute a sub-shell using $(...) or `...`
- Infinite loop using while :; do or while [ 1 ]; do

Be consistent. Use the same technique throughout, and the code will be easier to read. Of course, pick the better technique; in these examples, the first of each pair is the better one.
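One concrete reason $(...) is the better of the first pair: it nests without escaping, while backticks inside backticks need backslashes:

```shell
#!/bin/sh
# $(...) nests cleanly; the backtick equivalent would be
#   outer=`basename \`dirname /var/log/messages\``
outer=$(basename "$(dirname /var/log/messages)")
echo "$outer"   # -> log
```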
- I did not even know about the readlink command :D Very helpful scripting techniques. Thanks a lot! – vibhutiD, Oct 5, 2016 at 6:38