I am trying to speed up the script below, which reports disk usage.
The timed find command towards the end is the line I am trying to speed up. The script is run on directories that hold over 6-7 TB of data, and it takes 16-18 hours. I want it to run in under 8 hours. Can someone please suggest ways to modify this script?
# disk_check.csh takes a dir name as a mandatory argument and an optional -<num> or -verbose as a second argument.
# Ex1: disk_check <dir_name> - Reports out the disk usage per user and the total disk consumption
# Ex2: disk_check <dir_name> -verbose -Along with the above, it also lists all files by size in the given directory
# Ex3: disk_check <dir_name> -<num> -Similar to Ex2, But here it reports out the top <num> files by size in the given directory
if ($#argv == 0) then
echo " Error : Dir path missing"
echo " Syntax : disk_check <dir-name> <verbose>"
echo " verbose gives a list of all files per individual sorted by size"
exit 0
endif
set cwd = $argv[1]
if ($cwd =~ "-help") then
echo " Error : Dir path missing"
echo " Syntax : disk_check <dir-name> <-verbose>"
echo " -verbose gives a list of all files per individual sorted by size"
exit 0
endif
if ($#argv > 1) then
set opt = $argv[2]
#echo "opt : $opt"
endif
if ( -d $cwd ) then
set ava = `df -h $cwd | tail -1 | awk '{print $1'}`
set tot = `df -h $cwd | tail -1 | awk '{print $2'}`
set ad = `df -h $cwd | tail -1 | awk '{print $3'}`
set pcu = `df -h $cwd | tail -1 | awk '{print $4'}`
echo ""
echo "Summary for dir ${cwd}: $tot Used (${pcu})"
echo "-----------------------------------------------------------------------------"
echo " Total Volume $ava"
echo " Available on disk $ad "
echo " Percentage used $pcu"
echo ""
echo "Summary by User:"
printf "%sUser%15sSize%10sCount\n" ""
echo "---------------------------------------------"
# This is the command that takes a long time:
time find $cwd -type f -printf "%u %s\n" | awk '{user[$1]+=$2;count[$1]++}; END{ for( i in user) printf "%s%-13s%5s%-0.2f%s%5s%7s\n","", i, "", user[i]/1024**3,"GB", "", count[i]}'| sort -nk2 -r
if ($#argv > 1) then
if ($opt =~ "-verbose") then
echo "\nDetail, Sorted by size"
printf " User%15sFile%15sSize\n" ""
echo "---------------------------------------------------"
find $cwd -type f -not -path '*/\.*' -printf "%-13u | %-50p | %-10s \n" | sort -nk5 -r
endif
1 Answer
Potential bugs
When I run the code without the -verbose option like this:
disk_check.csh /tmp
I see this message on stderr:
then: then/endif not found.
I see two potential places in the code where there could be a missing endif.
Here is one:
if ( -d $cwd ) then
set ava = `df -h $cwd | tail -1 | awk '{print $1'}`
set tot = `df -h $cwd | tail -1 | awk '{print $2'}`
Here is another:
if ($opt =~ "-verbose") then
echo "\nDetail, Sorted by size"
printf " User%15sFile%15sSize\n" ""
Both areas of the code should be reviewed.
Note that this message is easy to miss if it is mixed in with all the other
expected output on stdout.
DRY
It is great that you print out usage information. However, this code is nearly duplicated. I say "nearly" because the only difference I can see is:
verbose
vs.:
-verbose
Unfortunately, since you are using a shell scripting language with very limited programming capability, there is no clean way around this.
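For comparison, here is a minimal sketch of how the duplication could be avoided, assuming a port to a Bourne-style shell were acceptable (the posted script is csh, so this is not a drop-in change). The function names usage and check_args are my own, not part of the original script:

```sh
#!/bin/sh
# Hypothetical Bourne-shell version of the argument check.
# The usage text lives in one place and both checks share it.

usage() {
    echo " Syntax : disk_check <dir-name> <-verbose>"
    echo " -verbose gives a list of all files per individual sorted by size"
}

# Returns 1 (after printing the usage text) when no directory was given
# or when help was requested; returns 0 otherwise.
check_args() {
    if [ "$#" -eq 0 ] || [ "$1" = "-help" ]; then
        usage
        return 1
    fi
    return 0
}
```

A caller would run something like: check_args "$@" || exit 0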
Comment
To reduce clutter, delete this commented-out code line:
#echo "opt : $opt"
Documentation
It is great that you added header comments to describe your code.
However, you mention an option you refer to as <num> and -<num>,
but it is unclear how it should be used. It does not seem to have any
effect for me. You should elaborate with more concrete examples.
The comments should mention the -help option since the code uses it.
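In the code as posted, the second argument is only ever compared against -verbose, which would explain why -<num> appears to do nothing. As a rough sketch of what a "top <num> files" report might look like, here is a Bourne-shell helper (the posted script is csh; the name top_files and the output layout are my own assumptions). It relies on GNU find for -printf, as the original script already does:

```sh
#!/bin/sh
# Hypothetical helper: print the N largest regular files under a directory
# as "size-in-bytes owner path", largest first. Hidden paths are skipped,
# mirroring the -verbose listing in the posted script.
top_files() {
    dir=$1
    num=$2

    # Reject anything that is not a plain positive integer count.
    case $num in
        '' | *[!0-9]*) echo "top_files: <num> must be a number" >&2; return 2 ;;
    esac

    find "$dir" -type f -not -path '*/.*' -printf '%s %u %p\n' |
        sort -rn -k1,1 |
        head -n "$num"
}
```

For example, top_files /some/dir 20 would list the 20 largest files. File names containing newlines would confuse the line-based sort, so treat this as a starting point only.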
... quota to get a report.
... find command has already been identified as a performance problem.
... find, sort and awk call. Given the amount of data it's used on, I'm not sure find is the only problem here.
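One thing visible in the posted script is that with -verbose the directory tree is walked twice: once for the per-user summary and once for the detailed listing. Below is a minimal sketch of walking it once and reusing the result, written for a Bourne-style shell (the original is csh). The function names and the scratch listing file are my own assumptions, and GNU find is assumed for -printf. This would not make the timed summary itself faster; it only avoids the second walk.

```sh
#!/bin/sh
# Hypothetical single-pass layout: one find run writes "owner size path"
# lines to a scratch file, and both reports are built from that file.

# Walk the tree once and save the listing. The listing file should live
# outside the scanned directory so it is not counted itself.
scan_once() {
    dir=$1
    listing=$2
    find "$dir" -type f -printf '%u %s %p\n' > "$listing"
}

# Per-user totals in GB plus file counts, largest consumer first.
summary_by_user() {
    awk '{ bytes[$1] += $2; count[$1]++ }
         END {
             for (u in bytes)
                 printf "%-13s %10.2fGB %7d\n", u, bytes[u] / (1024 ^ 3), count[u]
         }' "$1" |
        sort -k2,2 -rn
}

# Detailed listing, largest file first, read from the same scratch file.
largest_first() {
    sort -k2,2 -rn "$1"
}
```

Usage would be along the lines of: scan_once /some/dir /tmp/listing.txt, then summary_by_user /tmp/listing.txt and, for -verbose, largest_first /tmp/listing.txt. The scratch file can get large on a tree with many files, so it needs a location with enough free space.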