I have a small bash script that works fine, but I'm wondering whether there is a better way to write it. The script sums the sizes of the files whose year, in the 7th column, is between 2002 and 2018.
Below is the working code.
Script:
#!/bin/bash
# scriptName: Ftpcal.sh
FILE="/home/pygo/Cyberark/ftplogs_3"
AWK="/bin/awk"
GREP="/bin/grep"
USERS="`"$AWK" '$7 >= "2002" && $7 <= "2018"' $FILE | "$AWK" '{print $3}' | sort -u`"
for user in $USERS;
do
echo "User $user " | tr -d "\n";
"$AWK" '$7 >= "2002" && $7 <= "2018"' "$FILE" | "$GREP" "$user" | "$AWK" '{ total += $4}; END { print "Total Space consumed: " total/1024/1024/1024 "GB"}';
done | column -t
echo ""
echo "=============================================================="
"$AWK" '$7 >= "2002" && $7 <= "2018"' "$FILE" | "$AWK" '{ total += $4}; END { print "Total Space consumed by All Users: " total/1024/1024/1024 "GB"}';
echo ""
Actual data Result:
$ sh Ftpcal.sh
User 16871 Total Space consumed: 0.0905161GB
User 253758 Total Space consumed: 0.0750855GB
User 34130 Total Space consumed: 3.52537GB
User 36640 Total Space consumed: 0.55393GB
User 8490 Total Space consumed: 3.70858GB
User tx-am Total Space consumed: 0.18992GB
User tx-ffv Total Space consumed: 0.183137GB
User tx-ttv Total Space consumed: 17.2371GB
User tx-st Total Space consumed: 0.201205GB
User tx-ti Total Space consumed: 58.9704GB
User tx-tts Total Space consumed: 0.0762068GB
------------ snipped output --------------
==============================================================
Total Space consumed by All Users: 255.368GB
Sample data:
-rw-r--r-- 1 34130 14063436 Aug 15 2002 /current/focus-del/files/from_fix.v.gz
-rw-r--r-- 1 34130 14060876 Jul 12 2007 /current/focus-del/files/from1_fix.v.gz
-rw-r--r-- 1 34130 58668461 Feb 23 2006 /current/focus-del/files/from_1.tar.gz
-rw-r--r-- 1 34130 14069343 Aug 7 20017 /current/focus-del/files/from_tm_fix.v.gz
-rw-r--r-- 1 34130 38179000 Dec 7 20016 /current/focus-del/files/from_tm.gds.gz
-rw-r--r-- 1 34130 15157902 Nov 22 20015 /current/focus-del/files/from_for.tar.gz
-rw-r--r-- 1 34130 97986560 Nov 4 20015 /current/focus-del/files/from_layout.tar
Sample Result:
$ sh Ftpcal.sh
User 34130 Total Space consumed: 0.0808321GB
==============================================================
Total Space consumed by All Users: 0.0808321GB
I'm open to any better approach that would make the script more robust.
Thanks.
1 Answer
AWK="/bin/awk"
It's easier and more readable if you just set your PATH to something appropriate.
USERS="`"$AWK" '$7 >= "2002" && $7 <= "2018"' $FILE | "$AWK" '{print $3}' | sort -u`"
Backticks should almost always be replaced by $( ... ), which nests without escaping and is easier to read. Both forms run the command in a subshell, so the gain is readability, not speed.
Literal numbers should not be quoted. It happens to still do what you want in awk; in some languages it won't. A bad habit, easily avoided.
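To see why the quotes matter, here is a small sketch. A quoted constant makes awk compare as strings, which agrees with the numeric result for four-digit years but not in general. The input value 20175 is a made-up example, not taken from the log.

```shell
# A field compared against a string constant is compared character by character.
# "20175" sorts between "2002" and "2018" as a string, so this one matches.
str=$(echo 20175 | awk '$1 >= "2002" && $1 <= "2018" { print "match" }')

# With numeric constants the comparison is numeric, and 20175 > 2018.
num=$(echo 20175 | awk '$1 >= 2002 && $1 <= 2018 { print "match" }')

echo "string comparison:  [$str]"
echo "numeric comparison: [$num]"
```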
There's no need to invoke awk a second time to extract the third field. Simply pair the action {print $3} with the condition ($7 >= ...) that's already there.
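A minimal sketch of the combined pattern and action. The three input rows are hypothetical, laid out in the same columns as the question's sample data.

```shell
# Hypothetical sample input in the question's column layout.
sample=$(mktemp)
cat > "$sample" <<'EOF'
-rw-r--r-- 1 34130 14063436 Aug 15 2002 /current/focus-del/files/from_fix.v.gz
-rw-r--r-- 1 tx-am 58668461 Feb 23 2006 /current/focus-del/files/from_1.tar.gz
-rw-r--r-- 1 34130 97986560 Nov 4 2019 /current/focus-del/files/from_layout.tar
EOF

# One awk call: the pattern selects the rows, the action prints column 3.
users=$(awk '$7 >= 2002 && $7 <= 2018 { print $3 }' "$sample" | sort -u)
echo "$users"

rm -f "$sample"
```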
It's good form to indent the body of a for block (or any other block).
echo "User $user " | tr -d "\n";
To suppress a newline on echo, use echo -n.
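One caveat: the behaviour of echo -n differs between shells, and the question runs the script with sh. A sketch of the same effect with printf, which only emits a newline when the format asks for one (the user ID and size below are just sample values):

```shell
user=34130   # sample user ID

# printf leaves the cursor on the same line, so the next command continues it.
line=$(printf 'User %s ' "$user"; echo 'Total Space consumed: 3.52537GB')
echo "$line"
```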
column -t
This has some awkward consequences, like tabs inside of labels ("Total<TAB>Space") and unaligned numbers. printf will give much prettier results. Both bash and awk provide it.
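As a rough illustration, fixed field widths in the format string do the aligning that column -t was being asked to do. The widths (10 and 10.4) are arbitrary choices for this sketch.

```shell
export LC_ALL=C   # assumption: force "." as the decimal separator for %f

# %-10s left-justifies the user in 10 columns; %10.4f right-justifies the number.
row1=$(printf 'User %-10s Total Space consumed: %10.4f GB' 34130 3.52537)
row2=$(printf 'User %-10s Total Space consumed: %10.4f GB' tx-ti 58.9704)
printf '%s\n%s\n' "$row1" "$row2"
```

Because every field has a fixed width, the labels stay intact and the decimal points line up.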
total/1024/1024/1024
Nothing wrong with this, as such, but 2**30 is useful shorthand for a gigabyte (strictly, a gibibyte).
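A portability note in sketch form: POSIX awk spells exponentiation ^, while ** is an extension that some awk implementations lack. The byte count below is the sum of the three in-range files from the question's sample data.

```shell
# 2^30 bytes per gigabyte, written with the POSIX operator.
gib=$(awk 'BEGIN { print 2^30 }')
echo "$gib"

# Convert a byte count using the same divisor.
gb=$(awk -v bytes=86792773 'BEGIN { print bytes / 2^30 }')
echo "${gb}GB"
```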
==============================================================
Bash can generate sequences like this with the idiom printf "=%.0s" {1..62}. The = is the character and 62 is the count.
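A short sketch of why the idiom works: printf reuses its format once per argument, and %.0s prints zero characters of each one, so only the literal = is emitted each time.

```shell
# Three arguments, so the format is applied three times.
three=$(printf '=%.0s' a b c)
echo "$three"

# In bash, brace expansion supplies the 62 arguments.
rule=$(printf '=%.0s' {1..62})
echo "$rule"
```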
You're traversing the file three times and extracting the same information each time. This is going to get slow as the file grows. Awk has associative arrays: you can store a subtotal for each user, then iterate and print those subtotals at the end of the awk script, accomplishing the whole thing in one go.
Putting it all together:
/bin/awk -vusrfmt="User %-20s Total Space consumed: %11.6f GB\n" \
    -vsumfmt=$( printf "=%.0s" {1..62} )"\nTotal Space consumed by All Users: %.6f GB\n" '
    $7 >= 2002 && $7 <= 2018 {
        subtot[$3]+=$4
        tot+=$4
    }
    END {
        for (u in subtot) printf usrfmt, u, subtot[u] / 2**30
        printf sumfmt, tot / 2**30
    }'
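The command above has no file argument, so it reads standard input. Also, awk does not define the order in which for (u in subtot) visits the keys. Below is a simplified sketch, not the answer's exact script, that passes a file explicitly and pipes the per-user lines through sort. The input rows are hypothetical and the output is in bytes to keep the numbers easy to check.

```shell
# Hypothetical sample input in the question's column layout.
sample=$(mktemp)
cat > "$sample" <<'EOF'
-rw-r--r-- 1 34130 14063436 Aug 15 2002 /current/focus-del/files/from_fix.v.gz
-rw-r--r-- 1 tx-am 58668461 Feb 23 2006 /current/focus-del/files/from_1.tar.gz
-rw-r--r-- 1 34130 14060876 Jul 12 2007 /current/focus-del/files/from1_fix.v.gz
-rw-r--r-- 1 34130 97986560 Nov 4 2019 /current/focus-del/files/from_layout.tar
EOF

report=$(awk '
    $7 >= 2002 && $7 <= 2018 { subtot[$3] += $4; tot += $4 }
    END {
        # Send the per-user lines through sort for a stable order.
        for (u in subtot) printf "User %-10s %12d bytes\n", u, subtot[u] | "sort"
        close("sort")   # wait for sort to finish before printing the total
        printf "All users       %12d bytes\n", tot
    }' "$sample")
echo "$report"

rm -f "$sample"
```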
- "Backticks should almost always be replaced by $( ... )". Why just "almost always" and not "always"? (janos, Apr 20, 2019 at 21:24)
- I think backticks can improve readability versus nested $( $( ) ); consider something like x=$( printf %d $( wc -l $file ) ); replacing the inner parens with backticks is okay there. (Oh My Goodness, Apr 21, 2019 at 0:44)
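For comparison, a sketch of both spellings side by side. It reads the file through a redirect so that wc prints only the count; the temporary file and variable names are made up for the example.

```shell
tmp=$(mktemp)
printf 'a\nb\nc\n' > "$tmp"

# Nested $( ... ): no escaping needed at any depth.
n1=$(printf '%d' "$(wc -l < "$tmp")")

# Backticks for the inner substitution only, as the comment suggests.
n2=$(printf '%d' `wc -l < "$tmp"`)

echo "$n1 $n2"
rm -f "$tmp"
```

Both give the same result; the choice is a matter of readability.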