
I'd like a review of this simple script I wrote, which is meant to emulate the Unix pipeline sort [FILE] | uniq -cd. Unlike that pipeline, my script also lists the line numbers where each duplicated line occurs in the file. Please tell me what you think, and which parts of the script, if any, I should rework to make it better.

################################################################################
# File name: uniq.awk
# ===================
#
# Find and report all occurrences of duplicate lines in a text file.
#
#
# Usage: awk -f uniq.awk [FILE]
#
################################################################################
{
    x = lines[$0]["count"]++;   # Count the number of occurrences of a line
    lines[$0]["NR"][x] = NR;    # Also save the line numbers
    # Find the length of the longest line to make it the column width
    if (x > 0) {
        if (length($0) > max) {
            max = length($0);
        }
    }
}
END {
    # If the file contains no lines to process, that is, it's empty,
    # return an exit status code of 1 to indicate the fact.
    if (!(NR > 0)) {
        exit 1;
    }
    # Prepare the format string
    # Column #1: number of occurrences of the line
    # Column #2: line itself
    # Column #3: line numbers where all the lines are located
    fmt_s = "%s: %" max "-s (%s)\n";
    for (i in lines) {
        if (lines[i]["count"] > 1) {
            for (j = 0; j < lines[i]["count"]; j++) {
                s = s lines[i]["NR"][j] ", ";
            }
            # Get rid of the trailing comma and space
            s = substr(s, 1, length(s) - 2);
            printf(fmt_s, lines[i]["count"], i, s);
            s = "";
        }
    }
}

Test:

$ cat > data
car
baby
car
man
woman
woman
key
woman
$
$ cat -n data
 1 car
 2 baby
 3 car
 4 man
 5 woman
 6 woman
 7 key
 8 woman
$ 
$ awk -f uniq.awk data
2: car (1, 3)
3: woman (5, 6, 8)
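
For comparison, here is a small sketch of the pipeline the script is modeled on, run against the same sample data. It recreates the file with printf rather than cat, which is only a convenience for the example; the exact column padding of the counts depends on the uniq implementation, so no literal output is shown.

```shell
# Recreate the sample file from the test above
printf '%s\n' car baby car man woman woman key woman > data

# -c prefixes each line with its number of occurrences,
# -d prints only lines that occur more than once
sort data | uniq -cd
```

This should report car with a count of 2 and woman with a count of 3, the same counts as the script, but without the line numbers.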
asked Aug 26, 2016 at 9:52

1 Answer


I don't know about Awk, but for this section of code:

END {
    # If the file contains no lines to process, that is, it's empty,
    # return an exit status code of 1 to indicate the fact.
    if (!(NR > 0)) {
        exit 1;
    }

You should print an appropriate message to the terminal rather than just exiting silently with exit code 1. In my opinion, this gives users better feedback.
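
As a rough sketch of that suggestion, the empty-input check could print a diagnostic before exiting. The wrapper function name check_empty and the message text are made up for illustration, and sending the message to standard error (so it stays separate from the normal report) is an assumption on my part rather than something the original script does.

```shell
# Hypothetical wrapper around just the empty-input check from the END block
check_empty() {
    awk '
    END {
        if (NR == 0) {
            # Illustrative message; gawk treats "/dev/stderr" specially,
            # and most systems also provide it as a real path
            print "uniq.awk: no input lines to process" > "/dev/stderr"
            exit 1
        }
    }' "$@"
}
```

Called as check_empty FILE (or with input on stdin), it stays silent and exits with status 0 for non-empty input, and prints the message and exits with status 1 when there are no lines.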

Jamal
answered Sep 16, 2016 at 7:27