
Consider the following CSV file:

A; B; ;
B; ; A;
C; ; E F;
D; ; E;
E; C; ;

The fields:

  • $1: the jname, a unique ID of the entry.
  • $2: a space-separated list of inconds.
  • $3: a space-separated list of outconds.

For the "link" A-B to be valid, jname A must define B as outcond, and job B must define A as incond.

In the above example, D-E is not a valid "link" because E doesn't define D as incond. C-F is not a valid "link" because F doesn't exist.

A cond is invalid if the link it forms is invalid. The script must detect all invalid conds and report which jobs are affected.

#!/usr/bin/awk -f
BEGIN {
    FS=" *; *";
    delim = "-";
    conds[""]=0;
}
{
    icnd_size = split($2, incond_list, " ");
    for (i=1; i<=icnd_size; ++i) {
        conds[incond_list[i] delim $1]++;
    }
    ocnd_size = split($3, outcond_list, " ");
    for (i=1; i<=ocnd_size; ++i) {
        conds[$1 delim outcond_list[i]]--;
    }
}
END {
    for (i in conds) {
        sz = split(i, answer, delim);
        if (conds[i] == 1) {
            j = answer[2];
            c = answer[1];
            inorout = "INCOND";
        }
        if (conds[i] == -1) {
            j = answer[1];
            c = answer[2];
            inorout = "OUTCOND";
        }
        if (conds[i] != 0)
            print "Invalid", inorout, c, "on job", j;
    }
}

The script works, although I do not have large data to test against. I see 2 problems with it:

  1. the script will break if some cond has the character delim in the name
  2. the script might break (and/or return false positives) if a line is inserted twice or if two lines have the same jname.

I'd welcome any tips on addressing these two problems, as well as any critique of the code; it's literally my first Awk program.

janos
asked Mar 5, 2012 at 3:46
  • Not really a CSV file, is it? C => Comma => ','. You have an SSC: a semicolon-separated file. Commented Mar 5, 2012 at 17:09
  • Way late on this one, but if you're still interested: are you trying to make a tsort substitute here? Good luck. Commented Mar 11, 2016 at 17:43

1 Answer


Your main questions

The script works, although I do not have large data to test against.

You don't necessarily need a large dataset. It's better to think through all the possible corner cases. For example, your sample data demonstrates failures of OUTCOND but not of INCOND. Also, there is an example of more than one outgoing link, but none of more than one incoming link. There aren't many interesting cases. If you add examples for all of them, you can be fairly confident in your solution.
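As a sketch, a corner-case input along these lines covers both gaps. The job names G through N and the file name corner.txt are made up for illustration. H declares an incond that G doesn't reciprocate, and K has three incoming links: two are valid and one points at a job N that doesn't exist.

```shell
#!/bin/sh
# Write a hypothetical corner-case input file (file name and jobs are illustrative).
cat > corner.txt <<'EOF'
G; ; ;
H; G; ;
I; ; K;
J; ; K;
K; I J N; ;
EOF
# What the checker should report for this input, by the link rule:
#   H lists G as incond, but G lists no outconds -> Invalid INCOND G on job H
#   I-K and J-K are valid (both sides declare the link)
#   K lists N as incond, but there is no job N   -> Invalid INCOND N on job K
```

Running your script on corner.txt should print exactly those two INCOND lines. They may come out in any order, because the order of for (i in conds) is unspecified.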

  1. The script will break if some cond has the character delim in the name

If you want to be really safe, you could add a sanity check for that and raise an error when such a name is found, for example by calling exit with a non-zero value.
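For instance, here is a minimal sketch of such a check, written as a separate pass. The name check_names is hypothetical. Printing to "/dev/stderr" assumes an awk that supports it, such as gawk, mawk or BWK awk. Keep in mind that exit in a main rule still runs the END block. If you merge this rule into your script, put the failed guard at the top of END so no report is printed after a failed check.

```shell
#!/bin/sh
# check_names (hypothetical helper): fail if any jname, incond or outcond
# contains the delimiter used to build the array keys.
check_names() {
    awk '
    BEGIN { FS = " *; *"; delim = "-" }
    {
        # Gather every name on the line: inconds, outconds, then the jname.
        n = split($2 " " $3, names, " ")
        names[++n] = $1
        for (k = 1; k <= n; k++) {
            if (index(names[k], delim)) {
                printf "line %d: name \"%s\" contains \"%s\"\n", NR, names[k], delim > "/dev/stderr"
                failed = 1
                exit 1
            }
        }
    }
    # exit in a main rule still runs END; in a combined script, put this
    # guard first in END so no report is printed after a failed check.
    END { if (failed) exit 1 }'
}

printf '%s\n' 'A; B; ;' 'B; ; A;' | check_names && echo "names OK"
printf '%s\n' 'X-1; ; Y;' | check_names || echo "rejected"
```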

  2. The script might break (and/or return false positives) if a line is inserted twice or if two lines have the same jname.

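Ditto: a sanity check can detect a jname that has already been seen and exit with a non-zero value. A line inserted twice repeats its jname, so the same check covers both cases.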
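A minimal sketch of that check, again as a separate pass. The helper name check_duplicates is hypothetical, and the same "/dev/stderr" assumption applies. It relies on the common awk idiom of using seen[$1]++ as a pattern.

```shell
#!/bin/sh
# check_duplicates (hypothetical helper): fail on the first jname that has
# already been seen. A line inserted twice repeats its jname, so it is caught too.
check_duplicates() {
    awk '
    BEGIN { FS = " *; *" }
    # seen[$1]++ evaluates to 0 (false) the first time a jname appears.
    seen[$1]++ {
        printf "line %d: duplicate jname \"%s\"\n", NR, $1 > "/dev/stderr"
        failed = 1
        exit 1
    }
    END { if (failed) exit 1 }'
}

printf '%s\n' 'A; B; ;' 'B; ; A;' | check_duplicates && echo "no duplicates"
printf '%s\n' 'A; B; ;' 'B; ; A;' 'A; B; ;' | check_duplicates || echo "rejected"
```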

Simplify

Many things can be simplified in this code.

The conds[""]=0; line is unnecessary, because awk creates arrays and their elements on first use. You can simply delete it.

Instead of this:

icnd_size = split($2, incond_list, " ");
for (i=1; i<=icnd_size; ++i) {
    conds[incond_list[i] delim $1]++;
}

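You don't really need the return value of split. Instead of a counting loop, you can use a more idiomatic for-each loop. The iteration order of for (i in ...) is unspecified, but that doesn't matter here, since you are only counting: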

split($2, inconds, " ");
for (i in inconds) {
    conds[inconds[i] delim $1]++;
}

The same goes for outconds as well.
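To see what the two loops compute, here is a small sketch that runs only the counting part on your sample and prints every counter. The wrapper name dump_counters is hypothetical. An incond adds 1 and an outcond subtracts 1, so a link declared on both sides ends up at 0.

```shell
#!/bin/sh
# dump_counters (hypothetical name): run only the counting loops, then print
# each "from-to" pair with its counter. +1 per incond, -1 per outcond.
dump_counters() {
    awk '
    BEGIN { FS = " *; *"; delim = "-" }
    {
        split($2, inconds, " ")
        for (i in inconds)
            conds[inconds[i] delim $1]++
        split($3, outconds, " ")
        for (i in outconds)
            conds[$1 delim outconds[i]]--
    }
    END {
        for (pair in conds)
            print pair, conds[pair]
    }' | LC_ALL=C sort   # for-in order is unspecified, so sort for a stable listing
}

printf '%s\n' 'A; B; ;' 'B; ; A;' 'C; ; E F;' 'D; ; E;' 'E; C; ;' | dump_counters
```

On the sample this prints B-A 0, C-E 0, C-F -1 and D-E -1, one per line. The two -1 entries are exactly the invalid OUTCONDs.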

Mutually exclusive if statements

These if statements cannot both be true at the same time:

if (conds[i] == 1) {
 # ...
}
if (conds[i] == -1) {
 # ...
}

So they should be chained together with an else if.

Formatting

Instead of this:

for (i=1; i<=ocnd_size; ++i) {
    conds[$1 delim outcond_list[i]]--;
}

It would be better to write it like this:

for (i = 1; i <= ocnd_size; ++i) {
    conds[$1 delim outcond_list[i]]--;
}

Naming

Some of the names are not so great, for example sz, i, j and c in the END block. sz is actually unnecessary, and I would rename the others to pair, job and cond, respectively.

Putting it together

Consider this alternative implementation:

#!/usr/bin/awk -f
BEGIN {
    FS = " *; *";
    delim = "-";
}
{
    split($2, inconds, " ");
    for (i in inconds) {
        conds[inconds[i] delim $1]++;
    }
    split($3, outconds, " ");
    for (i in outconds) {
        conds[$1 delim outconds[i]]--;
    }
}
END {
    oformat = "Invalid %s %s on job %s\n";
    for (pair in conds) {
        split(pair, parts, delim);
        if (conds[pair] == 1) {
            job = parts[2];
            cond = parts[1];
            inorout = "INCOND";
        } else if (conds[pair] == -1) {
            job = parts[1];
            cond = parts[2];
            inorout = "OUTCOND";
        }
        if (conds[pair] != 0) printf oformat, inorout, cond, job;
    }
}
answered Jan 12, 2017 at 21:14