The Linux and Unix Menagerie: duplicate

Friday, September 19, 2008

Combinations Vs. Permutations On Linux and Unix

Hey There,

Here's a little something to finish the week off and tie up some loose ends. You may have noticed in our Perl script to maximize guaranteed matches in any given number pool that we did all of our work using permutations, which we then went to the trouble of sorting and de-duplicating. Probably a few people out there were thinking: "Why permutations if 1,2,3 and 3,1,2 are going to be considered equal? Isn't that just a bunch of extra work?" The answers to those questions are "why not" and "yes" ;)

If you feel like taking the time now, to save yourself time in the future, it would be well worth your while to replace the permutation subroutine in that script with one that generates all possible combinations instead. The difference between permutations and combinations is very relevant here because, really, we wanted all possible combinations but the author's rampant paranoia and fear of missing something caused him to generate all permutations and then weed that pool down to all possible combinations.
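To make that weeding-out step concrete, here's a rough shell sketch. It is not the subroutine from the number pool script; the helper name "canonical" and the four sample values are made up for illustration, and it assumes each element is a single character.

```shell
#!/bin/sh
# canonical is a hypothetical helper: it sorts the characters of one
# permutation so that every ordering of the same elements looks identical.
canonical() {
 printf '%s\n' "$1" | fold -w 1 | sort | tr -d '\n'
 echo
}

# Four sample permutations; the first three contain the same elements
for p in 123 312 231 124
do
 canonical "$p"
done | sort -u
# prints 123 and 124
```

Three of the four inputs collapse into one line, which is exactly the extra work a combination generator avoids.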

And what is the difference between permutations of a string and combinations of a string (assuming each alphanumeric character in the string is an element of the string)? Very simply: permutations are all the possible ways that "all" elements of the string can be arranged, so order matters, while combinations are all the possible unique groupings of any (or all) of the elements that a string contains, so "ab" and "ba" count as the same grouping.

For example: The permutations of a string "abc1" would be composed of (in whatever order you want):

1 c b a
c 1 b a
1 b c a
b 1 c a
c b 1 a
b c 1 a
1 c a b
c 1 a b
1 a c b
a 1 c b
c a 1 b
a c 1 b
1 b a c
b 1 a c
1 a b c
a 1 b c
b a 1 c
a b 1 c
c b a 1
b c a 1
c a b 1
a c b 1
b a c 1
a b c 1

while the combinations of a string "abc1", as our script generates them (keeping the characters adjacent and in their original order), would be composed of:

a
abc1
ab
1
c
bc1
abc
b
bc
c1
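If you want to check the counts yourself, here's a minimal shell sketch. The helpers "permute" and "subsets" are our own inventions for illustration (they are not part of combos.pl), and they assume the elements are passed as separate arguments. Note that the ten groupings listed above are the contiguous ones; if non-adjacent groupings such as "ac" or "b1" should count as well, four distinct characters give 15 groupings, and that is what "subsets" prints.

```shell
#!/bin/sh
# Illustration only: permute and subsets are hypothetical helpers,
# not part of combos.pl. Both take the elements as separate arguments.

# permute PREFIX ELEMENT...: print every ordering of the elements.
# The body runs in a subshell, so each recursion level keeps its own arguments.
permute() (
 prefix=$1
 shift
 if [ $# -eq 0 ]; then
  printf '%s\n' "$prefix"
  exit 0
 fi
 count=$#
 while [ "$count" -gt 0 ]; do
  first=$1
  shift
  permute "$prefix$first" "$@"
  # Rotate the list so that a different element goes first next time
  set -- "$@" "$first"
  count=$((count - 1))
 done
)

# subsets ELEMENT...: print every non-empty grouping, adjacent or not.
# Each bit of the counter decides whether one element is included.
subsets() (
 total=$(( (1 << $#) - 1 ))
 mask=1
 while [ "$mask" -le "$total" ]; do
  out=
  bit=1
  for elem in "$@"; do
   if [ $(( mask & bit )) -ne 0 ]; then
    out=$out$elem
   fi
   bit=$(( bit << 1 ))
  done
  printf '%s\n' "$out"
  mask=$(( mask + 1 ))
 done
)

permute "" a b c 1 | wc -l   # 24 orderings
subsets a b c 1 | wc -l      # 15 groupings
```

The subshell bodies keep the sketch portable, since plain sh has no local variables. They also make it slow for long strings, so treat it as a counting aid rather than a replacement for the Perl script.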

Here's wishing you all an enjoyable weekend. Hope this little script helps you with your combination generation (no permutations today :)

Cheers,

Creative Commons License

This work is licensed under a
Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License

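#!/usr/bin/perl

#
# combos.pl
#
# 2008 - Mike Golvach - eggi@comcast.net
#
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License 
#

use strict;
use warnings;

print "Enter a string... any string: ";
chomp(my $string = <STDIN>);

# Walk every start position ($i) and every length ($j) that fits.
# substr only returns adjacent characters, so the groupings are contiguous.
# Storing them as hash keys drops any duplicates automatically.
my %results;
for my $i ( 0 .. length($string) - 1 ) {
 for my $j ( 1 .. length($string) - $i ) {
 $results{ substr $string, $i, $j } = 1;
 }
}

# keys in scalar context gives the number of unique groupings
my $count = keys %results;
print "\nYour string $string resulted in $count combinations!\n\n";
foreach (keys %results) {
 print "$_\n";
}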

, Mike

Posted by Mike Golvach at 12:37 AM

combination, duplicate, linux, perl, permutation, script, sort, unix

Thursday, March 20, 2008

Generating Only Unique Content From Two Somewhat Similar Files

Hey There,

I see this, in one variation or another, on the message boards from time to time. So, touching back on our earlier post regarding making our own diff to deal with directory permissions, today we're going to look at a script that performs another function not built in to the standard "diff" command on Linux or Unix, and fairly simply implemented in ksh or bash.

Today's script takes two files (generally text files) as input. They will usually be of different sizes, but they can be of equal size; it doesn't seem to make a difference ;) The only restriction on usage is that, if the files are of unequal size, the smaller of the two should be the first argument to the script (we'll call it "rdiff") and the larger should be the second, like so:

host # ./rdiff smallfile largefile

and, of course, if they're of equal size:

host # ./rdiff file file
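If you'd rather not compare sizes by hand, a small helper can put the arguments in order for you. This is only a sketch: "smaller_first" is a made-up name, it is not part of rdiff, and it assumes that "smaller" means fewer bytes as reported by wc -c.

```shell
#!/bin/sh
# smaller_first is a hypothetical helper, not part of rdiff. It prints the
# two file names one per line, with the smaller file (by byte count) first.
smaller_first() {
 # The arithmetic expansion strips any padding that wc puts around the number
 size_a=$(( $(wc -c <"$1") ))
 size_b=$(( $(wc -c <"$2") ))
 if [ "$size_a" -le "$size_b" ]; then
  printf '%s\n%s\n' "$1" "$2"
 else
  printf '%s\n%s\n' "$2" "$1"
 fi
}

# Possible use, assuming file names without whitespace:
#  set -- $(smaller_first fileA fileB)
#  ./rdiff "$1" "$2"
```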

The output of the script will be a file named "Unique.out.smallfile.largefile" with "smallfile" and "largefile" being the values of the file names passed to the script on the command line.

Basically, our script uses grep to determine which lines in the smaller file are duplicated in the larger file, and sed to remove them. The "unique" output file then includes only the lines of the larger file that have no match in the smaller file. All duplicate information is removed.
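If your grep supports the POSIX -F, -x, -v and -f options, the same kind of filtering can be sketched in a single pass. This is an alternative technique, not the rdiff script itself; the function name "unique_lines" and the sample file contents are made up for illustration.

```shell
#!/bin/sh
# unique_lines SMALL LARGE: print the lines of LARGE that do not appear in SMALL.
# Hypothetical helper shown as an alternative to the sed loop, not part of rdiff.
unique_lines() {
 # -F treats each pattern as a fixed string, -x matches whole lines only,
 # -v keeps the lines that do NOT match, -f reads the patterns from a file
 grep -Fxv -f "$1" "$2"
}

# Small demonstration with throwaway files
tmpdir=$(mktemp -d)
printf '%s\n' apple banana >"$tmpdir/small"
printf '%s\n' apple banana cherry date >"$tmpdir/large"
unique_lines "$tmpdir/small" "$tmpdir/large"   # prints cherry, then date
rm -r "$tmpdir"
```

Because the patterns are fixed strings, lines containing characters like "." or "/" are compared literally, which a regular-expression approach would have to escape first. Keep in mind that grep exits with a non-zero status when nothing is left to print.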

Enjoy and cheers,

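#!/bin/ksh

#
# rdiff - find unique content in two files
#
# 2008 - Mike Golvach - eggi@comcast.net
#
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License 
#

if [ $# -ne 2 ]
then
 echo "Usage: $0 SmallFile LargeFile"
 exit 1
fi

FileA=$1
FileB=$2

# Keep a backup copy of the larger file
cp "$FileB" "${FileB}.old"

# Record every line of the smaller file that also appears in the larger one.
# Each line is used as a regular expression, so lines containing "/" or
# regex metacharacters may not behave as expected.
: >tmpfile
while IFS= read -r x
do
 grep "^${x}$" "$FileB" >/dev/null 2>&1
 if [ $? -eq 0 ]
 then
 printf '%s\n' "$x" >>tmpfile
 fi
done <"$FileA"

# Delete each recorded line from a working copy, so the deletions accumulate
cp "$FileB" newtmpfile
while IFS= read -r x
do
 sed "/^${x}$/d" newtmpfile >newtmpfile.$$
 mv newtmpfile.$$ newtmpfile
done <tmpfile
mv newtmpfile "Unique.out.${FileA}.${FileB}"

rm tmpfile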

, Mike

Posted by Mike Golvach at 12:37 AM

bash, diff, duplicate, files, grep, ksh, linux, script, sed, unfiles, unix
