Showing posts with label array. Show all posts

Friday, January 9, 2009

Questions Regarding Perl References In Linux And Unix

Hey There,

Today's post is a follow-up to our post, from earlier this week, regarding understanding Perl variable references. The feedback was generally positive, but one particular question was asked by quite a few people. And that question was:

If there's nothing new under the sun, then how can there be more things in heaven and earth than are dreamt of in anyone's philosophy?

To be quite honest, I have no idea. I quit pondering the meaning of "it all" when I was introduced to nihilism at the age of 18, which, consequently, fit in perfectly with my intensive practical application of Onanism. My philosophy was both fun "and" meaningless ;)

But, that wasn't the real question. I just went off on a tangent there for a moment, as it's my wont to do. Apologies for any trauma caused by the high-brow self-abuse joke. It has nothing to do with Linux, Unix or Perl unless you, like Sammy Hagar, consider it all mental masturbation ;)

The real question on people's minds was this:

What purpose, at all, do references in Perl serve, and why would I ever want to complicate things by using them?

This is a valid question; especially if you aren't hard-core into Perl or only use it as a work aid. Quite frankly, I, myself, rarely have any use for references. But here's a quick rundown of a few reasons why it's good to know them and understand how they work (Note that, in Perl, references and pointers - as they're more commonly known in programming languages like C - are terms that often get used interchangeably. For our purposes, they both mean the same thing).

1. For folks who've been using Perl since version 4 and earlier, passing anything more complicated than a flat list of scalars to a subroutine took some extra understanding. Arguments have always arrived in the subroutine as the single flat list @_, and Perl 4 had no real references at all; the usual workaround was to pass a typeglob (*array) instead. Perl 5 introduced references and, since the Perl reference to an array or hash was (and is) always a scalar variable, it became the standard way to hand a whole array or hash to your subroutine intact. I think this is the one issue that is really being addressed when folks question the usefulness of Perl references these days. As the scriptlet below shows, you can pass an array, hash, etc, directly to your subroutine as an argument, without having to make use of a scalar reference to it (more on that in point 2). Check out the following for a demonstration of how Perl handles these types of arguments (a few of you actually already have this in your mail, as I put it together to demonstrate this principle earlier this week in response to several emails - I aim to please :):

host # cat shell.pl
#!/usr/bin/perl

$bfile = "what the heck";
@bfile = qw(what the heck);

scalarsub($bfile);
arraysub(@bfile);

sub scalarsub {

my $file=shift;
print "SCALAR: F $file\n";
}

sub arraysub {

my @file=@_;
foreach $item (@file) {
print "ARRAY: F $item\n";
}
}
--- test run of script ---------
host # ./shell.pl
SCALAR: F what the heck
ARRAY: F what
ARRAY: F the
ARRAY: F heck


As you can see, above, today's Perl doesn't require you to reference non-scalar variables if you want to pass them as arguments to a subroutine.

2. References still do have a place in Perl, even when it comes to subroutines. If it sounds like I'm contradicting what I just wrote, please allow me to dig myself further into a hole... I mean, explain ;)

Consider, if you will, the following situation in which making use of Perl references would actually save you considerable time and, probably, a gray hair or two. If you have a simple subroutine (like in our scriptlet above), it is very easy to pass it several scalar arguments (assuming no restrictions on the number of arguments you can pass) and, logically, any number, and/or combination, of different types of variables. But, what happens when you call your subroutine with arguments like this?:

@array1 = qw(1 2 3 4 5);
@array2 = qw(1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19);
@array3 = qw(1 2 3);
mynewsubroutine(@array1, @array2, @array3);
@_ = 1 2 3 4 5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 1 2 3; <-- within your subroutine.
my (@array1,@array2,@array3) = @_;

Now, while there are more than several clever ways you can get around parsing this input (which is one of the great things about Perl), it can be difficult to process this sort of input within the subroutine; especially if the arrays, themselves, are of variable and/or undetermined size (perhaps @array1 contains 5 scalar values, @array2 contains 19 and @array3 contains an additional 3 - we'll, again, stay away from passing arrays of hashes of arrays for now ;). It can take a lot of work on the programmer's part to craft the subroutine in such a way that it handles all of the input and keeps it separated, correctly, into the three groups. The totality of the arguments passed to a subroutine lands in @_ and, if three separate arrays are passed, @_ will contain the members of all 3 of them (27 scalar values). You can't pick the original arrays back out by index ($_[0], etc), since the input to your subroutine is, in effect, one large array, rather than a collection of 3 smaller arrays. In effect, and in reality, the assignment on the last line above would put all of the members of all of the 3 arrays into @array1 and leave @array2 and @array3 empty.
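If you'd like to watch the flattening happen, here is a minimal sketch. The subroutine name flatten_demo is made up for this illustration; it simply tries the three-array assignment from above and reports how many members each local array ended up with.

```perl
use strict;
use warnings;

# Hypothetical subroutine, named for this illustration only.
# It tries to split @_ back into three arrays and reports their sizes.
sub flatten_demo {
    my (@first, @second, @third) = @_;   # @first swallows everything
    return (scalar(@first), scalar(@second), scalar(@third));
}

my @array1 = (1 .. 5);
my @array2 = (1 .. 19);
my @array3 = (1 .. 3);

my @sizes = flatten_demo(@array1, @array2, @array3);
print "sizes inside the sub: @sizes\n";   # prints: sizes inside the sub: 27 0 0
```

The first array in a list assignment is greedy, so nothing is ever left over for the arrays that follow it.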

Hopefully, I'm not drifting too far off of the reservation, here, as I fear I'm wandering into the land of overly convoluted explanation again ;) The advantage to you, the programmer, in this instance, would be that you could use Perl references to pass scalar references to each array to your subroutine. As difficult as it would be to manage the above three arrays using straight passing to your subroutine (you would have to determine their size before passing them in and then keep track of that by passing even more information, not to mention re-allocating those members back into the subroutine's local arrays), it would be a bit easier to pass your subroutine three references to your three arrays, like so:

mynewsubroutine(\@array1, \@array2, \@array3);

Now, the reason this makes life easier for you is that you're only passing your subroutine 3 scalar variables, rather than an indeterminate amount of any other kind. Inside the subroutine, you can easily work with the variables using the dereferencing techniques we went over in our understanding Perl references post. For instance, you could do "foreach" on all 3 arrays separately, and process the internal array variables that way, effectively making the size of any of those arrays inconsequential (less work for you), like so:

my $array1 = shift;
foreach $thingy (@{$array1}) {
print "Thingy $thingy\n";
}


I'm oversimplifying to a certain degree, but this is getting out of hand just swimmingly in spite of my best efforts ;)
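To round the fragment above out into something you can run, here is a small sketch under the same assumptions. The name count_each is hypothetical; the point is only that the subroutine receives exactly three scalars, no matter how big the arrays behind them are.

```perl
use strict;
use warnings;

# Hypothetical subroutine: takes three array references and
# returns how many elements each referenced array holds.
sub count_each {
    my ($ref1, $ref2, $ref3) = @_;       # exactly three scalars arrive
    return map { scalar @{$_} } ($ref1, $ref2, $ref3);
}

my @array1 = (1 .. 5);
my @array2 = (1 .. 19);
my @array3 = (1 .. 3);

my @sizes = count_each(\@array1, \@array2, \@array3);
print "sizes: @sizes\n";                 # prints: sizes: 5 19 3
```

One thing to keep in mind: the references point at the caller's own arrays, so anything the subroutine changes through them is changed for the caller, too.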

3. Although I promised I wouldn't go here, I must (but I'll keep it theoretical so we can dispense with it post-haste). If you ever do any super-complex programming, like building matrices, doing crazy manipulation of filehandles (remember that "any" Perl data type can be referenced) or things along the nature of manipulating arrays of hashes of multiple nested arrays of hashes, using references can make it a lot simpler for you to keep your head from exploding (which is a lousy way to end the day ;)
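Just to make the "theoretical" a little more concrete, here is a tiny sketch of one nested structure: an array whose members are references to hashes, each of which holds a reference to another array. The host names and port numbers are made-up sample data.

```perl
use strict;
use warnings;

# Made-up sample data: an array of hash references, each holding
# a name and a reference to an array of port numbers.
my @hosts = (
    { name => "alpha", ports => [22, 80] },
    { name => "beta",  ports => [22] },
);

foreach my $host (@hosts) {
    my $count = scalar @{ $host->{ports} };   # dereference the inner array
    print "$host->{name} has $count open port(s)\n";
}
# prints:
# alpha has 2 open port(s)
# beta has 1 open port(s)
```

Without references there would be no way to put an array inside a hash inside an array at all, since each slot can only ever hold a scalar.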

Hopefully, these answers have been somewhat helpful, and/or illustrative regarding the question at hand (why use references at all, and are they even really necessary?) If you're interested in reading more on the subject, check out Chapter 18 of bjnet.edu.cn's introduction to Perl. It's written very well and is probably simpler to follow than my meandering over-explanations ;) Plus, all the other chapters are available online, so you could download them all and have yourself a very nice free Perl reference in however long it takes you to download the HTML.

Cheers to all, and thank you for your response to the post that spawned this one. There's nothing quite so rewarding as knowing that I'm actually writing something at least a few people are really interested in reading :)

, Mike







Please note that this blog accepts comments via email only . See our Mission And Policy Statement for further details.

Wednesday, January 7, 2009

Understanding Perl Variable References On Linux And Unix.

Hey there,

Today we're going to take a look at a part of Perl that a lot of folks shy away from; mostly because (from my experience) they feel it's too abstract a notion or too complicated to understand. For today, I'm referring to Perl references ;) And here's the thing; nothing could be farther from the truth. It's just about as simple as the sentence preceding the last. When I referred to Perl references I was, for the most part, laying the foundation for easily understanding the entire concept. If you attack the problem semantically, and try not to think of it as a bunch of backslashes and arrows and symbols, it makes perfect sense :) If I'm wrong, and this post leaves you reeling in confusion and pain, please let me know so I'll stop being so cavalier with my prose :)

So, let's take a look at Perl references and how they can be used, most basically, in a step-by-step fashion; from the simplest of beginnings to the not-so-complex middle (We'll leave references to hashes of arrays of references to other hashes for some other day ;)

For all of these examples, we'll use command line examples, so you can cut and paste them to try them out, rather than pretend that we're inside a Perl script.

1. A simple way to look at Perl References: The basis of any Perl reference is the variable, or value that you're referring to. At the most basic level, any variable assignment is a reference. For instance, look at these basic statements:

host # perl -e '$a = "bob";'
host # perl -e '@a = qw(bob joe);'
host # perl -e '%a = (bob => joe);'


These are all just simple variable assignments (with $ indicating a scalar variable, @ representing an array and % representing a hash). However, you can think of them as references (which will make the transition to understanding textbook references much smoother). The variable $a, for example, has an assigned value of the string "bob." So, if you look at that in a different way, the $a variable refers to the string "bob," or (another way) the variable $a is NOT the string "bob," but a reference to that string (or scalar) value.

BTW, if this part of the post is beyond where you're at with Perl, take a look back at some of our older posts on simple arithmetic and simple variables in Perl that deal with these more basic principles. There should be enough links on those two pages to connect you to all the other ones on this site. If not, the blog search feature (although it's very generous in its interpretations -- search for the letter "a" to see what I mean ;) should help you find what you need.

2. Looking at actual Perl References: A textbook Perl reference is the same thing as we discussed in point 1, except taken up (or out) one meta-level. So instead of having the relationship of reference ($a) and referent ("bob") that we had before, we're going to assign one scalar variable a reference to any of the three variables from before, rather than from the variables directly to the values. So, to reference any of these three we could do the following (note that for this basic lesson, the Perl reference will always be a scalar since, at its core, it always is; even if that scalar value is a part of a larger array or hash). The symbol that denotes that you're setting your variable's value to a reference is the backslash (\) character:

host # perl -e '$a_ref1 = \$a;'
host # perl -e '$a_ref2 = \@a;'
host # perl -e '$a_ref3 = \%a;'


So now we have three very simple Perl references. $a_ref1 has the value of a reference to the $a scalar variable, $a_ref2 has the value of a reference to the @a array and $a_ref3 has the value of a reference to the %a hash. (Note that you can have a Perl variable refer to itself, although the uses for this are somewhat limited and generally not necessary for basic Perl scripting. Ex: $a_ref4 = \$a_ref4 <-- $a_ref4 has the value of a reference to itself.)

3. Extracting values from Perl References: This is just as easy as extracting values from regular variables, except, as before, you have to think one more hop. Whereas, with a regular variable, you would extract the value of that variable directly, with a Perl reference, you need to extract the value of the variable that is being referenced by your reference. It sounds worse than it is ;) For instance, if we accept that the scalar variable $a is equal to "bob," we know that we can extract the value of $a by doing the following (as before):

host # perl -e '$a = "bob";print "$a\n";'

Whereas, if we create a reference (another scalar variable) to the variable $a, and call that $a_ref1, we need to extract the value from the variable that we are referencing. A simple and comfortable approach to extracting this value would be the following:

host # perl -e '$a = "bob"; $a_ref1 = \$a; print "${$a_ref1}\n";'

In this instance we've simply peeled the onion, so to speak (insert your favorite peelable vegetable or fruit here ;). In order to extract the value of $a from the Perl reference $a_ref1 variable, we just stripped it layer by layer. To deconstruct the print statement above, we'll go backward from the statement we used to print the value of the $a_ref1 Perl reference:

a. ${$a_ref1} is what we call to print the value of the variable $a.

b. ${$a_ref1} is actually equal to ${\$a} since $a_ref1's value is a reference to $a (as denoted by "$a_ref1 = \$a;")

c. ${\$a} is equal to ${a} since we're dealing directly with the referent. $a_ref1 (the variable with the value of the reference) actually points to a hex address in memory, and printing it shows that address along with the type of data it refers to. You can see the difference in the output of the two commands below:

host # perl -e '$a = "bob"; $a_ref1 = \$a; print "$a_ref1\n";'
SCALAR(0x2e250)
<-- This is the hexadecimal memory space that the $a_ref1 reference refers to. Your results may vary :)

host # perl -e '$a = "bob"; $a_ref1 = \$a; print "${\$a}\n";'
bob
<-- This is the value of the referenced variable $a, which we know (from before) is equal to "bob" ($a = "bob" from what seems like so far up the page ;)

d. And, even though we don't need to tell you this, just for completeness' sake: ${a} (or $a - same thing) equals "bob".

4. Extracting values from Perl References that aren't scalar: Finally, some good news :) The principles above apply to all sorts of variable dereferencing. So, for instance, if you wanted to extract the value of the array reference $a_ref2, you could get it by doing:

host # perl -e '@a = qw(bob joe); $a_ref2 = \@a; print "@{$a_ref2}\n";'
bob joe
<-- The whole thing
host # perl -e '@a = qw(bob joe); $a_ref2 = \@a; print "${$a_ref2}[0]\n";'
bob
<-- array index 0 (a single element is a scalar, so it gets the $ symbol)
host # perl -e '@a = qw(bob joe); $a_ref2 = \@a; print "${$a_ref2}[1]\n";'
joe
<-- array index 1

and the same basic principle applies to hashes (%{$a_ref3} would get you all those values). Basically, all you need to do to extract the value of a one-level-deep Perl Reference is to wrap the reference-variable in curly brackets and preface that with the appropriate symbol ($ for scalar, @ for array, % for hash, etc).
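For single elements, Perl also offers the arrow operator as a shorthand for the curly-bracket form. The sketch below (reusing the bob and joe values from above) shows both spellings side by side for an array reference and a hash reference; they do exactly the same thing.

```perl
use strict;
use warnings;

my @a = qw(bob joe);
my %a = (bob => "joe");

my $a_ref2 = \@a;
my $a_ref3 = \%a;

# Two spellings of "element 1 of the referenced array"
my $long  = ${$a_ref2}[1];
my $short = $a_ref2->[1];

# Two spellings of "value stored under key bob in the referenced hash"
my $hlong  = ${$a_ref3}{bob};
my $hshort = $a_ref3->{bob};

print "$long $short $hlong $hshort\n";   # prints: joe joe joe joe
```

Which one you use is a matter of taste; the arrow form tends to be easier on the eyes once references start nesting.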

5. What to do if you have no idea what kind of Perl Reference you're dealing with: Fortunately, there exists - in the very heart of Perl - a function to deal with just this sort of predicament. It's called, for some strange reason, "ref" ;) On many systems, doing something like this:

host # perl -e '@a = qw(bob joe); $ref_type = ref(\@a); print "$ref_type\n";'
ARRAY


is all you need to do to get back the type of reference you're dealing with (Obviously, we knew it was an array since we're doing these self-contained command line scripts, but you could use the ref function against any Perl Reference and get the type back from it). One thing to note about the ref function is that it doesn't always work as expected. For instance, if you call the function ref on a straight-up scalar, array or hash variable, it returns an empty string (which Perl treats as false). This is normal, since those straight-up variables are "not" references. However, sometimes, even when you are dealing with a reference, you won't get any feedback on your command line. This isn't to say that ref doesn't know what kind of reference you're working with; just that it's not in the mood to tell you ;)

You can get around this little hassle pretty simply by just writing a simple type-check. So if you run the following:

host # perl -e '%a = (bob => joe); $ref_type = ref(\%a); print "$ref_type\n";'

and you don't get the return of

HASH

as you would expect to, you can figure out what the return from ref was anyway. The two most basic ways to do this range from cowboy to academic ;)

a. Cowboy: Just print the variable that points to the reference, like we did above, to get the hexadecimal address (instead of the value of the referenced variable), since this is accompanied by the reference type:

host # perl -e '%a = (bob => joe); $ref_type = \%a; print "$ref_type\n";'
HASH(0x2e26c)


b. Academic: Use a simple if-condition to test and see what kind of output ref returns

host # perl -e '%a = (bob => joe); $ref_type = ref(\%a); if ($ref_type eq "HASH") {print "HaSH FOUND!\n"};'
HaSH FOUND!


Of course, you could check just to see if the value is even true (that is, not an empty string), since, if it isn't, you're not dealing with a reference. 99% of the time, Perl will do the right thing and tell you what the ref function returns. For all I know, the 1% of the time it doesn't work for me is because I completely screwed up ;)
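Here is the "academic" check stretched into a small sketch that handles more than one type. The helper name describe is hypothetical. It leans on the fact that ref returns an empty string when it's handed something that isn't a reference.

```perl
use strict;
use warnings;

# Hypothetical helper: describe whatever it is handed.
sub describe {
    my ($thing) = @_;
    my $type = ref($thing);              # "" (empty string) for non-references
    return "not a reference" if $type eq "";
    return "reference to: $type";
}

my @a = qw(bob joe);
my %h = (bob => "joe");

print describe("bob"), "\n";             # prints: not a reference
print describe(\@a), "\n";               # prints: reference to: ARRAY
print describe(\%h), "\n";               # prints: reference to: HASH
print describe(sub { 1 }), "\n";         # prints: reference to: CODE
```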

Perl also deals with references to a lot of different file and object types to which these same basic principles apply. So, if you're dealing with a pipe or another type of file or variable, you can still use the principles above to help you out. And, for simplicity's sake (until you get used to being utterly confused while in a state of mostly-understanding ;), for every level of referencing that gets added on, you just need to dereference that many times backward (as shown above) to make your way back to the original value of the original variable(s)!
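As a quick sketch of that "one dereference per level" rule, here is a reference to a reference to a plain scalar. The variable names are made up for the illustration.

```perl
use strict;
use warnings;

my $name    = "bob";
my $ref     = \$name;    # one level:  a reference to the scalar
my $ref_ref = \$ref;     # two levels: a reference to that reference

print ref($ref), "\n";          # prints: SCALAR
print ref($ref_ref), "\n";      # prints: REF
print ${${$ref_ref}}, "\n";     # prints: bob
```

Two levels of referencing went on, so two sets of ${ } come off to get back to "bob."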

And that wraps up that :)

Hope that helps shed some light on basic Perl References and, again, I'd love to hear what you think about this post; especially with regards to how you felt about it (Was it too simplistic? Too Complicated? Hard to understand? Easy Peasy? ;)

Cheers,

, Mike








Posted by Mike Golvach at 12:14 AM  


Monday, September 15, 2008

Condensing Perl Scripts In Linux and Unix

Hey There,

Today, we're going to go back a bit (we'll be putting out the final script from our number pool series this Wednesday) and take a look at a subject we've visited before in posts on making our Thesaurus script better and improving our webserver access log parser. Today, we present a fairly simple Perl script that will parse any file consisting of rows of numbers and print out matches meeting certain criteria. This is a fairly banal concept (and most probably well-overdone ;), so the spin we're going to put on it for this 2-parter is to present the Perl script (suitable for running on any Unix or Linux distro) we've written to do this, first, in its completely "lame" format.

Now, when we say "lame," we don't mean that it doesn't do what it's supposed to; only that it is written in an overly cumbersome and confusing manner and will probably make any Perl enthusiast nauseated at the mere sight of it ;) The script is fairly simple and only requires that you have a file to parse with it. That file should be of the format:

host # cat file
01 02 03 04 05 09
05 18 19 45 33
33 55 666 88 23 12
...


etc, etc, etc... In this version of the script, the "09" version of the number 9 is "hard-coded" and a simple "9" won't match. The script has 4 non-optional arguments that you'll be prompted for if you forget one, like so:

host # ./match.pl
Error Encountered! Invalid or incomplete options!
Usage: ./match.pl -h highNumber -f statFile -n numberOfCombos -m mimimumCombos


A regular execution would look like this:

host # ./match.pl -h 39 -f MYFILE -n 3 -m 2
13 29 19 matches 2 times:
13 19 26 29 36
13 15 19 22 29

13 30 12 matches 2 times:
11 12 13 29 30
05 12 13 21 30

13 31 16 matches 2 times:
01 13 16 28 31
13 16 20 22 31
...


This command line tells "match.pl" that it should only look for numbers (and combinations of numbers) from 1 - 39, that the file to parse is called MYFILE, that we want to match 3-number combinations (like 13 29 19, etc, from above) and that we only want to get output from the program if those 3-number combinations match 2 or more times.
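To make the idea of "combinations that match at least so many rows" concrete, here is a minimal sketch. It is not the script from this post, nor the revamped one; the subroutine names combinations and count_matches and the sample rows are made up for illustration. It also differs from the script in two ways: it compares whole numbers rather than doing a pattern match against the row, and it tries each set of numbers once rather than in every order.

```perl
use strict;
use warnings;

# Return every way of choosing $k items from @items (order ignored),
# each as an array reference.
sub combinations {
    my ($k, @items) = @_;
    return ([]) if $k == 0;
    return ()   if @items < $k;
    my ($first, @rest) = @items;
    return (
        (map { [$first, @{$_}] } combinations($k - 1, @rest)),
        combinations($k, @rest),
    );
}

# Count the rows that contain every number in the combination.
sub count_matches {
    my ($combo, $lines) = @_;            # both are array references
    my $count = 0;
    LINE: foreach my $line (@{$lines}) {
        my %seen = map { $_ => 1 } split ' ', $line;
        foreach my $num (@{$combo}) {
            next LINE unless $seen{$num};
        }
        $count++;
    }
    return $count;
}

# Made-up sample rows in the same format as the file to parse.
my @lines = ("13 19 26 29 36", "13 15 19 22 29", "01 02 03 04 05");

foreach my $combo (combinations(3, qw(13 19 29))) {
    my $hits = count_matches($combo, \@lines);
    print "@{$combo} matches $hits times\n" if $hits >= 2;
}
# prints: 13 19 29 matches 2 times
```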

Tomorrow, we'll have this script stripped down and revamped and I think you'll be surprised at the difference (not just in length ;) In the meantime, feast your eyes on this monstrosity and feel free to use it. As ugly as it looks, it does actually work :)

Cheers,


Creative Commons License


This work is licensed under a
Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License

#!/usr/bin/perl

#
# match.pl
#
# 2008 - Mike Golvach - eggi@comcast.net
#
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License
#

use Getopt::Std;

%options = ();

getopts("h:f:n:m:", \%options);

if ( defined $options{m} && defined $options{h} && defined $options{f} && defined $options{n} ) {
$highnum = $options{h};
$combos = $options{n};
$statfile = $options{f};
$minmatch = $options{m};
if ( ! -f $statfile ) {
usage("File $options{f} Does Not Exist!");
} elsif ( $combos > 6 ) {
usage("Only Combos Up To 6 Please!");
}
} else {
usage("Invalid or incomplete options!");
}


open(FILE, "<$statfile");
@file = <FILE>;
close(FILE);

for ( $lownum = 1; $lownum <= $highnum; $lownum++ ) {
if ( $lownum < 10 ) {
$padded_num = "0$lownum";
} else {
$padded_num = $lownum;
}
push(@numbers, "$padded_num");
}

for ( $times = 1; $times <= $combos; $times++) {
if ( $combos == 1 ) {
foreach $cnum1 (@numbers) {
chomp($cnum);
@match = grep(/$cnum1/, @file);
$match = @match;
if ( $match >= $minmatch ) {
print "$cnum1 matches $match times:\n @match\n";
}
}
} elsif ( $combos == 2 ) {
foreach $cnum1 (@numbers) {
foreach $cnum2 (@numbers) {
chomp($cnum);
if ( $cnum1 != $cnum2 ) {
@match = grep(/$cnum1/ && /$cnum2/, @file);
$match = @match;
if ( $match >= $minmatch ) {
print "$cnum1 $cnum2 matches $match times:\n @match\n";
}
}
}
}
} elsif ( $combos == 3 ) {
foreach $cnum1 (@numbers) {
foreach $cnum2 (@numbers) {
foreach $cnum3 (@numbers) {
chomp($cnum);
if ( $cnum1 != $cnum2 && $cnum1 != $cnum3 && $cnum2 != $cnum3 ) {
@match = grep(/$cnum1/ && /$cnum2/ && /$cnum3/, @file);
$match = @match;
if ( $match >= $minmatch ) {
print "$cnum1 $cnum2 $cnum3 matches $match times:\n @match\n";
}
}
}
}
}
} elsif ( $combos == 4 ) {
foreach $cnum1 (@numbers) {
foreach $cnum2 (@numbers) {
foreach $cnum3 (@numbers) {
foreach $cnum4 (@numbers) {
chomp($cnum);
if ( $cnum1 != $cnum2 && $cnum1 != $cnum3 && $cnum1 != $cnum4 && $cnum2 != $cnum3 && $cnum2 != $cnum4 && $cnum3 != $cnum4 ) {
@match = grep(/$cnum1/ && /$cnum2/ && /$cnum3/ && /$cnum4/, @file);
$match = @match;
if ( $match >= $minmatch ) {
print "$cnum1 $cnum2 $cnum3 $cnum4 matches $match times:\n @match\n";
}
}
}
}
}
}
} elsif ( $combos == 5 ) {
foreach $cnum1 (@numbers) {
foreach $cnum2 (@numbers) {
foreach $cnum3 (@numbers) {
foreach $cnum4 (@numbers) {
foreach $cnum5 (@numbers) {
chomp($cnum);
if ( $cnum1 != $cnum2 && $cnum1 != $cnum3 && $cnum1 != $cnum4 && $cnum1 != $cnum5 && $cnum2 != $cnum3 && $cnum2 != $cnum4 && $cnum2 != $cnum5 && $cnum3 != $cnum4 && $cnum3 != $cnum5 && $cnum4 != $cnum5 ) {
@match = grep(/$cnum1/ && /$cnum2/ && /$cnum3/ && /$cnum4/ && /$cnum5/, @file);
$match = @match;
if ( $match >= $minmatch ) {
print "$cnum1 $cnum2 $cnum3 $cnum4 $cnum5 matches $match times:\n @match\n";
}
}
}
}
}
}
}
} elsif ( $combos == 6 ) {
foreach $cnum1 (@numbers) {
foreach $cnum2 (@numbers) {
foreach $cnum3 (@numbers) {
foreach $cnum4 (@numbers) {
foreach $cnum5 (@numbers) {
foreach $cnum6 (@numbers) {
chomp($cnum);
if ( $cnum1 != $cnum2 && $cnum1 != $cnum3 && $cnum1 != $cnum4 && $cnum1 != $cnum5 && $cnum1 != $cnum6 && $cnum2 != $cnum3 && $cnum2 != $cnum4 && $cnum2 != $cnum5 && $cnum2 != $cnum6 && $cnum3 != $cnum4 && $cnum3 != $cnum5 && $cnum3 != $cnum6 && $cnum4 != $cnum5 && $cnum4 != $cnum6 && $cnum5 != $cnum6 ) {
@match = grep(/$cnum1/ && /$cnum2/ && /$cnum3/ && /$cnum4/ && /$cnum5/ && /$cnum6/, @file);
$match = @match;
if ( $match >= $minmatch ) {
print "$cnum1 $cnum2 $cnum3 $cnum4 $cnum5 $cnum6 matches $match times:\n @match\n";
}
}
}
}
}
}
}
}
}
}

sub usage {
$message = shift;
print "Error Encountered! $message\n";
print "Usage: $0 -h highNumber -f statFile -n numberOfCombos -m mimimumCombos\n";
exit(1);
}


, Mike





Posted by Mike Golvach at 12:50 AM  


Monday, May 26, 2008

How To Fake Associative Arrays In Bash

Greetings,

As promised in our previous post on working with associative arrays in Linux and Unix, we're back to tackle the subject of associative arrays in bash. As was noted, we're using bash version 2.05b.0(1) and (to my knowledge) bash ( up to, and including, bash 3.2 ) does not directly support associative arrays yet. You can, of course, create one-dimensional (or simple index) arrays, but hashing key/value pairs is still not quite there.

Today we'll check out how to emulate that same functionality in bash that can be found in Perl and Awk. First we'll initialize our array, even though we don't necessarily have to:

host # typeset -a MySimpleHash

To begin with, we'll have to consider what bash already does for us and how we want that to change. For our first example, let's take a look at what happens if we just make assignments to a bash array with, first, a numeric and then an alpha value:

host # MySimpleHash["bob"]=15
host # echo ${MySimpleHash["bob"]}
15
host # MySimpleHash["joe"]="jeff"
host # echo ${MySimpleHash["joe"]}
jeff


This seems to be working out okay, but if we look at the values again, it seems that MySimpleHash["bob"] gets reassigned after we assign the alpha value "jeff" to the key "joe":

host # echo ${MySimpleHash["bob"]}
jeff


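This behaviour repeats itself no matter whether we mix integers with strings. The reason is that bash evaluates the subscript of a regular (indexed) array as an arithmetic expression, so "bob" and "joe" are both treated as the names of unset variables, both come out as 0, and both assignments land in element 0. Bash can't handle this natively (but, it never claimed it could :)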

host # MySimpleHash["bob"]="john"
host # echo ${MySimpleHash["bob"]}
john
host # MySimpleHash["joe"]="jeff"
host # echo ${MySimpleHash["joe"]}
jeff
host # echo ${MySimpleHash["bob"]}
jeff


This looks as though it's going to necessitate an "eval" nightmare much much worse than faking arrays in the Bourne Shell! Ouch! However, we might be able to get around it with a little bit of "laziness" if we just construct two parallel arrays. This, of course, would necessitate keeping the two arrays in step with each other. That is, if "bob" is the third key in our associative array, and "joe" is "bob"'s value, both need to be at the exact same numeric index in each regular array. Otherwise translation becomes not-impossible, but probably a real headache ;)

To demonstrate we'll create a simple "3 key/value pair" associative array using the double-regular-array method, like so:

host # typeset -a MySimpleKeys
host # typeset -a MySimpleVals
host # MySimpleKeys[0]="abc";MySimpleKeys[1]="ghi";MySimpleKeys[2]="mno"
host # MySimpleVals[0]="def";MySimpleVals[1]="jkl";MySimpleVals[2]="pqr"


Now, we should be able to "fake" associative array behaviour by calling the common index from each array (In this fake associative array we have key/value pairs of abc/def, ghi/jkl and mno/pqr). Now that we have the key/value pairs set up, we need to set up the "associative array" so that we can request the values by the key names, rather than the numeric array index. We'll do this in a script later, so our command line doesn't get any uglier:

host # key=$1
host # for (( x=0 ; x < ${#MySimpleKeys[@]}; x++ ))
>do
> if [ "$key" == "${MySimpleKeys[$x]}" ]
> then
> echo "${MySimpleVals[$x]}"
> fi
>done


Testing this on the command line produces suitable, but pretty lame looking results:

host # ./MySimpleHash ghi
jkl


We're going to need to modify the script so that it takes its keys and just returns a value that doesn't need to be cropped (using "echo -n" will solve this nicely):

echo "${MySimpleVals[$x]}"

changes to

echo -n "${MySimpleVals[$x]}" <--- This is helpful if we want to use the result as a return value :)

Still, the shell is going to balk at whatever we do to try and pass an argument to the script like this:

host # ./MySimpleHash{ghi}
-bash: ./MySimpleHash{ghi}: No such file or directory


So, for now, we'll just make it so it's "almost" nice looking:

host # a=`./MySimpleHash {abc}`
host # echo $a
def


It'll get the job done, and could be used in a pinch, but is way too klunky to replace Awk or Perl's built-in associative array handling. Nevertheless, at least we have a way we can get around it if we "have" to :)

I've included the script written during this post below, but, if you want to check out a more complex (and, consequently, much more elegant) script to fake associative arrays in bash, you should take a look at this associative array hack for bash and also this bash hack that comes at the issue from a whole different angle.

I think you'll like either one of these scripts a lot better than this one, but hopefully we've both learned, at least a little bit, about the way associative arrays work (and how they differ from one dimensional arrays) in the process :)

Cheers,



#!/bin/bash

# TerribleAAbashHack.sh
# What was I thinking?
#
# 2008 - Mike Golvach - eggi@comcast.net
#
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License
#

typeset -a MySimpleKeys
typeset -a MySimpleVals
MySimpleKeys[0]="abc";MySimpleKeys[1]="ghi";MySimpleKeys[2]="mno"
MySimpleVals[0]="def";MySimpleVals[1]="jkl";MySimpleVals[2]="pqr"

key=`echo $@|sed -e 's/{//g' -e 's/}//g'`

for (( x=0 ; x < ${#MySimpleKeys[@]}; x++ ))
do
if [ "$key" == "${MySimpleKeys[$x]}" ]
then
echo -n "${MySimpleVals[$x]}"
fi
done


, Mike

Tuesday, May 20, 2008

Tainted Perl On Linux or Unix - Helping You Protect You From Yourself

Hey there,

Generally, when you're writing a Perl script to help you automate any Unix or Linux tasks, you don't really need to worry about security. Aside from the fact that you could delete everything on your system or write an infinite loop that will chew up all the CPU... on second thought, thinking about security is probably a good idea most of the time ;) Super-heavy security checking isn't really necessary for small things, but is always a good idea to work toward when writing scripts to execute important system functions and/or for the use of others.

This is where Perl's "Taint" comes into play. It's kind of like "-w"'s less-tolerant cousin. While running Perl with "-w" will print out all sorts of warnings if your code is suspect, Taint will shut you down. Some builds of Perl may have a "-t" option that acts more like the "-w" flag (stronger checking, but only prints out the warnings).

Perl's Taint mode can be added to any script by simply changing the shebang line from:

#!/usr/bin/perl

to

#!/usr/bin/perl -T

Not too much extra work to get it set up, although now you'll have more things to consider when you write your script ;) Taint mode will cause your script to fail, now, if it feels the script is not secure enough! Interestingly enough, Perl will generally turn Taint mode on automatically if you change any Perl script's permissions to setuid or setgid (which is when it's probably needed the most :)

One of the most important things Perl's Taint does is "not" allow external data (input) to be used in any routine, or action, that will affect other data (or whatever else) external to your script, unless you sanitize that input first.

For instance, the following would not be allowed by Perl Taint (Note that most actions that cause Taint errors are system calls, backtick operations or exec calls):

$variable = $ARGV[0]; <--- We've assigned the first argument on our script's command line to the $variable variable.
system("$variable"); <--- and now we're executing that argument as a command from within the Perl script!

Obviously, in this example (assuming the name of our script is PerlScript and it runs as a user of sufficient privilege), we could do something like the following and cause a big problem:

host # ./PerlScript "rm -rf *"

Ouch! This sort of thing is actually seen a lot in CGI programming, with the assumption being that the "nobody" user that most folks run their Web Server as, can't do all that much damage. Consider, however, that (in the "nobody" example) the "nobody" user probably does have permission to delete all of your html and cgi-bin files. That would be headache enough, even though you'd still get to keep your operating system ;)

Taint also acts to protect you comprehensively. In our limited example above, adding the -T flag would have protected us against that contaminated $variable variable. However, Taint will also work on arrays and hashes and, even better, will treat them as collections of scalar variables (which they, essentially, are). So you can, reasonably, end up in a situation where only some values in an array or hash are Tainted, while the rest of them are considered perfectly safe (or unTainted).

If you have the -T flag enabled, whenever Perl runs into a situation where the data (input or output) is considered Tainted, it will terminate the execution of your script with a brief explanation ( like a description of what variables are insecure and/or what insecure dependencies they have on other variables) describing why you might be in trouble if you run your script as-is.

A great thing for your script (and your security) is that it takes relatively little effort to sanitize a Tainted variable.

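For instance, in our example above, $variable was considered Tainted as soon as it was assigned the value of $ARGV[0] (You wouldn't get the actual error until you tried to use that Tainted variable data). If you wanted to clean that variable before running the system call on the next line, you'd need to pull the safe part of it out with a regular expression capture. That's the one thing to remember: simply editing a Tainted value (stripping out semicolons with a substitution, for instance) leaves it Tainted. The standard way to get a clean value is to match the data against a pattern describing what you're willing to accept, and then read the captured text back from $1.

So, while this chunk of code would be considered insecure (or Tainted):

$variable = $ARGV[0];
system("$variable");


This chunk of sanitized code (actually, just with a sanitized $variable and an explicitly set PATH, since Taint mode refuses to run external commands with the PATH it inherited) would be considered secure (or unTainted):

$ENV{PATH} = "/bin:/usr/bin";
$variable = $ARGV[0];
if ( $variable =~ /^([\w.-]+)$/ ) { $variable = $1; } else { die "Unsafe input\n"; }
system("$variable");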


Obviously, there's a lot more to go into when it comes to Perl's Taint, but, hopefully, this has served as an easy-to-understand introduction. Now you can feel free to read the 50 pages of detailed specifications ;)

At the very least, you can use the Taint flag to check your Perl scripts while you write them, and then remove it when you want to put your work out there for everyone to use.

Best wishes,

, Mike

Friday, May 16, 2008

Porting Associative Arrays Between Perl, Shell and Awk

Hey There,

And, for the last time (this week) we're back to porting :) One thing you might have noticed, in the title of today's post, is that the C programming language has been dropped. We received some feedback, and also generally agree, that throwing C in with Perl, Shell and Awk creates a layer of confusion that distracts from the point of these posts. Which is, of course, to demonstrate how simply one can use concepts, and apply them easily, with a little translation, between three very useful Unix and Linux programming (or scripting) languages.

We'll leave the original posts alone, so if you go back to revisit our original post regarding simple string variables and its follow-up that dealt with simple arrays, please just ignore all that stuff about C. If there's sufficient interest at a later time, we'll write up a series of posts on porting between (for instance) Perl and C and try to contain the amount of collateral damage ;)

Today, we're going to look at associative arrays (which are generally referred to as "Hashes" in Perl and as "lookup tables" from time to time) and how we can define, add value to, and extract value from, them in Perl, bash and awk.

The main difference between a "regular" array and an associative array is, at its most basic, found in the way that the array is indexed and the way in which you can add and extract value from the array. This is why I noted that, in bash, the only type of array available (at this point) is the "regular" or "one dimensional" array, which we looked at in our last post on array porting ( Please note that I'm stuck using version "2.05b.0(1)" and am unaware of any newer release (up to bash 3.2) which directly supports this functionality. I "am" looking forward to it, though :)

The good news is that there is a hack to get around this, and emulate associative arrays, if you want to drive yourself crazy. We will definitely follow up this post with a post regarding how to do that (it'll be a post in and of itself, so we'll not belabour that point any longer, here). Also, in awk, there's, technically, no such thing as a "regular array." If you dump an Awk array, there's no guarantee that the contents will come out in the order you expect (which is also true of the Perl hash) and you can also use keys with them, instead of being limited to indexes. This will begin to make sense, if it doesn't already, as we go along. Terminology being what it is, I may have upset a lot of people already ;)

And, now that I've dug myself a deep enough hole, let's get started with associative arrays ;)

The associative array is sometimes defined as a "lookup table" for good reason. Compared with "associative array" and "hash," it's a name that's more descriptive of how the associative array works. Basically, on the left hand side of any assignment (the variable before the value) you have the name of the associative array along with a "key," like: $bob{"key"} - On the right hand side of the equation you have a simple value. You can see how the term "lookup table" makes the most sense as, in this case, if you wanted to look up the value of "key" you could find it in the associative array named "bob" simply by naming it. The associative array can also be thought of as having "keys" and "values," rather than "indexes" and "values," like regular, one dimensional, arrays have.

Since Bash only supports one-dimensional arrays up to (at least) version 2.05b.0(1), none of today's lesson applies. We will, however, definitely follow up with a post dedicated to "faking" associative arrays with bash.
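For readers on a newer shell: bash 4.0 and later added native associative arrays through "declare -A". The sketch below is not from the original post and will fail on the 2.05b version discussed here; it only shows the same key/value idea in that newer syntax, borrowing the MySimpleHash name used in the examples further down.

```shell
# Requires bash 4.0 or newer; older versions reject "declare -A".
declare -A MySimpleHash
MySimpleHash["MySimpleKey0"]="MySimpleValue0"
MySimpleHash["MySimpleKey1"]="MySimpleValue1"
MySimpleHash["MySimpleKey2"]="MySimpleValue2"

# Look up one value by its key
echo "${MySimpleHash[MySimpleKey1]}"

# Walk every key; the order is not guaranteed, so sort it for stable output
for key in "${!MySimpleHash[@]}"
do
    echo "$key=${MySimpleHash[$key]}"
done | sort
```

The "${!MySimpleHash[@]}" form expands to the keys, while "${MySimpleHash[@]}" expands to the values.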

1. Defining, Initializing or Declaring an associative array. As was the case with simple variables and arrays, no explicit declaration of an associative array is absolutely necessary in either of our (now) two languages:

Ex: We need to define an associative array called MySimpleHash. Even though it's unnecessary, we could do so like this:

In Perl: Just type "my %MySimpleHash;" - The % symbol indicates an associative array (or hash) in Perl, as opposed to the $ sign, which indicates a scalar (or simple, or string) variable and the @ symbol which indicates an array. Perl hashes can also be created by simply defining their elements on-the-fly.

In Awk: There's nothing to type. Awk has no declaration statement at all; an associative array comes into existence the first time you reference one of its elements. In fact, every awk array is an associative array, which is why they work exactly like the "one-dimensional" arrays from our last post.

2. Assigning values to the associative array. This is very straightforward in both of our two remaining languages:

Ex: We want to assign the keys "MySimpleKey0," "MySimpleKey1" and "MySimpleKey2" ( with values, in order, of "MySimpleValue0", "MySimpleValue1," and "MySimpleValue2") to the associative array named MySimpleHash (Note that any values that contain spaces should be quoted - it's actually good practice to quote any string that is a being used as a key or a value in an associative array. This is generally not necessary for integer values).

In Perl: Just type "$MySimpleHash{"MySimpleKey0"} = "MySimpleValue0"; $MySimpleHash{"MySimpleKey1"} = "MySimpleValue1"; $MySimpleHash{"MySimpleKey2"} = "MySimpleValue2";" - Spaces between the variable, "=" sign and value are optional. Note that we have to use the $ symbol when referring to an element of a Perl hash, while we use the % symbol to refer to the entire hash (or associative array. Same thing. But you know that by now ;) Also, last thing, notice that the square brackets, used to refer to regular array indexes, are replaced by curly brackets.

In Awk: Just type "MySimpleHash["MySimpleKey0"] = "MySimpleValue0"; MySimpleHash["MySimpleKey1"] = "MySimpleValue1"; MySimpleHash["MySimpleKey2"] = "MySimpleValue2";" - Spaces between the variable, "=" sign and value are not, technically, necessary, but recommended. Also, note that all of the "MySimpleKey" keys, and "MySimpleValue" values, are placed within double quotes in the assignment. This is always necessary for string values in awk (an unquoted word is treated as the name of a variable), but not for numeric values.

3. Extracting the value from your simple associative array. This is no longer "trivial," but still not too terribly difficult to do. Let's look some stuff up in these tables :)

Ex: We want to print the value of all of the keys of the MySimpleHash associative array. This is also fairly simple in our dwindling nation of two languages ;) Note that we will be iterating over the array key/value pairs in order to print them all out. This is where you'll notice one very specific characteristic of the associative array (or hash). The associative array does its own indexing based on an internal algorithm and may not spit out the same values in the same order every time you dump the contents. This is done, by the programming language, for maximum efficiency with regards to information storage and retrieval (with the assumption that you'll be looking for a particular "key"'s value, I suppose).

In Perl: Just type "while (($Key, $Value) = each %MySimpleHash) { print "$Key equals $MySimpleHash{$Key}\n";}"
- Note that the $ character needs to precede the variable name when you want to retrieve any of an associative array's key or value elements. Also, when iterating over the hash, you need to refer to the hash directly with the % prefix (printing %MySimpleHash would print out all of that hash's elements - generally all squished together with no separating space) - The \n, indicating a carriage-return, line-feed or new-line isn't necessary, but is nice if you don't want your output on the same line as your next command prompt:

host # perl -e '$MySimpleHash{"MySimpleKey0"} = "MySimpleValue0"; $MySimpleHash{"MySimpleKey1"} = "MySimpleValue1"; $MySimpleHash{"MySimpleKey2"} = "MySimpleValue2";while( ($key, $value) = each %MySimpleHash ) { print "SimpleKey: $key, SimpleValue: $value.\n";}'
SimpleKey: MySimpleKey0, SimpleValue: MySimpleValue0.
SimpleKey: MySimpleKey2, SimpleValue: MySimpleValue2.
SimpleKey: MySimpleKey1, SimpleValue: MySimpleValue1.


For a goof, let's print out the entire hash at once, so you can see how the indexing is done automatically by Perl (i.e. it might not come out in the same order that the data was logically entered by us) :

host # perl -e '$MySimpleHash{"MySimpleKey0"} = "MySimpleValue0"; $MySimpleHash{"MySimpleKey1"} = "MySimpleValue1"; $MySimpleHash{"MySimpleKey2"} = "MySimpleValue2";print %MySimpleHash;print "\n";'
MySimpleKey0MySimpleValue0MySimpleKey2MySimpleValue2MySimpleKey1MySimpleValue1


And, if you can see in the crunch there, it has indeed ordered our key/value pairs as 0, 2, 1, instead of the 0, 1, 2 that you would expect from a regular array!

In Awk: Just type "for (x in MySimpleHash) { print x "=" MySimpleHash[x]; }" - Note that the $ or % symbol "must not" precede the variable name, or key name, when you want to get the value. Note that, just like regular arrays, awk associative arrays need to be iterated over to be entirely printed out. Again, here, we should see how awk has decided to index our key/value pairs, which might be different than the order in which we entered them (you'll generally see this more if you mix numeric and alpha keys in your awk associative array; here, we seem to get lucky :) :

host # echo |awk '{MySimpleHash["MySimpleKey0"] = "MySimpleValue0"; MySimpleHash["MySimpleKey1"] = "MySimpleValue1"; MySimpleHash["MySimpleKey2"] = "MySimpleValue2";for (x in MySimpleHash) { print x "=" MySimpleHash[x]; }}'
MySimpleKey0=MySimpleValue0
MySimpleKey1=MySimpleValue1
MySimpleKey2=MySimpleValue2


An interesting side fact about awk associative arrays is that, even if they come out in the order you expect, they aren't stored under position numbers at all. For instance, when you enter the first, second and third value, you might expect them to occupy indexes 0, 1 and 2, as they would in a regular array. In an awk associative array, the only way to reach a value is through the key you stored it under. Let's see what happens if we try to pull keys 0, 1 and 2 out from MySimpleHash:

host # echo |awk '{MySimpleHash["MySimpleKey0"] = "MySimpleValue0"; MySimpleHash["MySimpleKey1"] = "MySimpleValue1"; MySimpleHash["MySimpleKey2"] = "MySimpleValue2";print MySimpleHash[0] MySimpleHash[1] MySimpleHash[2];}'

host #


Nothing. Goose eggs; as expected. Since the values were stored under the keys we chose, the simple numeric indexing won't work to retrieve the value for, say, "MySimpleKey2." If you want numeric indexing, you just need to make sure that all of your indexes are numbers (which is how we emulated a regular array in our previous post on working with regular arrays). Note, also, that awk converts every index to a string internally, so MySimpleHash[1] and MySimpleHash["1"] refer to the same element; you don't need to do anything special to have a number treated like a key in a key/value pair.
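A short sketch may make the key handling easier to see. It uses awk's "in" operator to test for a key, and then stores a value under a number and reads it back through the quoted form of that number. The awk_key_demo function name is invented for this example:

```shell
# awk_key_demo is a hypothetical name chosen for this example.
awk_key_demo() {
    awk 'BEGIN {
        MySimpleHash["MySimpleKey0"] = "MySimpleValue0"
        MySimpleHash["MySimpleKey1"] = "MySimpleValue1"
        MySimpleHash["MySimpleKey2"] = "MySimpleValue2"

        # "in" tests for a key without creating it as a side effect
        if ("MySimpleKey2" in MySimpleHash) print "found MySimpleKey2"
        if (!(0 in MySimpleHash)) print "no key 0"

        # Indexes are stored as strings, so 7 and "7" name the same element
        MySimpleHash[7] = "seven"
        print MySimpleHash["7"]
    }'
}

awk_key_demo
```

Using "in" rather than printing MySimpleHash[0] matters in longer scripts, because merely referencing a missing element creates it with an empty value.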

Sadly, even though this article ran a bit long, there are still a lot of "little things" about associative arrays (or hashes, lookup tables, or whatever you prefer to call them) that can be explored. Hopefully, you're encouraged to get down to the nitty-gritty and rise above the attempted basic-ness of this post (Now I'm starting to make up words ;)

In our next addition to this threaded series of posts, we'll start looking at programming constructs (loops, conditionals, etc) and explore how we can use the basic variable knowledge we've acquired so far to get some real work done now (after all, that's less work for us to do later ;)

Cheers,

, Mike

Wednesday, May 14, 2008

Working With Arrays - Porting Between Linux Or Unix Using Bash, Perl, C and Awk

Greetings,

Back to porting some more :) Building on our posts starting from the shebang line , followed, somewhat logically by a post on working with simple variables, today we're going to move on to the next step: Defining, populating and extracting the values from simple array variables.

The array variable is another basic building block of most shell scripts or code. Put simply, an array is just a collection of the simple variables we looked at in our last post on porting.

The technical definitions, especially as they apply to each of our four languages (bash, Perl, awk and C) are beyond the scope of this series of posts (for now). The details are important, but they're useless if you don't know the basics. To put it in another light, when you learn a foreign language (German, for example), it's more important that you understand basic concepts of the language, and how to use those simple phrases, than it is for you to be able to break down the subtle differences between the gender and tense of each word within the context of a sentence. The folks in Germany will know you need to go to the bathroom no matter how poor your grammar is, as long as you can spit out a few key words that indicate your need to find a restroom immediately ;)

So, let's get started with arrays:

Arrays, as we mentioned, are simply collections of simple variables. For instance, taking from our previous example, if we have a variable x, that has a value of y (x=y), then we have a simple variable. Arrays provide a way to group collections of simple variables. So you could have a number of variable/value pairs (x=y, a=b, c=d, e=f, etc), or simply a collection of values, that can all be referred to by one variable name: the array name (e.g. array b = x=y and a=b and c=d and e=f). More cryptic than a simple variable explanation, but (even if it's not right now) simple to understand once you get the hang of it.

1. Defining, Initializing or Declaring an array. As was the case with simple variables, with a simple array, except in C (of course), no explicit declaration of an array is absolutely necessary:

Ex: We need to define an array called MySimpleArray. This is trivial in all four languages:

In Bash: Just type "declare -a MySimpleArray" (again, you can also use "typeset -a"). This is not absolutely necessary, as you can create an array simply by defining a part of it (e.g. MySimpleArray[0]="bob" would create the MySimpleArray array with one value)

In Perl: Just type "my @MySimpleArray;" - The @ sign indicates an array in Perl, as opposed to the $ sign, which indicates a scalar (or simple, or string) variable. Perl arrays can also be created by defining their elements.

In Awk: There's nothing to type. Awk has no declaration statement; arrays are created the first time you reference one of their components.

In C: You "need" to declare/initialize your array (and its size) before you can use it. As noted in our last post, a simple string variable, in C, is actually an array of the type "char."

So, just like when you declared the "simple variable" MySimpleVariable, you'll use the exact same syntax, since that was, technically, an array: "char *MySimpleArray;" (This, again, generally needs to be followed by a declaration of the size/memory-allocation-requirement of the string, like "MySimpleArray = (char *)malloc(8*sizeof(char));" for an 8 character array).

Also, in C, if you want to declare an integer array, you would do it in this fashion (although we're not going to drill too far into this since it pulls away from the commonality of all the other examples): "int MySimpleArray[8];" for an eight integer array.

2. Assigning values to the simple array. This is very straightforward in all of our four languages:

Ex: We want to assign the values "MySimpleValue0", "MySimpleValue1," and "MySimpleValue2" to the simple array named MySimpleArray (Note that any values that contain spaces should be quoted - it's actually good practice to quote any string that is a being used as a value in an array. This is generally not necessary for integer values). Note that our instructions for creation here today are based on simplicity, and not efficiency. There are quicker ways to define arrays all at once (and print them all at once, when we extract the values from the array variables), but we'll leave that for another time. Also note that, in most arrays, the first element is numbered 0, rather than 1.

In Bash: Just type "MySimpleArray[0]=MySimpleValue0; MySimpleArray[1]=MySimpleValue1; MySimpleArray[2]=MySimpleValue2" - Spaces between the variable, "=" sign and value are not permitted.

In Perl: Just type "$MySimpleArray[0] = "MySimpleValue0"; $MySimpleArray[1] = "MySimpleValue1"; $MySimpleArray[2] = "MySimpleValue2";" - Spaces between the variable, "=" sign and value are optional. Note that we have to use the $ symbol when referring to an element of an array, while we use the @ symbol to refer to the entire array.

In Awk: Just type "MySimpleArray[0] = "MySimpleValue0"; MySimpleArray[1] = "MySimpleValue1"; MySimpleArray[2] = "MySimpleValue2"" - Spaces between the variable, "=" sign and value are not, technically, necessary, but recommended. Also, note that "MySimpleValue0," "MySimpleValue1," and "MySimpleValue2" are placed within double quotes in the assignment. This is always necessary for string values in awk (an unquoted word is treated as the name of a variable), but not for numeric values.

In C: Just type: "MySimpleArray = "MySimpleValue";" This points MySimpleArray at the quoted string itself, so the malloc from step 1 isn't needed in this case; if you'd rather copy the characters into memory you allocated yourself, use strcpy and make sure the allocation is big enough for every character plus the terminating null. If your array is not simply a char (as we're using in our example today), you do not need to use quotes. For an integer array, you would supply the values when you declare it, like this: "int MySimpleArray[] = {0,1,2};" <--- Again, apologies if these C integer array side notes are distracting. Just ignore them ;)

3. Extracting the value from your simple array. It's time to collect :)

Ex: We want to print the value of the MySimpleArray elements. This is also fairly simple in all four languages:

In Bash: Just type "echo ${MySimpleArray[0]};echo ${MySimpleArray[1]};echo ${MySimpleArray[2]}" - Note that the $ character needs to precede the variable name when you want to get the value and that the {} brackets around the array name and subscript (in [] brackets) are required. Printing ${MySimpleArray[@]} would print out all elements.

host # echo ${MySimpleArray[0]};echo ${MySimpleArray[1]};echo ${MySimpleArray[2]}
MySimpleValue0
MySimpleValue1
MySimpleValue2


In Perl: Just type "print "$MySimpleArray[0] $MySimpleArray[1] $MySimpleArray[2]\n";" - Note that the $ character needs to precede the variable name when you want to get the individual value of an array element (printing @MySimpleArray would print out all elements) - The \n, indicating a carriage-return, line-feed or new-line isn't necessary, but is nice if you don't want your output on the same line as your next command prompt:

host # perl -e '$MySimpleArray[0] = "MySimpleValue0"; $MySimpleArray[1] = "MySimpleValue1"; $MySimpleArray[2] = "MySimpleValue2";print "$MySimpleArray[0] $MySimpleArray[1] $MySimpleArray[2]\n";'
MySimpleValue0 MySimpleValue1 MySimpleValue2


In Awk: Just Type "print MySimpleArray[0],MySimpleArray[1],MySimpleArray[2]" - Note that the $ or @ symbol "must not" precede the variable name when you want to get the value. The comma in between the values ensures that a space will be printed between them for clarity's sake. Note that awk arrays need to be iterated over to be entirely printed out, and then extra care has to be taken if you want to get the variables out in the correct sequence (for another day) :

host # echo|awk '{MySimpleArray[0] = "MySimpleValue0"; MySimpleArray[1] = "MySimpleValue1"; MySimpleArray[2] = "MySimpleValue2";print MySimpleArray[0],MySimpleArray[1],MySimpleArray[2]}'
MySimpleValue0 MySimpleValue1 MySimpleValue2


In C: Just type "printf("%s\n", MySimpleArray);" to get the value for your character array. Note, again, that, for these posts, we're not going to get into the compilation part of creating a working C program:

host # ./c_program
MySimpleValue
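The bash examples above touch one element at a time. As a sketch of the "quicker ways" mentioned earlier, bash can also report how many elements an array holds and loop over all of them in one go. Nothing here goes beyond the MySimpleArray example already in use:

```shell
MySimpleArray[0]="MySimpleValue0"
MySimpleArray[1]="MySimpleValue1"
MySimpleArray[2]="MySimpleValue2"

# ${#MySimpleArray[@]} expands to the number of elements
echo "${#MySimpleArray[@]} elements"

# Quoting "${MySimpleArray[@]}" keeps values that contain spaces in one piece
for value in "${MySimpleArray[@]}"
do
    echo "$value"
done
```

The double quotes around the expansion are what matter: without them, a value such as "My Simple Value" would be split into three separate words by the loop.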


And, now we've got two out of the three of the "basics" covered. In our next post on this subject, we'll take a look at the third most common variable/value type: The hash or associative array (which, as chance would have it, are technically what Awk arrays are :)

Best Wishes,

, Mike

Posted by Mike Golvach at 12:02 AM  


Sunday, May 4, 2008

Reversing All Lines In A File On Linux Or Unix Using Perl

Good evening-day-morning-afternoon :)

In response to some positive feedback, and some additional questions, prompted by our earlier post on using Perl to mirror lines in a file on Linux or Unix , today we're dispensing with yet another Perl script to do almost the same thing, but with an extra twist. While our original mirror file script reversed each line, this one will attempt to do the same thing while also reversing the order of the lines. If you have a nervous condition you should probably quit reading this, as it is only going to get more intense ;)

The good news is that I already know it works. The bad news is available at your local newsstand or in that meeting you should really try to blow off ;) Sorry... but only because the bad jokes are intentional.

So, when we're done crunching a file with today's script it will come out with all the lines reading from right to left and with the first line being the last, the second line being the second to last, and so on until the last line, which will be the first. End of story... or is it? ;)

Again, we're going to take a quick look at another useful function in this script. Since the rest of them are the same as in the script that prompted this one (split, join, undef and push), I should refer you to that post on mirroring file lines so that we don't waste too much space with duplicate content. It's interesting to look at the two side-by-side to really see the difference.

In today's post we'll check out the use of this one function (Note that, this time, we'll assume that the array @array consists of no members at all):

unshift - This function is the counterpart of the "push" function we used in our last post: where "push" adds a variable on to the "right side" (the end) of an array, "unshift" adds it on to the "left side" (the beginning). The name isn't as intuitive as "push," but, if it helps, when you "shift" a variable from an array, you're pulling it out of the "left side." So, naturally, "unshifting" is putting one back on the "left side." That's what flips the order of the lines in today's script, since every new line lands in front of the ones before it. ...No matter how I explain it, it will never make sense. It's just one of those words you have to accept on faith. Like "defenestration." <--- Apologies for the cheap-shot at Microsoft. That word is officially defined as "the act of throwing someone or something out of a window." It's only a few small steps to the jab I was going for ;)

Ex:
unshift(@array, "a"); <--- @array now equals "a"
unshift(@array, "b"); <--- @array now equals "b" "a"
unshift(@array, "c"); <--- @array now equals "c" "b" "a"

Again, enjoy, and I hope this helps you out with whatever you're doing that it might help you out with ;)

Cheers,

SAMPLE RUN:

host # cat words|head -3;cat words|tail -3 <--- The beginning and end of the file we're going to "reverse"
Aarhus
Aaron
Ababa
Zulu
Zulus
Zurich
host # ./reverse.pl words
host # ls
. .. reverse.pl words words.reverse <--- Our newly created file is called "words.reverse"
host # cat words.reverse|head -3;cat words.reverse|tail -3 <--- And, once again (since I tested this a few times already), the reversal of the file seems to have worked!
hciruZ
suluZ
uluZ
ababA
noraA
suhraA

host # wc -l words* <--- Just double checking here to make sure that the number of lines in our original file and the reverse file are exactly the same, which they should be.
45378 words
45378 words.reverse
90756 total




#!/usr/bin/perl

# reverse.pl - reverse each line, and its position in a file
#
# 2008 - Mike Golvach - eggi@comcast.net
#
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License
#

if ( $#ARGV != 0 ) {
print "Usage: $0 text_file\n";
exit(1);
}

$text_file=$ARGV[0];

open(TXT, "<$text_file") or die "Cannot open $text_file: $!\n";
@txt = <TXT>;
close(TXT);

foreach $ln (@txt) {
chomp($ln);
@ln = split(//, $ln);
$ln_len = @ln;
undef(@rev);
while ( $ln_len > 0 ) {
push(@rev, $ln[${ln_len}-1]);
$ln_len--;
}
$rev = join("", @rev);
unshift(@reverse, "$rev");
}
open(RTXT, ">${text_file}.reverse");
foreach $backward (@reverse) {
print RTXT "$backward\n";
}
close(RTXT);
exit(0);


, Mike

Saturday, May 3, 2008

Perl Script To Mirror Lines In A File On Linux Or Unix

Good afternoon-morning-day-evening :)

In much the same vein as our previously posted script to do weak encryption with Octal Dump , today we're throwing out another Perl script to do something possibly equally worthless, but still somewhat entertaining ;) Since it's written in Perl, and uses that language's specific constructs (i.e. No "system" calls), it should run equally well on Linux or Unix. The logic in the script is simple enough that it should probably work going back a few major versions.

As much as this may not seem to be of any use to you now, knowing how to mirror (or reverse, since many people, including myself, don't have any mirror-friendly fonts ;) each line in a file can be beneficial. Although no one in the office is likely to come up to you and say "Mr. sysadmin, sir (of course, I'm exaggerating. People are much more formal than that normally ;), Can you print every single line of this report reading from right-to-left instead of the standard left-to-right?", it's even more unlikely that, if this were to happen, any business/war jargon would be used. I can't imagine something like this ever being "mission critical" or required for any sort of "code red" situation. If it ever is, you'll be really glad you know how to do this...

But, at a fundamental level, it's good to understand the basic functionality of Perl and how to deal with scalar variables (or string variables), arrays and how to muck around with them (or make them work for you ;) In this particular script, the building blocks of some very useful functions of Perl are employed (to a dubious end, I'll admit) and, hopefully, presented in an easy to understand fashion. Over time, we'll dig into every little thing there is to know (If that can be cranked out in one life time ;) with regards to each function. In the mean time, check out the use of these four functions (Note that, for all, we'll assume that the variable $variable is defined as "abcd" and the array @array consists of four members: a, b, c and d):

split - This function will take a scalar variable and "split" it, on a delimiter, into an array:

Ex: @array = split(//, $variable); <--- Now @array has four members (a, b, c and d) that it got from $variable

join - This function will take an array and "join" it into one scalar variable:

Ex: $variable = join("", @array); <--- Now $variable equals "abcd" since it contains all 4 members of @array joined together by an empty delimiter "" (that is, nothing at all is placed between the members)

undef - This function will "undefine" our array, in this case. It can be used on scalar, array and hash variables as well.

Ex: undef(@array); <--- Now @array is not just empty, it isn't even defined. It may as well not exist.

push - This function will "push" a variable (scalar, array, hash, or references to same, and more -- getting way off-topic ;) onto the right side (the end) of an array. Subsequent pushes of extra variables are added after it, so if you push three variables into an array (a, b and c, for instance) in one order, they'll end up in the array in that same order. The script gets its reversal by walking each line from its last character back to its first, and pushing each character as it goes.

Ex:
push(@array, "a"); <--- @array now equals "a"
push(@array, "b"); <--- @array now equals "a" "b"
push(@array, "c"); <--- @array now equals "a" "b" "c"

In any event, enjoy, and I hope this helps you out if you're beginning to learn the Perl scripting language!

Best wishes,

SAMPLE RUN:

host # cat words|head -3;cat words|tail -3 <--- The beginning and end of the file we're going to "mirror"
Aarhus
Aaron
Ababa
Zulu
Zulus
Zurich
host # ./mirror.pl words
host # ls
. .. mirror.pl words words.mirror <--- Our newly created file is called "words.mirror"
host # cat words.mirror|head -3;cat words.mirror|tail -3 <--- And (good deal), the mirroring seems to have worked!
suhraA
noraA
ababA
uluZ
suluZ
hciruZ
host # wc -l words* <--- Just double checking here to make sure that the number of lines in our original file and the mirror file are exactly the same, which they should be.
45378 words
45378 words.mirror
90756 total



This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License

#!/usr/bin/perl

#
# mirror.pl - print an entire file backward, line by line
#
# 2008 - Mike Golvach - eggi@comcast.net
#
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License
#

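if ( $#ARGV != 0 ) {
    print "Usage: $0 text_file\n";
    exit(1);
}

$text_file=$ARGV[0];

# Read the whole file into memory, one array member per line
open(TXT, "<$text_file") or die "Cannot open $text_file: $!\n";
@txt = <TXT>;
close(TXT);

open(RTXT, ">${text_file}.mirror") or die "Cannot write ${text_file}.mirror: $!\n";
foreach $ln (@txt) {
    chomp($ln);
    @ln = split(//, $ln);    # one array member per character
    $ln_len = @ln;           # number of characters on this line
    undef(@rev);
    # Walk the line from its last character to its first,
    # appending each character to the end of @rev
    while ( $ln_len > 0 ) {
        push(@rev, $ln[${ln_len}-1]);
        $ln_len--;
    }
    $rev = join("", @rev);   # join wants a string separator, not a regex
    print RTXT "$rev\n";
}
close(RTXT);
exit(0);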
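
For comparison only: Perl's built-in reverse function, when called in scalar context, reverses the characters of a string in one step. The sketch below isn't how mirror.pl does it (the script deliberately uses split, push and join), and mirror_line is a made-up helper name, but it's a handy way to double check the script's output:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# mirror_line is a hypothetical helper, not part of mirror.pl
sub mirror_line {
    my ($line) = @_;
    chomp($line);
    # In scalar context, reverse returns the string with its characters reversed
    return scalar reverse($line);
}

print mirror_line("Aarhus\n"), "\n";
print mirror_line("Zurich\n"), "\n";
```

Run against the first and last words from the sample run, this should print suhraA and hciruZ, matching what ended up in words.mirror.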


, Mike

Wednesday, March 19, 2008

Converting Decimal Values To Binary Without Using Unpack

Howdy,

Yesterday we took a look at using Perl's unpack function, on Unix or Linux, to convert binary numbers to decimal. Today we're going to look at doing the opposite conversion (decimal to binary) and we're going to do it without using unpack :)

Actually, this script would be a whole lot shorter if we were to use the unpack function, but I wanted to highlight that Perl (and most scripting languages) can be used to achieve whatever ends you need met in any number of ways. Generally, your possibilities are limited only by your imagination (and the most basic rules ;)
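
To give an idea of how much shorter, here's a sketch of the pack/unpack approach. This is the commonly used idiom rather than anything from today's script, the name dec2bin_unpack is made up for the example, and it assumes an unsigned value that fits in 32 bits:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# dec2bin_unpack is a hypothetical name; it handles values from 0 to 2**32 - 1
sub dec2bin_unpack {
    my ($decimal) = @_;
    # pack "N" builds a 32-bit big-endian integer; unpack "B32" turns it
    # into a string of 32 "0" and "1" characters
    my $bits = unpack("B32", pack("N", $decimal));
    $bits =~ s/^0+(?=\d)//;    # drop leading zeros, but keep at least one digit
    return $bits;
}

print dec2bin_unpack(10), "\n";
print dec2bin_unpack(255), "\n";
```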

To achieve the results we want from today's script, we're taking the input (the decimal number) and breaking it down into an array of binary values. The array that gets produced can be fed to yesterday's script to convert binary back to decimal just to verify, if you want :)

The process of converting the decimal number to a binary one in this script breaks down (at its most basic level) to using the Perl exponentiation operator (**) to generate the powers of two from 2**100 down to 2**0. For each power, from largest to smallest, the script checks whether it still fits into what's left of the decimal number: if it does, the power gets subtracted and that position in the array becomes a 1; if it doesn't, that position becomes a 0. The array, once its leading zeros are stripped off, is the actual answer.
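
As a small worked example, 10 breaks down into 8 + 2, so the powers 8, 4, 2 and 1 give the digits 1, 0, 1 and 0, or 1010. The sketch below boils the same idea down to a subroutine and cross-checks it against sprintf's %b format. The name dec2bin_greedy and the default limit of 2**31 are assumptions made for the example, not something taken from the full script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# dec2bin_greedy is a hypothetical name; $max_exp is the largest exponent tried
sub dec2bin_greedy {
    my ($decimal, $max_exp) = @_;
    $max_exp = 31 unless defined $max_exp;
    my @bits;
    for (my $exp = $max_exp; $exp >= 0; $exp--) {
        my $power = 2 ** $exp;
        if ($decimal >= $power) {
            $decimal -= $power;    # this power of two is part of the number
            push(@bits, 1);
        } else {
            push(@bits, 0);
        }
    }
    # Strip leading zeros, but always keep at least one digit
    shift(@bits) while (@bits > 1 && $bits[0] == 0);
    return join("", @bits);
}

# Cross-check a few values against Perl's own binary formatting
for my $n (0, 1, 5, 10, 255, 1024) {
    my $mine = dec2bin_greedy($n);
    my $ref  = sprintf("%b", $n);
    print "$n -> $mine", ($mine eq $ref ? "" : " (mismatch: $ref)"), "\n";
}
```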

I hope these scripts serve as a halfway decent example of how you can do two almost identical things in two almost completely different ways, and, perhaps, as an incentive to write each of the two scripts (yesterday's and today's) in the manner of the other (Not necessary, and not necessarily fun, but a good exercise nonetheless ;)

Best wishes,


This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License

#!/usr/bin/perl

#
# db - Convert decimal values to binary
#
# 2008 - Mike Golvach - eggi@comcast.net
#
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License
#

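print "Decimal Value:\t";
chomp ($decimal=<STDIN>);

# The largest value we can handle: 2**100 + 2**99 + ... + 2**0
$x = 100;
$total = 0;
while ($x != -1) {
    $total = $total + 2 ** $x;
    --$x;
}
if ($decimal > $total) {
    print "Time to Upgrade, boy!\n";
    exit;
}

# Walk the powers of two from largest to smallest. If a power still fits
# into what's left of $decimal, subtract it and record a 1; otherwise a 0.
$a = 100;
$b = 0;
while ($a != -1) {
    $decbin{$a} = 2 ** $a;
    if ($decimal >= $decbin{$a}) {
        $decimal = $decimal - $decbin{$a};
        $binarray[$b] = 1;
    } else {
        $binarray[$b] = 0;
    }
    --$a;
    $b++;
}

# Strip the leading zeros, but always keep one digit. Without the size
# check, an input of 0 would empty the array and loop forever.
while (@binarray > 1 && $binarray[0] != 1) {
    shift(@binarray);
}
$binarray = join("", @binarray);
print "Binary Number:\t$binarray\n";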



, Mike



