perltutorial
jeffa
<h1>Map Tutorial: The Basics</h1>
<p>
The [map] built-in function allows you to build a new list from another list
while modifying the elements - simultaneously. Map takes a list and applies either
a <i>block of code</i> or an <i>expression</i> to that list to produce a new list. I will limit
the scope of this tutorial to the code block form.
<p>
When applied correctly, map
can produce lightning-fast transforms very efficiently. When abused, it
can produce some extremely obfuscated code, sacrificing readability and
maintainability (giving legacy coders unnecessary headaches).
<p>
[Vroom] has an excellent tutorial on [Complex Sorting] - in it he has some
more complex but [extremely] useful explanations of [map].
The purpose of this tutorial is to talk about
the easy stuff: to allow a programmer new to the concept to stick one toe in
the water at a time, so to speak.
<p>
<hr>
<font size=+1>Map: what is it good for?</font>
<p>
Example 1:<br>
Say that you have a text file that contains paragraphs of words. If you wanted
to create a list with each element being a single line, you can use:
<code>
open (FILE, "foo.txt");
my @lines = <FILE>;
close FILE;
</code>
But say that you wanted each element to contain a single word. As long as you
didn't care about punctuation, you can use the map function like so:
<code>
open (FILE, "foo.txt");
my @words = map { split } <FILE>;
close FILE;
</code>
Remember that [split] uses whitespace as its default delimiter, and
the special variable $_ as its default variable to split up. Line 2 can
be written as:
<code>
my @words = map { split(' ', $_) } <FILE>;
</code>
The choice to use default arguments is a trade-off between understandability and
laziness/elegance. Also, remember that a file handle can be taken in list context.
<p>
Example 2:<br>
Let's get rid of punctuation. First we need a suitable regular expression,
but before we can derive one, we need to decide if we should split first and
substitute second, or substitute first and then split. The former choice would
require more CPU cycles, because we are applying the regex to EACH word - the latter
is more efficient, because the regex gets applied to a WHOLE line, and if we use
the global modifier (g), Perl will quickly and efficiently apply the regex. If we
only care about periods, commas, exclamation points, and question marks, we can use
the substition operator like so:
<code>
s/[.,!\?]\B//g
</code>
Sorry, but the details of this regex are beyond the scope of the tutorial, be sure
and check out [root]'s tutorial on [String matching and Regular Expressions]. I will
tell you what it does, though: it turns <b>Hello World!</b> into <b>Hello World</b>
and it does so without removing punctuation from anacronyms like <b>J.A.P.H.</b> -
okay, okay, half-truth: <b>J.A.P.H.</b> becomes <b>J.A.P.H</b> - good enough for this
example (can anyone say "exercise for reader").
<p>
Moving on . . . now we can add this regex. The inner block of a map statement may
contain a number of statements separated by semi-colons. The statements are
interpreted left to right:
<code>
open (FILE, "foo.txt");
my @words = map { s/[.,!\?]\B//g; split; } <FILE>;
close FILE;
</code>
<p>
Example 3: (know what a function returns!)<br>
Let's say that we didn't want to split the line into words, we just wanted to
remove punctuation:
<code>
open (FILE, "foo.txt");
my @lines = map { s/[.,!\?]\B//g } <FILE>;
close FILE;
</code>
Uh-oh. What happened? If you try this, you will not receive the output you might
have expected. Instead, you will see numbers and/or blank lines. If the substitution
operator found no punctuation in a line it will return UNDEF, otherwise it will return
the number of substitutions on that line. It does NOT return the line itself. In cases
like this, the function or operator affects it's argument by reference. Split does not
work in this manner - it returns what was split off. Look at example 2 again - the
last thing that gets passed out of the map block is the return value of split. So, if we want to
return the line altered by a substitution, we will have to tell Perl so - like this:
<code>
open (FILE, "foo.txt");
my @lines = map { s/[.,!\?]\B//g; $_; } <FILE>;
close FILE;
</code>
Much better.
<p>
<hr>
<p>
<font size=+1>Map: what is it NOT good for?</font>
<p>
Remember, map returns a list - if you do not need a list, don't be tempted to
use map as an alternative to more traditional iteration constructs, such as
[for] and [foreach].
<p>
Also, some built-in functions, such as [chomp] and [reverse], can be applied to a list
AT ONCE, so to speak. For example, if you wanted to slurp the contents of a text file
into a list without the new lines, you might be tempted to use your new
knowledge like so:
<code>
open (FILE, "foo.txt");
my @lines = map { chomp; $_; } <FILE>;
close FILE;
</code>
(remember what we learned from example 3 - chomp returns
the numbers of newlines chomped off (1), so we have to
explicitly let Perl know we want the remaining value).
However, it turns out that chomp can do a much better job by itself:
<code>
open (FILE, "foo.txt");
my @lines = <FILE>;
chomp(@lines);
close FILE;
</code>
The second example actually runs faster. Why? Because Perl will
literally stuff the entire file into the array - no iteration needed. The same
goes for the chomp - Perl will not iterate through the list. By using a map statement,
however, you are forcing iteration to happen.
<p>
I used benchmark to time these two examples using '/usr/dict/words' as the input
file. Here were the results for 100 iterations:
<code>
Benchmark: timing 100 iterations of chomp, map...
chomp: 31 wallclock secs (29.17 usr + 0.54 sys = 29.71 CPU)
map: 37 wallclock secs (34.63 usr + 0.59 sys = 35.22 CPU)
</code>
<p>
Something else to consider is readability and maintainability. If you want your
code to be either, map statements might not be a good solution - let's face it, no
other language really has this one-liner of death implemented, and unless you like
watching ears bleed, keep it simple! (personally, I like watching ears bleed!)
<p>
Of course, there aren't too many obfuscated Perl scripts out there that don't use map.
Keep up the higher learning!