The power of the Perl ‘map’ function

Filed under: Perl tips — vincent @ 23:55

I spent a few years writing Perl without ever looking carefully at the map function, but since discovering it, I use it often. Here are some sample uses that I’ve found useful. I’ll also show how to use the grep function, which is used in a very similar way.

Let’s say that you have a CSV file like this, with a header and some comments…

# City, Country, Continent
Bruxelles,Belgium,Europe
New-York,USA,North America
# I've never visited this one, but I'd like to...
Dakar,Senegal,Africa
Namur,Belgium,Europe
Paris,France,Europe
Madrid,Spain,Europe

I’ll assume that we feed this file to our scripts on standard input. If you wish to load it into a data structure for processing, you have various options. We’ll build a hash where the keys are the city names and each value is a reference to an array holding the fields of that line.

Extensive solution

A spelled-out solution, easy to read and maintain, but verbose:

#!/usr/bin/perl -w

use strict;

my %cities=();
my $line;
while($line=<STDIN>) {
	if ($line =~ m/^#/) { next; }
	chomp($line);
	my @fields=split(/,/,$line);
	$cities{$fields[0]}=\@fields;
}

# now use structure...
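
As a rough sketch of what "using the structure" might look like: the snippet below hard-codes two entries in the same shape as the hash built by the loop (city name as key, reference to the array of fields as value) and reads them back. The hard-coded data and the variable names $country and $continent are only there for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hard-coded stand-in for the hash that the loop above would build.
my %cities = (
    'Bruxelles' => [ 'Bruxelles', 'Belgium', 'Europe' ],
    'Paris'     => [ 'Paris',     'France',  'Europe' ],
);

# Each value is an array reference: index 1 is the country, index 2 the continent.
my $country   = $cities{'Paris'}[1];
my $continent = $cities{'Paris'}->[2];    # same access, with the arrow written out
print "Paris is in $country ($continent)\n";

# Walk the whole structure, dereferencing each array with @{ ... }.
for my $city (sort keys %cities) {
    print "$city: ", join(' / ', @{ $cities{$city} }), "\n";
}
```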

Shortened solution

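As most Perl users will know, this can be shortened by using the implicit variable $_. It’s more difficult to read for a Perl beginner, but should be just as easy to read for most Perl programmers. Note that <> reads the files named on the command line, or standard input if there are none: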

#!/usr/bin/perl -w

use strict;

my %cities=();
while(<>) {
	next if (m/^#/);
	chomp;
	my @fields=split(/,/);
	$cities{$fields[0]}=\@fields;
}

# now use structure...

Introducing the map function

Using the map and grep functions, we can shorten this:

#!/usr/bin/perl -w

use strict;

my %cities=();
map {
	chomp;
	my @fields=split(/,/);
	$cities{$fields[0]}=\@fields;
} grep {
	! m/^#/
} <>;

# now use structure...

It’s shorter, but requires some explanation. First of all, the grep and map functions impose list context on their list argument. Therefore, <> will load the whole input at once, as a list of lines. Be careful if the CSV file is potentially long!
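
To make the difference between the two contexts concrete, here is a small sketch that reads from an in-memory filehandle instead of standard input (the $csv string and the $in handle are assumptions made for the example). In scalar context the read operator returns one line; in list context it returns every remaining line at once.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data held in a string, opened as an in-memory file.
my $csv = "# City, Country, Continent\nParis,France,Europe\nMadrid,Spain,Europe\n";
open(my $in, '<', \$csv) or die "cannot open in-memory file: $!";

# Scalar context: a single line is read.
my $first = <$in>;

# List context: all the remaining lines are read in one go.
my @rest = <$in>;
close($in);

print "first line: $first";
print "lines read in list context: ", scalar(@rest), "\n";
```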

This list is then processed by the grep function, which applies the block of code { ! m/^#/ } to every element in turn. The block receives the current element in the $_ variable, which is the variable a regular expression matches against when none is explicitly bound with =~, as is the case here.

Our expression is negated, so it is true only for non-comment lines. Therefore the map function will receive only the non-comment lines.
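
Here is a minimal illustration of that filtering step on its own, using a hard-coded @lines array as a stand-in for the input. It also shows that grep returns the number of matches when called in scalar context.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hard-coded stand-in for the lines read from the CSV file.
my @lines = (
    "# City, Country, Continent\n",
    "Paris,France,Europe\n",
    "# a comment\n",
    "Madrid,Spain,Europe\n",
);

# List context: keep the elements for which the block is true.
my @data = grep { ! m/^#/ } @lines;

# Scalar context: count the elements for which the block is true.
my $comment_count = grep { m/^#/ } @lines;

print "data lines: ", scalar(@data), ", comment lines: $comment_count\n";
```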

We won’t use the main purpose of map here, which is to transform one list into another, but just a side effect: the block of code is evaluated once for each element of the list.

Our block of code splits the input line and fills the %cities hash.
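
For comparison, here is one possible variant that does use the list returned by map: each line is turned into a key/value pair, and the resulting list is assigned directly to the hash. This is only a sketch, with a hard-coded @lines array standing in for <>.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hard-coded stand-in for the lines read from the CSV file.
my @lines = (
    "# City, Country, Continent\n",
    "Paris,France,Europe\n",
    "Madrid,Spain,Europe\n",
);

# map returns one (key, value) pair per non-comment line;
# assigning that flat list to %cities builds the hash.
my %cities = map {
    chomp;
    my @fields = split(/,/);
    ( $fields[0] => \@fields );
} grep { ! m/^#/ } @lines;

print "$_ is in $cities{$_}[1]\n" for sort keys %cities;
```

Whether this reads better than the side-effect version is a matter of taste; it does make the transformation from list to hash explicit.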

Other samples

With the same input, if we want a comma-separated list of the European cities, we can do as follows. Do you see why? Read the numbered comments from the bottom up, from 1 to 7!

#!/usr/bin/perl -w
use strict;
print join(',',	# 7.
	sort	# 6.
	map { $_->[0] }	# 5.
	grep { $_->[2] eq 'Europe' }	# 4.
	map { chomp; [ split(/,/) ] }	# 3.
	grep { ! m/^#/ }  # 2.
	<>	# 1.
	)."n";

Let me note here that this is quite expensive in resources, since the whole file and several intermediate lists are held in memory. I wouldn’t recommend this for a production system on large files. But for throwaway, use-once scripts, it can be useful. Here is the explanation of the previous script:

  1. make a list of all lines from the CSV
  2. get rid of comments
  3. turn each line into a reference to an array of its fields
  4. let only European cities pass through
  5. get the city names
  6. sort them alphabetically
  7. make a comma separated string out of them
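
For larger files, the same result could be produced one line at a time, which avoids the memory cost mentioned above. The sketch below is one way to do it; it reads from an in-memory filehandle with a shortened copy of the sample data (the $csv string, the $in handle and the @european array are assumptions made for the example), where a real script would read from <> instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shortened copy of the sample data, opened as an in-memory file.
my $csv = <<'END';
# City, Country, Continent
Bruxelles,Belgium,Europe
New-York,USA,North America
Paris,France,Europe
END
open(my $in, '<', \$csv) or die "cannot open in-memory file: $!";

my @european;
while (my $line = <$in>) {
    next if $line =~ m/^#/;                 # skip comments
    chomp($line);
    my ($city, $country, $continent) = split(/,/, $line);
    push @european, $city if $continent eq 'Europe';
}
close($in);

# Only the city names are kept in memory, not the whole file.
print join(',', sort @european), "\n";
```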
