RFC: Getting Started with PDL (the Perl Data Language)
by lin0 (Curate) on Feb 02, 2007 at 20:06 UTC (#598007, perlmeditation)
Greetings Fellow Monks,
I have written several posts related to the Perl Data Language (PDL), but I have never provided an introduction to PDL. I thought it was time to fix that, so I started writing a tutorial. As I was writing, I noticed that it would become too long for one post, so I decided to split it into three posts. This one is an introduction to PDL and operations with piddles. The second one will be on data visualization, and the third one will show you an example of data analysis. I hope you enjoy the tutorials. Please help me improve them by sharing your insightful comments.
Fixed a typo. Fixed some problems with spoiler tags. Broke up long code lines.
Imagine that you, Just Another Perl Hacker, have been assigned to a new project that involves heavy numerical computation. Most of your peers recommend that you use C or C++. Some others recommend the language of the Snake (sorry, I forgot the name ;-). What can you do if you really want to keep using Perl? Using PDL is the answer.
The Perl Data Language (PDL) is a package that gives Perl the ability to compactly store and speedily manipulate the large N-dimensional data sets that are common in scientific and other data-intensive programming tasks. To achieve this performance, PDL uses C (and sometimes Fortran) to handle multidimensional data sets efficiently.
For the rest of the tutorial, I will assume you have a working copy of PDL installed. Once PDL is installed, you can use it in Perl scripts by simply declaring: use PDL;. If you have not yet installed PDL, I recommend that you have a look at the Appendix, which lists some of the pre-requisites for installing PDL without problems. If you run into trouble, I recommend asking on one of the PDL mailing lists or here at the Monastery. Now we are ready to start.
The Interactive Shell
A very useful interactive shell (perldl) is provided with your copy of PDL. To start perldl, just type perldl in a terminal window (or a command prompt window if you are using a Windows system). perldl allows you to type PDL commands directly and see their results. One key feature of perldl is that it gives you access to online help. By typing help followed by a function name, you get the online documentation for that function (alternatively, you can type ? followed by the function name). With the online help and the PDL Cheat Sheet you can access most of the PDL documentation. But what happens when you don't know the name of the function and you cannot figure it out from the PDL Cheat Sheet? The command apropos comes to your rescue. Typing apropos followed by a concept gives you a list of functions that have that concept in the first line of their description (alternatively, you can type ?? followed by the concept). Let us use those two commands:
What did you get?
For more information on perldl, have a look at the perldl page of the PDL Documentation. If you are interested in configuring perldl to load modules, set shell variables, etc. during start-up, you can use the perldlrc configuration file as described in the thread of the PDL Cheat Sheet.
Before moving to the next section, I have to recommend an invaluable feature of perldl: the online demos. Just type:
and you will see all the different demos that are available. For instance, let's try:
What did you get?
Playing with Piddles
One thing you need to know about PDL is that it introduces a new data structure usually called a “piddle”. Piddles are numerical arrays stored in column major order (meaning that the fastest varying dimension represents the columns, following computational convention, rather than the rows, as mathematicians prefer). Even though piddles look like Perl arrays, they are not. Unlike Perl arrays, piddles are stored in consecutive memory locations, which makes it easy to pass them to the C and Fortran code that handles the element-by-element arithmetic. One more thing to note about piddles is that they are referenced with a leading $. In the rest of this section I will use the interactive shell.
The easiest way to create piddles is to use the function pdl followed by the values of the piddle. For example:
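As a minimal sketch of the pdl constructor in a plain script (the variable names and values here are illustrative assumptions, not the original listing):

```perl
use strict;
use warnings;
use PDL;

# A one-dimensional piddle built from a Perl array reference.
my $pdl_1d = pdl [1, 2, 3, 4];

# A two-dimensional piddle: two rows of three values each.
my $pdl_2d = pdl [[1, 2, 3], [4, 5, 6]];

print $pdl_1d, "\n";    # prints [1 2 3 4]
print $pdl_2d, "\n";
```

In perldl the same statements can be typed without the use lines, and p can replace print.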
To print information about the piddles you just created, you can simply type:
The corresponding output is:
Another way of creating piddles is using functions such as zeroes (create a piddle with all the elements equal to zero), ones (create a piddle with all the elements equal to one), and identity (create a piddle with the elements in the main diagonal equal to one).
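A short sketch of these three constructors follows. The name $pdl_of_zeroes is used later in the text; the 5 by 3 size and the other names are assumptions chosen for illustration.

```perl
use strict;
use warnings;
use PDL;

# 5 columns by 3 rows, every element 0.
my $pdl_of_zeroes = zeroes(5, 3);

# Same shape, every element 1.
my $pdl_of_ones = ones(5, 3);

# 3x3 matrix with ones on the main diagonal and zeroes elsewhere.
my $identity = identity(3);

print $pdl_of_zeroes, "\n";
print $pdl_of_ones, "\n";
print $identity, "\n";
```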
There are many more functions for creating piddles. Some of them are: random, grandom, randsym, sequence, xvals, yvals, zvals, xlinvals, ylinvals, zlinvals, rvals, axisvals, allaxisvals.
By default, piddles are of type double, but they can also be byte, short, ushort, long, or float. You can also create a piddle by converting the type of an existing one. For example,
This piddle requires less memory than $pdl_of_zeroes. What would its memory requirement be?
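The type conversion described here might look like the following sketch. The name $new_pdl_of_zeroes is borrowed from later in the text; the 5 by 3 size is an assumption.

```perl
use strict;
use warnings;
use PDL;

# Default type: double.
my $pdl_of_zeroes = zeroes(5, 3);

# Convert an existing piddle to byte; the result is a new piddle.
my $new_pdl_of_zeroes = byte($pdl_of_zeroes);

# Alternatively, request the type when the piddle is created.
my $direct = zeroes(byte, 5, 3);

print $pdl_of_zeroes->type, "\n";        # prints double
print $new_pdl_of_zeroes->type, "\n";    # prints byte
```

Both piddles hold 15 elements, so the difference in memory use comes entirely from the size of each element.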
Doing Arithmetic Operations
The following Perl operators work in PDL as they do in Perl. The only difference is that in PDL they act element-by-element on the whole piddle.
Similarly, standard Perl mathematical functions also act element-by-element on the whole piddle.
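To make the element-by-element behaviour concrete, here is a small sketch with made-up values (all names are illustrative):

```perl
use strict;
use warnings;
use PDL;

my $x = pdl [1, 4, 9];
my $y = pdl [10, 20, 30];

# Operators apply to each pair of corresponding elements.
my $sum     = $x + $y;       # [11 24 39]
my $product = $x * $y;       # [10 80 270]

# A plain Perl number is applied to every element.
my $scaled  = 2 * $x + 1;    # [3 9 19]

# Perl's mathematical functions also work element by element.
my $roots   = sqrt($x);                # [1 2 3]
my $sizes   = abs(pdl [-1, 2, -3]);    # [1 2 3]

print "$sum\n$product\n$scaled\n$roots\n$sizes\n";
```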
Note that the * operator multiplies element by element. To perform a matrix multiplication, you should use the x operator.
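The difference between the two kinds of multiplication can be sketched with a small 2x2 matrix (the values are an arbitrary illustration):

```perl
use strict;
use warnings;
use PDL;

my $m = pdl [[1, 2], [3, 4]];

# Element-by-element product: each entry is squared.
my $elementwise = $m * $m;    # rows: [1 4] and [9 16]

# Matrix product: rows of the left operand times columns of the right.
my $matrix = $m x $m;         # rows: [7 10] and [15 22]

print $elementwise, "\n";
print $matrix, "\n";
```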
Simply assign a null piddle:
A 'null' piddle is a kind of empty piddle, which can grow to the appropriate dimensions to store a result.
Getting Piddles' Information
To get the number of elements in a piddle, you can use:
This last notation is called the method notation. Piddles are implemented as Perl objects. Objects can have internal functions called methods, and a method can only be used on objects of the class it belongs to. Many of PDL's functions are available as methods too. The method notation is very common when you have to gather information from piddles or access values in piddles, as it makes more visual sense to have the method call after the variable.
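As a small sketch, the function notation and the method notation give the same answer (the 5 by 3 piddle is an assumption for illustration):

```perl
use strict;
use warnings;
use PDL;

my $pdl_of_ones = ones(5, 3);

# Function notation.
my $count_function = nelem($pdl_of_ones);

# Method notation: the call follows the variable.
my $count_method = $pdl_of_ones->nelem;

print "$count_function $count_method\n";    # prints 15 15
```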
To get the piddle dimensions as a Perl list, you can use:
To get a formatted string with information about a piddle, you can use:
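Both calls, dims and info, can be sketched as follows (the piddle and its size are assumptions for illustration):

```perl
use strict;
use warnings;
use PDL;

my $pdl_of_zeroes = zeroes(5, 3);

# dims returns an ordinary Perl list, one entry per dimension.
my @dims = $pdl_of_zeroes->dims;

# Without arguments, info returns a short summary string
# (class, type and dimensions).
my $summary = $pdl_of_zeroes->info;

print "dims: @dims\n";    # prints dims: 5 3
print "$summary\n";
```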
info allows you to specify the type of information you want to extract by using an optional argument with the format: "%<width><letter>". The width is optional and the letter is one of:
For example, to get the approximate memory consumption of $new_pdl_of_zeroes, you could use:
The output is:
Note the use of p. In perldl, p is a shortcut for print.
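Outside perldl the p shortcut is not available, so a script would use print instead. A hedged sketch, assuming $new_pdl_of_zeroes is a 5 by 3 byte piddle (the size is an assumption), that compares it with its double counterpart:

```perl
use strict;
use warnings;
use PDL;

my $pdl_of_zeroes     = zeroes(5, 3);
my $new_pdl_of_zeroes = byte($pdl_of_zeroes);

# %T is the type, %D the dimensions, %M the approximate memory use.
for my $current ($pdl_of_zeroes, $new_pdl_of_zeroes) {
    print $current->info("Type: %T  Dims: %D  Memory: %M"), "\n";
}

my $memory = $new_pdl_of_zeroes->info("%M");
```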
Accessing Values in Piddles
Before showing you how to access values in a piddle, let's create a new piddle:
To get the element (5,3) as a scalar, you could use:
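The original piddle is not shown here, so the sketch below assumes a 10 by 10 piddle filled with 0 to 99 by sequence. The at method returns an ordinary Perl scalar.

```perl
use strict;
use warnings;
use PDL;

# Assumption: a 10x10 piddle holding 0..99, one row after another.
my $piddle = sequence(10, 10);

# First index is the column, second index is the row.
my $value = $piddle->at(5, 3);

print "$value\n";    # prints 35 (column 5 of row 3: 3 * 10 + 5)
```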
Now, let's access one or more elements (what is called a slice) at a time. One way of getting rectangular slices of piddles is to use the slice function. For example, to access the last row and first two columns of $piddle, you could type:
To access all the rows and the last two columns of $piddle, you could type:
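Both slices can be sketched as follows, again assuming $piddle is sequence(10, 10). The slice string lists one range per dimension, columns first, and negative indices count from the end.

```perl
use strict;
use warnings;
use PDL;

# Assumption: a 10x10 piddle holding 0..99.
my $piddle = sequence(10, 10);

# Columns 0 to 1 of the last row.
my $last_row_two_cols = $piddle->slice("0:1,-1");

# The last two columns of every row.
my $last_two_cols = $piddle->slice("-2:-1,:");

print $last_row_two_cols, "\n";
print join('x', $last_two_cols->dims), "\n";    # prints 2x10
```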
You can also access rectangular slices in a piddle by using NiceSlice (to use NiceSlice in a Perl script, you need to declare: use PDL::NiceSlice;). Here are some examples:
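A sketch of the NiceSlice syntax in a script, assuming $piddle is sequence(10, 10). PDL::NiceSlice is a source filter, so it has to be loaded in a script file before the first use of the syntax.

```perl
use strict;
use warnings;
use PDL;
use PDL::NiceSlice;

# Assumption: a 10x10 piddle holding 0..99.
my $piddle = sequence(10, 10);

# Columns 0 to 1 of the last row.
my $last_row_two_cols = $piddle(0:1,-1);

# The last two columns of every row.
my $last_two_cols = $piddle(-2:-1,:);

# Every column of row 2.
my $third_row = $piddle(:,2);

print $last_row_two_cols, "\n";
print $third_row, "\n";
```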
What can you do to access irregular slices? You can use the function dice. Here are some examples:
To get the elements that are simultaneously in columns 1 and 3 and rows 0 and 7, we type:
To get the elements that are simultaneously in all columns and rows 0, 1 and 8, we type:
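The two selections above might be written as in this sketch, assuming $piddle is sequence(10, 10). Each argument to dice lists the indices to keep along one dimension, columns first; the column list for "all columns" is spelled out explicitly here.

```perl
use strict;
use warnings;
use PDL;

# Assumption: a 10x10 piddle holding 0..99.
my $piddle = sequence(10, 10);

# Columns 1 and 3 of rows 0 and 7.
my $corners = $piddle->dice([1, 3], [0, 7]);

# All ten columns of rows 0, 1 and 8.
my $three_rows = $piddle->dice([0 .. 9], [0, 1, 8]);

print $corners, "\n";
print join('x', $three_rows->dims), "\n";    # prints 10x3
```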
One thing about slices is that they do not occupy additional memory space. They are just representations of portions of the original piddle. Modifying a slice will modify the original piddle. Note: to assign values to slices, you must use the operator: .=. For example:
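A sketch of assigning through a slice, assuming $piddle is sequence(10, 10) and using zero as an arbitrary new value. The slice is first stored in a temporary variable inside parentheses, which is a conservative way of writing the assignment.

```perl
use strict;
use warnings;
use PDL;

# Assumption: a 10x10 piddle holding 0..99.
my $piddle = sequence(10, 10);

# Take columns 0 to 1 of the last row and overwrite them with .=
# The slice shares its data with $piddle, so $piddle changes too.
(my $slice = $piddle->slice("0:1,-1")) .= 0;

print $piddle, "\n";
```

Using = instead of .= would only rebind the Perl variable and leave $piddle untouched.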
The original piddle ($piddle) now has the following values:
To access a more general subset of a piddle, you could use the functions which and index. For example, if you wanted to assign a value of 255 to those elements of $piddle that originally had a value greater than 43 and less than 72, you could proceed as follows:
Note that which returns a one-dimensional piddle with the indices of the non-zero values of the mask you passed. Because our piddle is two-dimensional, which flattens the piddle and then finds the indices. which($piddle>43) finds the indices of the elements with a value greater than 43. which($piddle<72) finds the indices of the elements with a value less than 72. The function intersect finds the intersection of the two piddles of indices.
Note that because which returns a one dimensional piddle, you have to flatten $piddle (by using the function flat) before using the function index. The new value of $piddle is:
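The whole procedure can be sketched as follows. This assumes $piddle starts out as a fresh sequence(10, 10); the intermediate names are illustrative.

```perl
use strict;
use warnings;
use PDL;

# Assumption: a 10x10 piddle holding 0..99.
my $piddle = sequence(10, 10);

# Indices (into the flattened piddle) satisfying each condition.
my $above = which($piddle > 43);
my $below = which($piddle < 72);

# Indices that satisfy both conditions.
my $both = intersect($above, $below);

# index works on one-dimensional data, hence the call to flat.
# The selection shares data with $piddle, so .= updates it in place.
(my $selected = $piddle->flat->index($both)) .= 255;

print $piddle, "\n";
```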
Sample Perl Script
In this section, I present a more elaborate example of how to access general subsets of a piddle. To have more fun, let's get an image from the web, read the image into a piddle, and substitute the brightest pixels with very dark pixels (I chose to take the pixels with a value greater than 220 and assign them a value of zero; however, you can certainly play with those numbers). Without further ado, here is the script.
To learn more:
PDL Module Dependencies
PDL is a Perl interface to a number of useful libraries for scientific programming. If you want a smooth installation (whether you use tools such as CPAN.pm or PPM or install manually), it is a good idea to install all the required libraries before installing PDL. To help you identify what you need, here is a list of pre-requisites:
You can find a more detailed list of pre-requisites at the PDL Project dependencies web page. For people using Debian or a Linux Distribution based on Debian, there is a list of dependencies available at Debian's site.