Description: | Many machine learning and data analysis tasks involve calculating distances between items. The Mahalanobis distance is a very popular distance because it is scale invariant. In this snippet, I present how to compute the Mahalanobis distance using the Perl Data Language. The inputs are two or three piddles (see comment below for a definition). The first piddle is a p-dimensional vector. The second piddle could be either a p-dimensional vector (when a third input is provided) or a matrix with N rows of p-dimensional vectors. If the second piddle is a matrix, the distance is computed between the center of the second piddle and the first piddle (if only two inputs are provided, the second piddle is used to compute the covariance needed to determine the Mahalanobis distance). The third piddle, which is optional, represents the covariance matrix of the distribution from which the two other piddles were drawn. Note: to compute the covariance matrix, I use the snippet presented in Computing Covariance Matrices with PDL What are Piddles? They are a new data structure defined in the Perl Data Language. As indicated in RFC: Getting Started with PDL (the Perl Data Language):
Cheers, |

#!/usr/bin/perl use warnings; use strict; use PDL; # ================================ # mahalanobis: # # $distance = mahalanobis( $x, $y, $cov ) # # computes the mahalanobis distance from a point # $x to another point $y (from the same # distribution) or from a point $x to # the centre of a group of values $y # # ================================ sub mahalanobis { my ( $x, $y, $cov, $diff, $dist ); if ( @_ < 3 ) { ( $x, $y ) = @_; $cov = covariance( $y ); } else { ( $x, $y, $cov ) = @_; } if ( $y->getdim(1) > 1 ) { $diff = $x - average( $y->xchg(0,1) ); } else { $diff = $x - $y; } my @dist = list( $diff x inv( $cov ) x transpose( $diff ) ); return $dist[0]; }

Replies are listed 'Best First'. | |
---|---|

Re: Computing the Mahalanobis distance with the Perl Data Language
by dmorgo (Pilgrim) on Aug 13, 2007 at 09:13 UTC |

Comment onComputing the Mahalanobis distance with the Perl Data LanguageDownloadCode