RFC: Presentation on Machine Learning with Perl

Greetings Fellow Monks,

Later this month, I will be giving a talk titled "Machine Learning Made Easy with Perl." A preliminary outline is:

Data gathering with Finance::YahooQuote
Data munging with Perl
Data visualization with PGPLOT and TriD
Data clustering with fuzzy c-means (FCM) implemented using PDL
Results visualization with PGPLOT

Data classification with support vector machines (SVM) using a LIBSVM binary called with IPC::Open3

Data classification with Radial Basis Function Networks implemented using PDL
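The outline mentions calling a LIBSVM binary through IPC::Open3, but no snippet for that part appears below. A minimal sketch of such a wrapper, assuming the binary is on the PATH and writes only modest amounts of output, might look like this (the sub name `run_command` is hypothetical):

```perl
use strict;
use warnings;

use IPC::Open3;
use Symbol qw(gensym);

# run_command (hypothetical helper): run an external program without
# going through the shell and capture its exit status, STDOUT and STDERR.
sub run_command {
    my @cmd = @_;

    # STDERR needs its own glob; open3 cannot autovivify that handle.
    my $pid = open3( my $chld_in, my $chld_out, my $chld_err = gensym, @cmd );
    close $chld_in;    # nothing is sent on the child's STDIN

    # Read all of STDOUT, then all of STDERR. This is fine for short
    # outputs; a child that writes a lot to STDERR before closing STDOUT
    # could block, in which case IO::Select is the usual remedy.
    my $stdout = do { local $/; <$chld_out> };
    my $stderr = do { local $/; <$chld_err> };

    waitpid( $pid, 0 );
    my $status = $? >> 8;    # child's exit code

    return ( $status, $stdout // '', $stderr // '' );
}
```

For the SVM part the call might then be something like `run_command('svm-train', '-t', '2', 'train.libsvm', 'train.model')`, where the file names are hypothetical and `-t 2` selects LIBSVM's radial basis function kernel. Passing the command as a list keeps file names with spaces or shell metacharacters safe.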

In each part, I plan to discuss the problem, the strategy to solve it, the choice of machine learning technique and the main configuration issues the participants need to understand to successfully deploy machine learning applications. I will also show snippets of the code used. For example:

For data gathering using Finance::YahooQuote:

#!/usr/bin/perl
use strict;
use warnings;

use Finance::YahooQuote;

my @symbols = qw(IBM DELL GOOG YHOO MSFT ORCL SAP COGN BOBJ);
my @columns = (
    "Last Trade (Price Only)", "Last Trade Date", "Last Trade Time",
    "Day's Range", "52-week Range", "EPS Est. Next Year", "P/E Ratio",
    "PEG Ratio", "Dividend Yield",
);

my $arrptr = getcustomquote(\@symbols, \@columns);

my $i = 0;
foreach my $symbol (@symbols){
    my @quotes = @{$arrptr->[$i++]};
    print "$symbol\t@quotes\n";
}
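The data munging step has no snippet here. Because the SVM part feeds a LIBSVM binary, one munging task that is certainly needed is writing each example in LIBSVM's sparse text format: a label followed by `index:value` pairs, with 1-based indices in ascending order. A minimal sketch, assuming the quote fields have already been parsed into numeric features (the sub name `to_libsvm_line` is hypothetical):

```perl
use strict;
use warnings;

# to_libsvm_line (hypothetical helper): format one labelled example as
# "<label> <index>:<value> ..." with 1-based, ascending indices.
# Zero-valued features are skipped, since LIBSVM treats a missing
# index as zero.
sub to_libsvm_line {
    my ( $label, @features ) = @_;
    my @pairs;
    for my $i ( 0 .. $#features ) {
        next if $features[$i] == 0;
        push @pairs, ( $i + 1 ) . ':' . $features[$i];
    }
    return join ' ', $label, @pairs;
}
```

For example, `to_libsvm_line(1, 0.5, 0, 2)` returns `1 1:0.5 3:2`. Printing one such line per example to a file gives a training set that `svm-train` can read directly.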

For the FCM:

use strict;
use warnings;

use PDL;
use PDL::NiceSlice;
# ================================
# fcm
# ( $performance_index, $prototypes, $current_partition_matrix) = 
#   fcm( $patterns, $partition_matrix, $fuzzification_factor,
#        $tolerance, $max_iter )
# ================================
sub fcm {
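#
# fuzzy c-means implementation
#   $patterns:                 dims (n_features, n_patterns), one pattern per row
#   $current_partition_matrix: initial memberships, dims (n_patterns, n_clusters);
#                              the memberships of each pattern normally sum to 1
#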
    my ( $patterns, $current_partition_matrix, $fuzzification_factor,
        $tolerance, $max_iter ) = @_;
    my ( $number_of_patterns, $number_of_clusters ) =
        $current_partition_matrix->dims();
    my ( $prototypes, $performance_index );
    my $iter = 0;
    while (1) {
        # computing each prototype
        my $temporal_partition_matrix =
            $current_partition_matrix ** $fuzzification_factor;
        my $temp_prototypes = ( $temporal_partition_matrix x $patterns )->xchg(1,0)
            / sumover($temporal_partition_matrix);
        $prototypes = $temp_prototypes->xchg(1,0);

        # copying partition matrix
        my $previous_partition_matrix = $current_partition_matrix->copy;

        # updating the partition matrix
        my $dist = zeroes($number_of_patterns, $number_of_clusters);
        for my $j (0..$number_of_clusters - 1){
            my $diff = $patterns - $prototypes(:,$j)->dummy(1, $number_of_patterns);
            $dist(:,$j) .= (sumover( $diff ** 2 )) ** 0.5;
        }

        my $temp_variable = $dist ** (-2/($fuzzification_factor - 1));
        $current_partition_matrix =
            $temp_variable / sumover( $temp_variable->xchg(1,0) );

        #
        # Performance Index calculation
        #
        $temporal_partition_matrix =
            $current_partition_matrix ** $fuzzification_factor;
        $performance_index = sum( $temporal_partition_matrix * ( $dist ** 2 ) );

        # checking stop conditions
        # (largest absolute change in any membership, or iteration limit)
        my $diff_partition_matrix =
            $current_partition_matrix - $previous_partition_matrix;
        $iter++;
        if ( ( max( abs($diff_partition_matrix) ) < $tolerance )
            || ( $iter >= $max_iter ) ) {
            last;
        }
        print "iter = $iter\n";
    }
    return ( $performance_index, $prototypes, $current_partition_matrix );
}
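For audience members who have not seen FCM before, each pass of the loop in `fcm` applies the standard fuzzy c-means updates. Writing $x_j$ for pattern $j$ ($j = 1, \dots, N$), $v_i$ for prototype $i$ ($i = 1, \dots, c$), $u_{ij}$ for the membership of $x_j$ in cluster $i$, and $m$ for `$fuzzification_factor`, the sub computes, in this order, the prototypes, the distances, the new memberships and the performance index:

```latex
v_i = \frac{\sum_{j=1}^{N} u_{ij}^{m}\, x_j}{\sum_{j=1}^{N} u_{ij}^{m}}
\qquad
d_{ij} = \lVert x_j - v_i \rVert
\qquad
u_{ij} = \frac{d_{ij}^{-2/(m-1)}}{\sum_{k=1}^{c} d_{kj}^{-2/(m-1)}}
\qquad
J = \sum_{i=1}^{c} \sum_{j=1}^{N} u_{ij}^{m}\, d_{ij}^{2}
```

The membership update is undefined when some $d_{ij} = 0$ (a pattern that coincides with a prototype); the sub as written does not special-case that, so such a pattern would yield non-finite memberships.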

I expect the audience to be mainly Perl-savvy people. However, the talk is open to everyone attending the conference, so some people in the audience might not be familiar with Perl.

The talk is scheduled to last 45 minutes. I plan to cover each part in about 10 minutes, leaving 5 to 10 minutes for questions and answers. I do not plan to explain the snippets in detail because there is not enough time; however, I will make the code available to anyone interested. My questions for you, Fellow Monks, are:

If you were attending this session, would you expect me to describe the code in detail?
Do you think it is a good strategy to concentrate on the machine learning part rather than on the Perl part?
What points do you suggest I should (or should not) cover?
Any other suggestions or thoughts?

Thank you,

lin0

Update: Fixed typo in header of FCM sub
