Greetings Fellow Monks,
Later this month, I will be giving a talk titled: Machine Learning Made Easy with Perl. A preliminary outline is:
Part I: Exploratory Data Analysis
- Data gathering with Finance::YahooQuote
- Data munging with Perl
- Data visualization with PGPLOT and TriD
- Data clustering with FCM implemented using PDL
- Results visualization with PGPLOT
Part II: Decision Support Systems
- Data classification with SVM using a LIBSVM binary called with IPC::Open3
Part III: Pattern Recognition
- Data classification with Radial Basis Function Networks implemented using PDL
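The Part II item above depends on driving an external program from Perl. A minimal sketch of that pattern with IPC::Open3 follows. It is an illustration under stated assumptions, not the code used in the talk: `run_command` is a hypothetical helper, and the running Perl interpreter (`$^X`) stands in for the LIBSVM binary so that the example is self-contained.

```perl
use strict;
use warnings;
use IPC::Open3;
use Symbol 'gensym';

# Hypothetical helper (not from the talk): run a command, send $input to its
# stdin, and return (stdout, stderr, exit status).
sub run_command {
    my ($input, @cmd) = @_;
    my $err = gensym;    # open3 needs a real glob to keep stderr separate
    my $pid = open3(my $in, my $out, $err, @cmd);
    print {$in} $input;
    close $in;           # tell the child there is no more input
    # Reading stdout fully before stderr is fine for small outputs; a child
    # that writes a lot to stderr would need IO::Select or similar.
    my $stdout = do { local $/; <$out> } // '';
    my $stderr = do { local $/; <$err> } // '';
    waitpid($pid, 0);
    return ($stdout, $stderr, $? >> 8);
}

# Two samples in LIBSVM's sparse text format: label index:value ...
my $data = "+1 1:0.5 2:0.25\n-1 1:0.1 2:0.9\n";

# Assumption: a Perl one-liner that counts input lines stands in for the
# real classifier binary.
my @cmd = ($^X, '-ne', '$n++; END { print "lines=$n\n" }');

my ($stdout, $stderr, $status) = run_command($data, @cmd);
print "stdout: $stdout";
print "exit status: $status\n";
```

Passing the command as a list avoids shell quoting problems, and keeping stderr on its own handle makes it easier to tell diagnostics from results.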
In each part, I plan to discuss the problem, the strategy to solve it, the choice of machine learning technique and the main configuration issues the participants need to understand to successfully deploy machine learning applications. I will also show snippets of the code used. For example:
For data gathering using Finance::YahooQuote:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Finance::YahooQuote;

    my @symbols = ("IBM","DELL","GOOG","YHOO","MSFT","ORCL","SAP","COGN","BOBJ");
    my @columns = ("Last Trade (Price Only)","Last Trade Date","Last Trade Time",
                   "Day's Range","52-week Range","EPS Est. Next Year","P/E Ratio",
                   "PEG Ratio","Dividend Yield");
    my $arrptr = getcustomquote(\@symbols, \@columns);
    my $i = 0;
    foreach my $symbol (@symbols){
        my @quotes = @{$arrptr->[$i++]};
        print "$symbol\t@quotes\n";
    }
For the FCM:
    use strict;
    use warnings;
    use PDL;
    use PDL::NiceSlice;

    # ================================
    # fcm
    # ( $performance_index, $prototypes, $current_partition_matrix) =
    #   fcm( $patterns, $partition_matrix, $fuzzification_factor,
    #        $tolerance, $max_iter )
    # ================================
    sub fcm {
        #
        # fuzzy c means implementation
        #
        my ( $patterns, $current_partition_matrix, $fuzzification_factor, $tolerance, $max_iter ) = @_;
        my ( $number_of_patterns, $number_of_clusters ) = $current_partition_matrix->dims();
        my ( $prototypes, $performance_index );
        my $iter = 0;
        while (1) {
            # computing each prototype
            my $temporal_partition_matrix = $current_partition_matrix ** $fuzzification_factor;
            my $temp_prototypes = ($temporal_partition_matrix x $patterns)->xchg(1,0) / sumover($temporal_partition_matrix);
            $prototypes = $temp_prototypes->xchg(1,0);

            # copying partition matrix
            my $previous_partition_matrix = $current_partition_matrix->copy;

            # updating the partition matrix
            my $dist = zeroes($number_of_patterns, $number_of_clusters);
            for my $j (0..$number_of_clusters - 1){
                my $diff = $patterns - $prototypes(:,$j)->dummy(1, $number_of_patterns);
                $dist(:,$j) .= (sumover( $diff ** 2 )) ** 0.5;
            }
            my $temp_variable = $dist ** (-2/($fuzzification_factor - 1));
            $current_partition_matrix = $temp_variable / sumover($temp_variable->xchg(1,0));

            #
            # Performance Index calculation
            #
            $temporal_partition_matrix = $current_partition_matrix ** $fuzzification_factor;
            $performance_index = sum($temporal_partition_matrix * ( $dist ** 2 ));

            # checking stop conditions
            my $diff_partition_matrix = $current_partition_matrix - $previous_partition_matrix;
            $iter++;
            if ( ($diff_partition_matrix->max < $tolerance) || ($iter > $max_iter) ) {
                last;
            }
            print "iter = $iter\n";
        }
        return ( $performance_index, $prototypes, $current_partition_matrix );
    }
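For readers who prefer formulas, the loop in the FCM snippet appears to follow the standard fuzzy c-means updates, written out below. The symbols are my own labels rather than names from the code: $x_i$ is pattern $i$ of $N$, $v_j$ is prototype $j$ of $c$, $u_{ij}$ is the membership of pattern $i$ in cluster $j$, $m$ is the fuzzification factor, and $J$ is the performance index.

```latex
\begin{align}
  v_j    &= \frac{\sum_{i=1}^{N} u_{ij}^{m}\, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}
            && \text{(prototype update)} \\
  d_{ij} &= \lVert x_i - v_j \rVert_2
            && \text{(Euclidean distance)} \\
  u_{ij} &= \frac{d_{ij}^{-2/(m-1)}}{\sum_{k=1}^{c} d_{ik}^{-2/(m-1)}}
            && \text{(partition matrix update)} \\
  J      &= \sum_{i=1}^{N} \sum_{j=1}^{c} u_{ij}^{m}\, d_{ij}^{2}
            && \text{(performance index)}
\end{align}
```

In the snippet, iteration stops once the largest change in any $u_{ij}$ falls below the tolerance or the iteration count exceeds the maximum.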
I expect the audience to be mainly Perl-savvy people. However, the talk is open to everyone attending the conference, so some people in the audience might not be familiar with Perl.
The talk is scheduled to last 45 minutes. I plan to cover each part in about 10 minutes, leaving between 5 and 10 minutes for questions and answers. I do not plan to explain the snippets in detail because there is not enough time. However, I will make the code available to anyone interested. My questions for you, Fellow Monks, are:
- If you were attending this session, would you expect me to describe the code in detail?
- Do you think it is a good strategy to concentrate on the machine learning part rather than on the Perl part?
- What suggestions do you have about points that I should (or should not) cover?
- Any other suggestions or thoughts?
Thank you,
Update: Fixed typo in header of FCM sub