http://qs321.pair.com?node_id=625123


in reply to Re: RFC: Presentation on Machine Learning with Perl
in thread RFC: Presentation on Machine Learning with Perl

Hi Trizor,

Thank you very much for your feedback. I really appreciate it!

I will address your comments one by one. Please let me know if I missed something ;-)

Provide a reason for using Perl versus something else, and the modules you chose (I know several don't have alternatives).

About Perl: I want to show that Perl is a valid alternative for machine learning. I do not claim that Perl is the best option for every application of machine learning. However, I do claim that Perl can shine in several areas, which relates to your second comment. The modules were selected to show different ways in which you can use Perl for machine learning (they represent only one of the many ways to do things in Perl):

  1. For data gathering, visualization, and analysis (Part I). It is really easy to mine the web for data using Perl. Once you have the data, you can easily transform it into a format that facilitates further analysis. Perl also lets you quickly plot the data, which helps when collaborating with the problem-domain expert. The choice of Fuzzy C-Means (FCM) for data analysis has to do with my expertise in using it to make sense of data ;-) Writing an FCM implementation in Perl was one of the first things I did when learning Perl, so I am really proud of it :-)
  2. For Decision Support Systems (Part II). Here, instead of using one of the CPAN modules for Support Vector Machines (SVM), I decided to call an SVM binary using IPC::Open3. The main reason for doing so is that I want to show that you can easily call applications written in other languages from Perl. This is just another way of using Perl for machine learning: you do the data gathering and preparation in Perl and then call an application written in another language. The data for this part consists of image data and clinical records of patients with scoliosis who participated in a study we did at my university. Note: the data is not publicly available because we do not have ethics approval to release it. Our ethics approval covers only data analysis in our lab.
  3. For Pattern Recognition (Part III). I chose to write my own radial basis function (RBF) neural network code because I like to learn by doing. Again, I translated some old code of mine to Perl. The data for this part comes from Environment Canada. The problem we wanted to solve was to classify storm cells into one of four possible classes: Hail, Rain, Tornado, Wind. Note: this data is not publicly available; it belongs to Environment Canada.
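As a rough illustration of what an RBF network computes (this is not the Part III code), here is a minimal forward-pass sketch, assuming Gaussian hidden units followed by a linear output layer and an argmax decision. The function names and the hash layout of `$net` are hypothetical, the centers, widths, and weights in the toy network are made-up values, and training (choosing centers and fitting the output weights) is not shown. The four storm-cell class names are reused only as placeholder labels.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Gaussian basis function: exp(-||x - c||^2 / (2 * sigma^2))
sub rbf {
    my ($x, $center, $sigma) = @_;
    my $d2 = sum(map { ($x->[$_] - $center->[$_]) ** 2 } 0 .. $#$x);
    return exp(-$d2 / (2 * $sigma ** 2));
}

# Forward pass: Gaussian hidden layer, then a linear output layer.
# $net->{centers} : array ref of center vectors (one per hidden unit)
# $net->{sigmas}  : width of each hidden unit
# $net->{weights} : one row of hidden-to-output weights per class
# $net->{bias}    : one bias term per class
sub rbf_forward {
    my ($net, $x) = @_;
    my @phi = map { rbf($x, $net->{centers}[$_], $net->{sigmas}[$_]) }
              0 .. $#{ $net->{centers} };
    my @out;
    for my $k (0 .. $#{ $net->{weights} }) {
        my $row = $net->{weights}[$k];
        $out[$k] = $net->{bias}[$k] + sum(map { $row->[$_] * $phi[$_] } 0 .. $#phi);
    }
    return @out;
}

# Return the label of the output unit with the largest activation.
sub classify {
    my ($net, $labels, $x) = @_;
    my @out  = rbf_forward($net, $x);
    my $best = 0;
    for my $k (1 .. $#out) {
        $best = $k if $out[$k] > $out[$best];
    }
    return $labels->[$best];
}

# Toy network with made-up parameters (two input features, four classes).
my $net = {
    centers => [ [0, 0], [1, 0], [0, 1], [1, 1] ],
    sigmas  => [ 0.5, 0.5, 0.5, 0.5 ],
    weights => [ [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1] ],
    bias    => [ 0, 0, 0, 0 ],
};
my @labels = qw(Hail Rain Tornado Wind);
print classify($net, \@labels, [0.9, 0.1]), "\n";    # prints "Rain"
```

With identity output weights each class simply "owns" one hidden unit, so the input is assigned to the nearest center; a trained network would mix several hidden units per class.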
Also make sure that the FCM algorithm gets across despite any possible language barriers that may exist in your audience. I suggest showing a flowchart of the algorithm before the Perl implementation and then highlighting some of the stages within the Perl. Also check your function header for the fcm function; I don't think it is accurate.

Explaining the FCM should not be that hard, considering that I have several years of experience presenting my research with it to general and scientific audiences. Regarding the function header, you are right; I will fix it as soon as I can.
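To make the stages Trizor mentions concrete, here is a generic sketch of the standard FCM loop, whose steps map directly onto a flowchart: initialize memberships, compute centers, update memberships, and repeat until the memberships stop changing. It is not the implementation from the presentation; the function name `fcm`, its argument order, the Euclidean distance, the default fuzzifier m = 2, and the stopping rule are assumptions made for illustration.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Euclidean distance between two points (array refs of equal length).
sub euclid {
    my ($x, $y) = @_;
    return sqrt(sum(map { ($x->[$_] - $y->[$_]) ** 2 } 0 .. $#$x));
}

# fcm(\@data, $c, $m, $eps, $max_iter) -> (\@centers, \@u)
#   @data : points, each an array ref of coordinates
#   $c    : number of clusters
#   $m    : fuzzifier, must be > 1 (default 2)
#   $eps  : stop when no membership changes by more than this
#   $u[$i][$k] is the membership of point $k in cluster $i
sub fcm {
    my ($data, $c, $m, $eps, $max_iter) = @_;
    $m        //= 2;
    $eps      //= 1e-5;
    $max_iter //= 100;
    my $n   = @$data;
    my $dim = @{ $data->[0] };

    # Step 1: random memberships, normalized so each point's column sums to 1.
    my @u;
    for my $k (0 .. $n - 1) {
        my @r = map { rand() } 1 .. $c;
        my $s = sum(@r);
        $u[$_][$k] = $r[$_] / $s for 0 .. $c - 1;
    }

    my @centers;
    for my $iter (1 .. $max_iter) {
        # Step 2: centers are membership-weighted means, weights u_ik^m.
        for my $i (0 .. $c - 1) {
            my @w  = map { $u[$i][$_] ** $m } 0 .. $n - 1;
            my $ws = sum(@w);
            $centers[$i] = [ map {
                my $d = $_;
                sum(map { $w[$_] * $data->[$_][$d] } 0 .. $n - 1) / $ws
            } 0 .. $dim - 1 ];
        }

        # Step 3: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)).
        my $change = 0;
        for my $k (0 .. $n - 1) {
            my @d = map { euclid($data->[$k], $centers[$_]) } 0 .. $c - 1;
            my @new;
            if (my @hit = grep { $d[$_] == 0 } 0 .. $c - 1) {
                # Point sits exactly on a center: give it full membership there.
                @new = (0) x $c;
                $new[ $hit[0] ] = 1;
            }
            else {
                for my $i (0 .. $c - 1) {
                    $new[$i] = 1 / sum(map { ($d[$i] / $d[$_]) ** (2 / ($m - 1)) } 0 .. $c - 1);
                }
            }
            for my $i (0 .. $c - 1) {
                my $delta = abs($new[$i] - $u[$i][$k]);
                $change = $delta if $delta > $change;
                $u[$i][$k] = $new[$i];
            }
        }

        # Step 4: stop once the memberships have settled.
        last if $change < $eps;
    }
    return (\@centers, \@u);
}

srand(42);    # fixed seed so the run is repeatable
my @points = map { [$_] } (0, 0.5, 1, 9, 9.5, 10);
my ($centers, $u) = fcm(\@points, 2);
printf "center: %.3f\n", $_->[0] for @$centers;
```

On this toy 1-D data the two centers settle near 0.5 and 9.5; which cluster index ends up on which side depends on the random start.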

Regarding the SVM part, try to explain SVM better than the wikipedia article. I just couldn't grok it so I don't have much else to say. Perhaps explain why you're using IPC::Open3 to talk to a library and not XS or Inline?

I will do my best! I like to explain the SVM by comparing it with a neural network classifier on a two-class classification problem. In particular, I like to stress that while the neural network classifier settles on any plane that separates the two classes, the SVM uses the plane that maximizes the separation between the classes.
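The max-margin idea can also be written down compactly. Assuming the simplest case, a linearly separable training set of pairs (x_i, y_i) with labels y_i in {-1, +1}, the standard hard-margin SVM finds the separating plane w^T x + b = 0 by solving:

```latex
\min_{\mathbf{w},\,b}\ \frac{1}{2}\lVert\mathbf{w}\rVert^{2}
\quad\text{subject to}\quad
y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1,\qquad i = 1,\dots,N

\text{margin} = \frac{2}{\lVert\mathbf{w}\rVert}
```

The margin is the distance between the two planes w^T x + b = +1 and w^T x + b = -1, so minimizing the norm of w is the same as maximizing the separation. The soft-margin and kernel variants used in practice add slack variables and a kernel function, which are omitted here.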

Regarding the use of IPC::Open3, I already explained that when answering your first set of comments.
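The IPC::Open3 pattern behind Part II can be sketched as follows: start a program, write to its STDIN, and collect its STDOUT, STDERR, and exit status. The helper name `run_external` is hypothetical, and since the SVM binary and its arguments are not named here, the usage example runs the current perl interpreter (`$^X`) as a stand-in. Because this sketch reads the two output streams one after the other, IO::Select or a CPAN module such as IPC::Run is the safer choice when a program writes a lot to both.

```perl
use strict;
use warnings;
use IPC::Open3;
use Symbol qw(gensym);

# Run an external command, feed $input to its STDIN, and return
# (stdout, stderr, exit status). Reading STDOUT fully before STDERR is
# fine for small outputs, but can block if the child writes a lot to STDERR.
sub run_external {
    my ($cmd, $input) = @_;
    my $err = gensym;    # separate handle, otherwise STDERR is merged into STDOUT
    my $pid = open3(my $in, my $out, $err, @$cmd);

    print {$in} $input;
    close $in;           # signal end of input to the child

    my $stdout = do { local $/; <$out> };
    my $stderr = do { local $/; <$err> };
    waitpid($pid, 0);
    return ($stdout // '', $stderr // '', $? >> 8);
}

# Stand-in for the SVM binary: the running perl ($^X) upper-cases its input.
my ($stdout, $stderr, $status) =
    run_external([ $^X, '-ne', 'print uc' ], "hello\n");
print "exit=$status output=$stdout";
```

Passing the command as a list (rather than one string) makes open3 run the program directly, without a shell, so file names with spaces need no quoting.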

The third part seems rather easy to understand if one has a basic knowledge of ANNs and how they're represented mathematically. The one major inconsistency I find is that you talk a lot about doing things with data, but what data will you use? Will it be the stock market data mined at the beginning of Part I, for consistency, or will you use simpler data later on to let the points shine through?

As I mentioned above, the data for Parts II and III is different from that in Part I: for Part II, I will use clinical data, and for Part III, weather data. In my experience, the Part II data is the most complex of the three, followed by the Part III data; the Part I data is the simplest.

Again, Trizor, thank you for your comments.

Cheers,

lin0