I work in machine learning and use Perl for most of my scripting, but I have never bothered to use CPAN's machine learning modules, for two reasons. First, you often need to do additional linear algebra on your data (e.g., centering, computing eigenvalues, SVD), and these modules don't share a common matrix representation. Without a common format for compact storage and a rich library of numerical algorithms, it is hard to do things quickly in pure Perl. Second, many of the CPAN modules I've looked at seem to have been written either for their authors' edification or without regard for large datasets (e.g., Algorithm::SVMLight requires you to add your data points one at a time as bulky hash-refs), while most of the problems I care about involve huge amounts of data.
I think the PDL statistics paper someone else mentioned is the best "perl for statistics" resource I've seen. Depending on your problems and your familiarity with the field, some articles on Perl.com may be of interest. As much as I loathe Java, I would actually recommend Weka, which implements lots of machine learning algorithms that work well together. But unless PDL does what you want, I'd suggest something other than Perl (including CPAN modules) for your core algorithms.