Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

A common practice in machine learning is to preprocess the data before building a model. One popular preprocessing technique is data normalization. Normalization puts the variables in a restricted range (with a zero mean and 1 standard deviation). This is important to achieve efficient and precise numerical computation.

In this snippet, I present how to do data normalization using the Perl Data Language. The input is a piddle (see comment below for a definition) in which each column represents a variable and each row represents a pattern. The output is a piddle (in which each variable is normalized to have a 0 mean and 1 standard deviation), and the mean and standard deviation of the input piddle.

What are Piddles?

They are a new data structure defined in the Perl Data Language. As indicated in RFC: Getting Started with PDL (the Perl Data Language):

Piddles are numerical arrays stored in column major order (meaning that the fastest varying dimension represent the columns following computational convention rather than the rows as mathematicians prefer). Even though, piddles look like Perl arrays, they are not. Unlike Perl arrays, piddles are stored in consecutive memory locations facilitating the passing of piddles to the C and FORTRAN code that handles the element by element arithmetic. One more thing to note about piddles is that they are referenced with a leading $

Cheers,

lin0

#!/usr/bin/perl use warnings; use strict; use PDL; use PDL::NiceSlice; # ================================ # normalize # ( $output_data, $mean_of_input, $stdev_of_input) = # normalize( $input_data ) # # processess $input_data so that $output_data # has 0 mean and 1 stdev # # $output_data = ( $input_data - $mean_of_input ) / $stdev_of_input # ================================ sub normalize { my ( $input_data ) = @_; my ( $mean, $stdev, $median, $min, $max, $adev ) = $input_data->xchg(0,1)->statsover(); my $idx = which( $stdev == 0 ); $stdev( $idx ) .= 1e-10; my ( $number_of_dimensions, $number_of_patterns ) = $input_data->dims(); my $output_data = ( $input_data - $mean->dummy(1, $number_of_patterns) ) / $stdev->dummy(1, $number_of_patterns); return ( $output_data, $mean, $stdev ); }

In reply to Data Normalization with PDL by lin0

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others learning in the Monastery: (6)
As of 2024-04-16 22:43 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found