PerlMonks  
Without going into too many specifics given the nature of my work, I'm using it to try to break down chemical spectra into identifiable components. When/if I get to publish this, I'll try to let folks know, though time in the peer-reviewed journal world is only an illusion... :-)

For those that are curious, principal component analysis, or factor analysis, or any of a number of other names, describes a method for breaking down sets of data into key basis sets. It assumes that every experimental data set is a linear combination of a common collection of basis sets, and thus, if your collected data is M total sets, each N units long, you can use singular value decomposition to get M basis sets, each N units long, and a square M x M weight matrix. This is an 'exact' specification: the basis sets and weights together reproduce the collected data. However, we typically want only C components, with C << M. Because singular value decomposition also generates M singular values (their squares are the eigenvalues usually quoted for PCA), we can use empirical, statistical, or other methods to determine what C is, and which of those M basis sets are the most important.
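
As a rough sketch of the decomposition described above (not the actual analysis code, which isn't shown here), the following pure-Python example builds M = 4 made-up data sets from two hidden components, gets the M eigenvalues from the M x M matrix of dot products between data sets, and counts how many are significant. The data, the helper jacobi_eigen, and the 1e-8 cut-off are all assumptions for illustration; a real analysis would use a proper SVD routine and a better-justified rule for picking C.

```python
import math

def jacobi_eigen(sym, sweeps=50):
    """Eigen-decompose a small symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors), largest eigenvalue first.
    Hypothetical helper for this sketch, not a library routine.
    """
    n = len(sym)
    a = [row[:] for row in sym]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                d = a[q][q] - a[p][p]
                # Rotation angle that zeroes a[p][q], kept within +/- 45 degrees.
                theta = 0.5 * (math.atan2(2 * a[p][q], d) if d >= 0
                               else math.atan2(-2 * a[p][q], -d))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (columns p and q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):  # A <- J^T A (rows p and q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
    order = sorted(range(n), key=lambda i: -a[i][i])
    return [a[i][i] for i in order], [[v[k][i] for k in range(n)] for i in order]

# Made-up data: M = 4 data sets, each N = 6 points long, all mixed from
# two hidden components.
comp_a = [1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
comp_b = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
mixes = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (0.2, 0.8)]
data = [[wa * a + wb * b for a, b in zip(comp_a, comp_b)] for wa, wb in mixes]
M, N = len(data), len(data[0])

# M x M matrix of dot products between data sets; its eigenvalues are the
# squared singular values of the N x M data matrix.
gram = [[sum(x * y for x, y in zip(di, dj)) for dj in data] for di in data]
eigenvalues, eigenvectors = jacobi_eigen(gram)

# Empirical cut-off (an assumption): keep components above 1e-8 of the largest.
C = sum(1 for lam in eigenvalues if lam > 1e-8 * eigenvalues[0])

# Basis sets (N long) and weights (one per data set) for the C kept components.
basis = [[sum(data[m][n] * eigenvectors[k][m] for m in range(M)) / math.sqrt(eigenvalues[k])
          for n in range(N)] for k in range(C)]
weights = [[math.sqrt(eigenvalues[k]) * eigenvectors[k][m] for m in range(M)]
           for k in range(C)]

# Rebuild the data from only the C kept components and measure the error.
rebuilt = [[sum(basis[k][n] * weights[k][m] for k in range(C)) for n in range(N)]
           for m in range(M)]
max_error = max(abs(rebuilt[m][n] - data[m][n]) for m in range(M) for n in range(N))
print("C =", C, "max reconstruction error =", max_error)
```

For this noise-free toy data the count of significant eigenvalues matches the two hidden components, and the two kept basis sets are enough to rebuild all four data sets. With real, noisy spectra the small eigenvalues are not this cleanly separated, which is where the empirical or statistical judgement comes in.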

Note that these basis sets may not have any actual meaning; as jeroenes indicates, the method picks these basis sets so as to minimize the variation of the data left unexplained by the C-dimensional basis. However, there are ways to transform the data from the PCA basis set to a set of vectors that do have some meaning. In my case, it's going from a basis set of spectra that represent no real substance to spectra of real substances; I can then get an idea of the composition of all the other non-basis set data that I started with.
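
One simple way to picture that transformation, as a sketch under stated assumptions rather than the method actually used here: if the spectra of the real substances are known and lie in the space spanned by the abstract basis, each one can be written in abstract coordinates, and a small linear solve then converts any data set's abstract weights into amounts of real substances. Everything below is hypothetical: the two "pure" spectra are made up, and the abstract basis is faked with Gram-Schmidt instead of coming from a real PCA.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def orthonormalize(vectors):
    """Gram-Schmidt; used here only to fake an 'abstract' orthonormal basis."""
    basis = []
    for v in vectors:
        for b in basis:
            proj = dot(v, b)
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = dot(v, v) ** 0.5
        basis.append([vi / norm for vi in v])
    return basis

# Made-up spectra of two real substances.
pure_a = [1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
pure_b = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]

# Stand-in for a PCA basis: spans the same space as the pure spectra, but
# its vectors (note the negative entries) match no real substance.
abstract = orthonormalize([
    [a + b for a, b in zip(pure_a, pure_b)],
    [a - b for a, b in zip(pure_a, pure_b)],
])

def to_abstract(spectrum):
    """Weights of a spectrum on the abstract basis vectors."""
    return [dot(spectrum, u) for u in abstract]

def composition(spectrum):
    """Amounts of pure_a and pure_b in a spectrum, via a 2x2 solve."""
    w = to_abstract(spectrum)
    ta, tb = to_abstract(pure_a), to_abstract(pure_b)
    # Solve w = ca * ta + cb * tb with Cramer's rule.
    det = ta[0] * tb[1] - ta[1] * tb[0]
    ca = (w[0] * tb[1] - w[1] * tb[0]) / det
    cb = (ta[0] * w[1] - ta[1] * w[0]) / det
    return ca, cb

# A measured spectrum that is secretly 20% substance A and 80% substance B.
mixture = [0.2 * a + 0.8 * b for a, b in zip(pure_a, pure_b)]
ca, cb = composition(mixture)
print("amount of A:", ca, "amount of B:", cb)
```

The design point is that the composition is computed entirely from the abstract weights, so once the real spectra have been located in the PCA space, every other data set can be converted with the same small solve.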

As jeroenes also indicated, you can use the basis sets and weights to find out where clusters of data exist, and use those to guide the selection of basis sets and transformations to understand the data better.

It's a very elegant method for large-scale data analysis and very easy to do with help from computers, though there's enough empirical analysis involved that a human still needs to guide the end decisions.

-----------------------------------------------------
Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
It's not what you know, but knowing how to find it if you don't know that's important


In reply to Re: Re:{4} Maintainable code is the best code -- principal components by Masem
in thread Maintainable code is the best code by dragonchild
