PerlMonks  

Re: Re:{2} Maintainable code is the best code -- principal components

by Masem (Monsignor)
on Oct 03, 2001 at 08:03 UTC ([id://116349])


in reply to Re:{2} Maintainable code is the best code -- principal components
in thread Maintainable code is the best code

Actually, I was also approaching orthogonality from a principal component analysis (PCA) standpoint, though in my case for experimental data analysis.

Now, to go over the heads of everyone who has no idea what PCA is :-), the programming equivalent is this: you have M 'overall functions' that your software needs to perform. A good refactoring down to an orthogonal set should yield N small functions, with N >> M. As jeroenes indicates, this is ill-defined from a PCA standpoint, because with PCA you'd want to select a small number of components (fewer than M) to approximate the job. However, what goes unstated in the refactoring process is that you should be thinking about the future and the past. In reality you might have P projects, each with M_i (i = 1 to P) 'overall functions', so that the total number of functions over all projects, past, present, and future, is M' = M_1 + ... + M_P, with M' >> N >> M.

In plain terms, you should be refactoring to find an orthogonal set of functions that are reusable for other problems, including functions that may already have been written and ones that may be part of future programs. This is the same conclusion the parent thread reaches, as do numerous other texts on programming, but for those with a mathematical bent, there's some empirical grounding to it as well.

-----------------------------------------------------
Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
It's not what you know, but knowing how to find it if you don't know that's important

Re:{4} Maintainable code is the best code -- principal components
by jeroenes (Priest) on Oct 03, 2001 at 09:16 UTC
    No wonder that your first reply reminded me of PCA.

    May I ask what kind of data you use PCA for?

    It apparently is getting more popular these days. I have seen it used for genetic chimera analysis and on DNA arrays as a prelude to clustering. When I started to use it, my mentor was very sceptical about how acceptable it would be, while the statistician who helps me told me it was a technique about a century old, so nobody should complain.

    Anyway, I use it for clustering analysis as well, though in my case for extracellularly recorded neuronal spike waveforms. I sample spike waveforms from an electrode placed in a brain slice and turn a PCA routine loose on them. Usually just the first two components are enough to separate your clusters.
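    As an illustration of that workflow (not jeroenes' actual code), here is a minimal sketch in Python with NumPy: it mean-centers a set of waveforms, projects them onto the first two principal components, and groups the resulting scores. Which clustering method follows the PCA is not stated in the post; plain k-means is my assumption, as are the synthetic spike shapes and the names pc_scores and kmeans.

```python
import numpy as np


def pc_scores(waveforms, n_components=2):
    """Project waveforms (one per row) onto their first principal components."""
    # Mean-center across waveforms, as is usual for PCA.
    centered = waveforms - waveforms.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on the PC scores; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points (keep it if its cluster is empty).
        new_centers = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    # Two made-up spike shapes (32 samples each), 60 noisy copies of each.
    rng = np.random.default_rng(3)
    t = np.arange(32)
    unit_a = -np.exp(-((t - 10) / 2.0) ** 2)
    unit_b = -0.5 * np.exp(-((t - 12) / 4.0) ** 2) + 0.3 * np.exp(-((t - 20) / 3.0) ** 2)
    waveforms = np.vstack([unit_a + 0.05 * rng.standard_normal((60, 32)),
                           unit_b + 0.05 * rng.standard_normal((60, 32))])

    scores = pc_scores(waveforms, n_components=2)   # 120 x 2
    labels, centers = kmeans(scores, k=2)
    print("cluster sizes:", np.bincount(labels, minlength=2))
```

    Here k is fixed at 2 because the synthetic data has two units; with real recordings it would have to be chosen, for example by inspecting a scatter plot of the two score columns.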

    Jeroen
    "We are not alone"(FZ)

      Without going into too many specifics given the nature of my work, I'm using it to try to break down chemical spectra into identifiable components. When/if I get to publish this, I'll try to let folks know, though time in the peer-reviewed journal world is only an illusion... :-)

      For those who are curious, principal component analysis (also known as factor analysis, among a number of other names) describes a method for breaking down sets of data into key basis sets. It assumes that each set of experimental data is a linear combination of a common set of basis sets, and thus, if each of your collected data sets is N units long and you have M sets in total, you can use singular value decomposition to get M basis sets, each N units long, and a square M x M weight matrix. This is an 'exact' specification: multiplying the basis sets by the weights reproduces the original data. However, we typically want only C components, with C << M. Because singular value decomposition also gives us M eigenvalues, we can use empirical, statistical, or other methods to determine what C is and which of those M basis sets are the most important.
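      To make the SVD step concrete, here is a minimal sketch in Python with NumPy (my choice; the post names no language or library). It assumes the M data sets are stored as the columns of an N x M array, does not mean-center them, and picks C with a cumulative-eigenvalue cutoff, which is only one of the "empirical, statistical, or other methods" mentioned above. The 0.95 default, the synthetic peaks, and the names pca_svd and truncate are hypothetical.

```python
import numpy as np


def pca_svd(data, var_fraction=0.95):
    """Factor `data` (N points x M sets, one set per column, N >= M) with a thin SVD.

    Returns (basis, weights, eigenvalues, c):
      basis        N x M, one orthonormal basis set per column
      weights      M x M, so that basis @ weights reproduces data (up to rounding)
      eigenvalues  the M squared singular values, largest first
      c            smallest number of basis sets whose eigenvalues add up to at
                   least `var_fraction` of the total (one possible rule for C)
    """
    # Thin SVD: u is N x M, s holds M singular values (descending), vt is M x M.
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    weights = np.diag(s) @ vt
    eigenvalues = s ** 2
    # Cumulative share of the total sum of squares captured by the first k basis sets.
    share = np.cumsum(eigenvalues) / eigenvalues.sum()
    c = min(int(np.searchsorted(share, var_fraction)) + 1, len(s))
    return u, weights, eigenvalues, c


def truncate(basis, weights, c):
    """Approximate the data using only the first c basis sets and their weights."""
    return basis[:, :c] @ weights[:c, :]


if __name__ == "__main__":
    # Synthetic example: two made-up "pure" peaks mixed in ten proportions, plus noise.
    x = np.linspace(0.0, 1.0, 50)
    pure = np.column_stack([np.exp(-((x - 0.3) / 0.05) ** 2),
                            np.exp(-((x - 0.7) / 0.08) ** 2)])   # 50 x 2
    mix = np.vstack([np.linspace(0.2, 2.0, 10),
                     np.linspace(2.0, 0.2, 10)])                 # 2 x 10
    noise = 1e-3 * np.random.default_rng(0).standard_normal((50, 10))
    data = pure @ mix + noise                                    # N=50, M=10

    basis, weights, eig, c = pca_svd(data)
    print("components kept (C):", c)
    print("exact reconstruction:", np.allclose(basis @ weights, data))
    print("max error with C components:", np.abs(data - truncate(basis, weights, c)).max())
```

      With two underlying peaks and only a little noise, the cutoff rule should keep two components; the remaining eight eigenvalues sit at the noise level.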

      Note that these basis sets may not have any actual meaning; as jeroenes indicates, the method breaks out these basis sets so as to capture as much of the variation in the data as possible in the first C components (equivalently, to minimize the variation left over outside that C-dimensional subspace). However, there are ways to transform the data from the PCA basis set to a set of vectors that do have some meaning. In my case, it's going from a basis set of spectra that represent no real substance to spectra of real substances; I can then get an idea of the composition of all the other non-basis-set data that I started with.

      As jeroenes also indicated, you can use the basis sets and weights to find out where clusters of data exist, and use those to guide the selection of basis sets and transformations to understand the data better.

      It's a very elegant method for large-scale data analysis and very easy to do with the help of computers, although enough empirical judgment is involved that a human still needs to guide the final decisions.

      -----------------------------------------------------
      Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
      It's not what you know, but knowing how to find it if you don't know that's important
