
Dear all,

My problem is a bioinformatics one. I'm currently running a script that processes 13,000 files containing co-ordinates for multi-chain proteins.

It calculates the interactions between residues on different chains, for each pair of chains (if there are any interactions).

Thus, it is iterating not only through each chain pair, but through every possible residue pair (though not residues in the same chain of course) and seeing if they are close enough, and determining the kind of interaction if so.

My problem, which I fear is unavoidable because I cannot afford to miss any possible interactions, is that some of the larger files take hours, even more than a day, to process.

At this rate, it could take nearly a year to go through all the files, which is unfortunate.

Take, for example, a 6-chain protein with approx. 3000 residues; that's approx. 500 residues per chain. So for one chain pair there are 500 x 500 = 250,000 iterations, and because there are 6 chains, that's 15 possible chain pairs (avoiding repeats, e.g. AB == BA), so that's 3.75 million iterations, .25 shy of 4 million.
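The estimate above can be checked with a few lines. This is only a sketch in Python for illustration (the script itself is not shown in this post), and it assumes the rough figures given: 6 chains of about 500 residues each.

```python
from math import comb

chains = 6
residues_per_chain = 500  # rough figure: ~3000 residues over 6 chains

# Unordered chain pairs, so AB and BA count once.
chain_pairs = comb(chains, 2)

# Every residue in one chain against every residue in the other.
iterations_per_pair = residues_per_chain ** 2

total_iterations = chain_pairs * iterations_per_pair
print(chain_pairs, iterations_per_pair, total_iterations)
```

With unequal chain lengths the per-pair count would be the product of the two chain lengths rather than a square, so the real total for a given file may differ.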

I just wanted to know, what are the potential bottlenecks? One such file (larger than the above example) is still being processed after 1.5 days!

The way my program runs is that, while it reads in the file, for every new residue it reads in, it iterates through the list currently in memory (avoiding residues in the same chain, of course) to look for new interactions, and at the same time it populates a database with the atom and residue details, and the interactions, if any. Is this a stupid way of doing it?
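For readers who want the loop spelled out, here is a minimal sketch of the approach as described, in Python for illustration only. The actual script is not shown, so everything here is an assumption: the `find_interactions` name, the `(chain, residue_id, coordinate)` tuples, the single coordinate per residue, and the 5.0 cutoff are all hypothetical, and the database writes are left out.

```python
import math

CUTOFF = 5.0  # hypothetical distance threshold; the real criterion is not given


def find_interactions(residues, cutoff=CUTOFF):
    """Compare each newly read residue against all earlier ones in other chains."""
    seen = []          # residues already read into memory
    interactions = []  # pairs of (chain, residue_id) found close enough
    for chain, res_id, coord in residues:
        for other_chain, other_id, other_coord in seen:
            if other_chain == chain:
                continue  # skip residues in the same chain
            if math.dist(coord, other_coord) <= cutoff:
                interactions.append(((other_chain, other_id), (chain, res_id)))
        seen.append((chain, res_id, coord))
    return interactions


# Tiny made-up input: two residues in chain A, two in chain B.
residues = [
    ("A", 1, (0.0, 0.0, 0.0)),
    ("A", 2, (1.0, 0.0, 0.0)),
    ("B", 1, (3.0, 0.0, 0.0)),
    ("B", 2, (50.0, 0.0, 0.0)),
]
found = find_interactions(residues)
print(found)
```

Written this way, the inner loop visits every earlier residue from another chain, so the work grows with the square of the residue count, which matches the iteration estimate given earlier in the post.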

Cheers
Sam

In reply to Iteration speed by seaver


