Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

Re^4: Sorting Gigabytes of Strings Without Storing Them (bytes)

by BrowserUk (Patriarch)
on Dec 24, 2008 at 10:51 UTC ( [id://732445]=note: print w/replies, xml ) Need Help??


in reply to Re^3: Sorting Gigabytes of Strings Without Storing Them (bytes)
in thread Sorting Gigabytes of Strings Without Storing Them

Hm. I can't get this to work. There's a couple of obvious things:

  1. %#quad; => $#quad;
  2. /ge

But the main problem is my long standing trouble with your NestedLoops. You're passing in an anon array of four references to an array containing ACGT: [ ( \@bases ) x 4 ], but that results in Not an ARRAY reference... from the @$_ within the map, and I can't see how to put that right?

Also, for a proper comparison, you;d need a 'decode' operation.

Generally, I have tried various pure perl methods of packing and unpacking genomic data--there's one of my previous attempts here, and there are others kicking around--but given it would require saving 3 minutes (on 1e6 cycles) during the packing and unpacking to allow salva's in-memory sort approach to compete with the external sort, I fear that none of them are going to achieve much.

A pair of Inline::C routines might achieve sufficient performance to make an in-memery approach viable, but given the shortness on the strings and the need to transition perl->C->perl for every 34 chars, even that seems unlikely. If the inputs were always multiples of 4-bytes and always 4-byte aligned, maybe. But dealing with the edge cases where those two conditions cannot be guarenteed makes things much slower.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://732445]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chilling in the Monastery: (2)
As of 2024-04-25 19:10 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found