PerlMonks
Re^3: Unpack Fail to Decompress String?
by BrowserUk (Patriarch) on Jan 10, 2009 at 10:56 UTC ( [id://735378]=note )
I'd favour putting a count of the number of significant characters in the last byte in the last 2 bits of that byte. (Which may require a zero last byte.) Sorry guys, but I don't see any way of making a trailing indicator allow for string comparison.
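To make the comparison problem concrete, here is a minimal sketch under one possible reading of the trailing-count layout. It assumes A, C, G, T map to 00, 01, 10, 11, four bases per byte with the first base in the high bits, and a final byte that holds up to three bases plus the count in its low 2 bits (so a sequence whose length is a multiple of four gets an extra zero byte). The sub name `pack_trailing` and the exact bit layout are made up for illustration; nothing in the thread fixes them.

```perl
use strict;
use warnings;

# Assumed 2-bit codes; chosen so that equal-length packed strings sort like text.
my %code = ( A => 0, C => 1, G => 2, T => 3 );

# Hypothetical layout: full bytes hold 4 bases each, high bits first.
# The final byte holds 0..3 bases and, in its low 2 bits, how many it holds.
sub pack_trailing {
    my ($seq)  = @_;
    my @bases  = split //, $seq;
    my $packed = '';
    while ( @bases >= 4 ) {
        my $byte = 0;
        $byte = ( $byte << 2 ) | $code{$_} for splice( @bases, 0, 4 );
        $packed .= chr $byte;
    }
    my $count = @bases;    # 0..3 bases left over for the last byte
    my $last  = 0;
    $last |= $code{ $bases[$_] } << ( 6 - 2 * $_ ) for 0 .. $count - 1;
    return $packed . chr( $last | $count );
}

my ( $short, $long ) = ( 'ACG', 'ACGA' );

# As text, 'ACG' sorts before 'ACGA'; packed, the count bits reverse that.
printf "unpacked: %d  packed: %d\n",
    $short cmp $long,
    pack_trailing($short) cmp pack_trailing($long);
# prints "unpacked: -1  packed: 1"
```

Under these assumptions 'ACG' packs to the single byte 0x1B while 'ACGA' packs to 0x18 0x00, so the count bits in the last byte decide the comparison instead of the sequence content.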
That said, sorting and comparing packed ACGT (other than for simple equality) is not a very useful thing to do anyway. Sequence and subsequence representations don't have any intrinsic ordering.

The more frequent operation is to search one sequence for the presence of another (sub)sequence, and a byte-wise search of packed sequences only checks every fourth base position, rather than every position. To cover the other three alignments you have to shift the packed data by 2 bits, and performing shifts or rotates on bitstrings larger than register size is a prohibitively expensive operation. It far outweighs any gains you might get from searching quarter-sized strings, as you have to perform 4 searches anyway, and you need the expensive shift operation in between each search.

I think the only real use for packed ACGT (2-bit) format is to reduce storage overhead. It's almost certainly quicker to just unpack the sequences for searching. In which case, a prefix byte holding the number of significant bits in the last byte is a good compromise between compression level and implementation ease.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
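As an illustration of the post's closing point, here is a minimal sketch of the prefix-byte layout and of searching by unpacking first. It assumes A, C, G, T are stored as 00, 01, 10, 11, that the first byte records the number of significant bits in the final packed byte, and that the input contains only those four letters. The sub names `pack_acgt` and `unpack_acgt` are hypothetical, not taken from the thread.

```perl
use strict;
use warnings;

# Assumed 2-bit codes; the thread does not fix a particular mapping.
my %bits     = ( A => '00', C => '01', G => '10', T => '11' );
my %base_for = reverse %bits;

# Prefix byte = number of significant bits (0, 2, 4, 6 or 8) in the last byte.
sub pack_acgt {
    my ($seq)  = @_;
    my $bitstr = join '', map { $bits{$_} } split //, $seq;
    my $tail   = length($bitstr) % 8 || ( length($bitstr) ? 8 : 0 );
    return chr($tail) . pack( 'B*', $bitstr );    # pack pads with zero bits
}

sub unpack_acgt {
    my ($packed) = @_;
    my $tail   = ord( substr( $packed, 0, 1 ) );
    my $bitstr = unpack( 'B*', substr( $packed, 1 ) ) // '';

    # Drop the padding bits of the final byte.
    $bitstr = substr( $bitstr, 0, length($bitstr) - 8 + $tail )
        if length $bitstr;
    $bitstr =~ s/(..)/$base_for{$1}/g;
    return $bitstr;
}

my $genome = 'GATTACAGATTACA';
my $needle = 'TTAC';
my $packed = pack_acgt($genome);    # 1 prefix byte + 4 data bytes

# Searching the unpacked text finds the match at base offset 2.
print index( unpack_acgt($packed), $needle ), "\n";    # prints 2

# A byte-wise search of the packed data only tests every fourth base
# position, so it misses this unaligned occurrence.
print index( substr( $packed, 1 ), substr( pack_acgt($needle), 1 ) ), "\n";    # prints -1
```

The packed search fails here because 'TTAC' starts at base offset 2, which is not a byte boundary. Finding it without unpacking would need the data re-shifted by 2, 4 and 6 bits and searched again each time, which is the cost the post argues against.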