PerlMonks
Translating non-printable ascii
by samurai (Monk) on Oct 04, 2004 at 16:29 UTC ( [id://396278]=perlquestion )
samurai has asked for the wisdom of the Perl Monks concerning the following question:
Masters of the dromedary,

My current project is to extract data from a proprietary format to MySQL. I use the database vendor's tool to dump the files to normal ASCII, and then I process them. I recently got my hands on the spec for the proprietary format. I now have the knowledge to "decode" the proprietary format without using their tool to dump the file to ASCII (we're talking 27-30 gig files here, so disk usage is a big concern with this method).

My question involves iterating over string contents. The "compression" algorithm is incredibly simplistic, but effective. It uses run-length encoding for blank spaces (a 0xFF byte followed by an ASCII byte whose value equals the length), and turns consecutive digits into the non-printable ASCII values. For example...
Now I can get to my question. Running speed is of the utmost importance here. I know that perl could never do this as fast as the proprietary C utility that I use to dump these 30 gig files. But if I can avoid creating temp files and read them natively in perl, I can avoid disk usage issues.

What is the most efficient way to translate those ASCII bytes in perl? Perl's smallest character value, IIRC, is the string. I need to be able to translate, as per the table above, any ASCII 0x80 into "00" in place in the string, ASCII 0x81 into "01", and so on. I guess I could do s///, but regexes would probably be ridiculously slow. Or I could use index once for each type of replacement character, in combination with substr. But that would be running index 99 times (or more if there's more than one instance of the character) on over four million records @_@

I got my start coding in perl, so I am used to dealing with data in strings, not arrays of bytes. If anyone can help point me in the right direction for coding this up in the most efficient way possible, I'd be very grateful.
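For concreteness, here is a minimal sketch of the decoding described above, done in one pass with a single s///ge and a precomputed lookup table rather than one index call per code. Several things here are assumptions, not taken from the spec: that the digit codes run contiguously from 0x80 ("00") through 0xE3 ("99"), that the byte after 0xFF holds the run length as its numeric value, and the helper name decode_record, which is made up for illustration. Whether this is fast enough would need benchmarking on real records.

```perl
use strict;
use warnings;

# Lookup table built once: index n holds the two-digit string for n.
# ASSUMPTION: byte 0x80 + n encodes the digit pair sprintf('%02d', n),
# for n in 0..99, i.e. bytes 0x80 through 0xE3.
my @DIGITS = map { sprintf '%02d', $_ } 0 .. 99;

# decode_record is a hypothetical helper: it takes one record as a byte
# string and returns the expanded text.
sub decode_record {
    my ($rec) = @_;
    $rec =~ s{
        \xFF (.)          # run-length marker, then the count byte
      | ([\x80-\xE3])     # one packed digit pair
    }{
        defined $1
            ? ' ' x ord($1)                # ASSUMPTION: count is the byte's numeric value
            : $DIGITS[ ord($2) - 0x80 ]
    }gexs;
    return $rec;
}

# "AB", a run of 3 spaces, then the codes for "01" and "12", then "Z"
print decode_record("AB\xFF\x03\x81\x8CZ"), "\n";    # prints "AB   0112Z"
```

The 0xFF alternative comes first so that a count byte that happens to fall in the 0x80-0xE3 range is consumed as a length and not mistaken for a digit pair. The input handle should be in binmode so the records stay byte strings.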