PerlMonks  

Need for speed (sometimes)

by bronto (Priest)
on Jul 27, 2004 at 18:10 UTC


in reply to Re: Most efficient way to build output file?
in thread Most efficient way to build output file?

While I'd suggest that iKnowNothing try out a couple of different approaches with a benchmarking module (Benchmark, for example, which is covered in the Camel Book), I'd like to tell a small story.
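The Benchmark module ships with Perl, and its cmpthese function runs each piece of code repeatedly and prints a table comparing their rates. A minimal sketch, assuming hypothetical in-memory test data (records that start with KEY=..., continue on indented lines and end with a blank line) and two hypothetical subroutines, by_line and by_record, standing in for "test patterns on every line" versus "read one blank-line-separated record and match against all of it":

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 1000 records, each KEY=... plus indented lines,
# followed by an empty line as the record separator.
my $dump = join '', map { "KEY=user$_\n    mailhost=mx$_\n    quota=$_\n\n" } 1 .. 1000;

# Approach 1 (hypothetical): read line by line, testing patterns on each line.
sub by_line {
    open my $fh, '<', \$dump or die "Cannot open in-memory data: $!";
    my ($count, $key) = (0, undef);
    while (my $line = <$fh>) {
        if    ($line =~ /^KEY=(\S+)/)            { $key = $1 }
        elsif ($line =~ /^[ \t]+mailhost=(\S+)/) { $count++ if defined $key }
    }
    return $count;
}

# Approach 2 (hypothetical): read one whole record per readline by setting
# the input record separator, then match against the complete record.
sub by_record {
    open my $fh, '<', \$dump or die "Cannot open in-memory data: $!";
    local $/ = "\n\n";    # restored automatically when the sub returns
    my $count = 0;
    while (my $record = <$fh>) {
        $count++ if $record =~ /^KEY=\S+/m && $record =~ /^[ \t]+mailhost=\S+/m;
    }
    return $count;
}

# Make sure both approaches agree before comparing their speed.
by_line() == by_record() or die "Approaches disagree";

# A negative count means: run each sub for at least that many CPU seconds.
cmpthese(-1, { by_line => \&by_line, by_record => \&by_record });
```

Checking that both subs give the same answer before timing them is deliberate: a faster version that produces different results is not an improvement. The relative numbers on a toy in-memory data set will not necessarily match what you see on real multi-gigabyte files, so benchmark with realistic input.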

We had a closed-source mail server that could output a dump of its internal database in plain text: each record begins with a KEY=something string at the very beginning of a line, the rest of the record follows on indented lines, and an empty line separates records. I had to read two such dumps, taken on different days, and output the changes in a format that could be fed back to a preproduction server. The dumps are many gigabytes in size.

I wrote, and gradually evolved, a script that read the two files line by line, doing comparisons and pattern matching. While trying to speed it up, I applied all the best practices I knew and tried to keep the code clear and clean. Still, running it on an old Sun server, I could not get it to run in less than 40 minutes.

I then passed it to a colleague who changed it here and there, eliminating a couple of subroutines and modules. His coding style looked rather old-fashioned to me (it reminded me of the old days of Perl 4) and was a bit less clear, but his version ran in 32 minutes! He also tested it on his Linux box, where it took about 11 minutes (mine took 15 to 16 minutes there).

Now, 40 minutes was fast enough for us, and so was 32; but since there was a version of the script that took 20% less time than mine, there was clearly room to improve my own.

So, the following day, I remembered that I could change the input record separator ($/) to "\n\n" and read one record at a time. Moreover, with the whole record in a string, I could match just the parts I was interested in, instead of trying different patterns on every line read. I did some benchmarking, changed a couple of subs and reran the script: 16 minutes on the Sun server (yes: twice as fast as my colleague's 32 minutes, a 100% speedup!).
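A minimal sketch of reading one record at a time by changing $/, assuming the dump layout described above. The sample data, the mailhost field and the %records hash are hypothetical illustrations, not taken from the actual script, and an in-memory filehandle stands in for the real dump file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample in the described format: KEY=... at the start of a
# line, indented lines for the rest of the record, empty line as separator.
my $dump = <<'END';
KEY=alice
    mailhost=mx1
    quota=100

KEY=bob
    mailhost=mx2
    quota=250

END

# A real script would open the dump file instead of this in-memory string.
open my $fh, '<', \$dump or die "Cannot open dump: $!";

my %records;    # hypothetical: key => mailhost
{
    # Each readline now returns a whole record, up to and including "\n\n".
    # 'local' restores the normal line-by-line behaviour after this block.
    local $/ = "\n\n";
    while (my $record = <$fh>) {
        # Match only what is interesting, against the whole record at once,
        # instead of testing several patterns on every single line.
        next unless $record =~ /^KEY=(\S+)/m;
        my $key = $1;
        my ($host) = $record =~ /^[ \t]+mailhost=(\S+)/m;
        $records{$key} = $host // '(none)';
    }
}
close $fh;

print "$_ => $records{$_}\n" for sort keys %records;
```

The /m modifier lets ^ match at the start of every line inside the multi-line record. Setting $/ to the empty string instead of "\n\n" enables paragraph mode, which treats two or more consecutive empty lines as a single separator; that may be more forgiving if a dump ever contains runs of blank lines.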

This is to say that sometimes you don't really need to make your programs faster, but trying to do so teaches you things you never cared about. I knew about the input record separator, but I had never realized how to use it to make my job easier and my scripts faster.

My 2 Eurocents

PS: Oooops! Incidentally, I wrote a meditation!

Ciao!
--bronto


The very nature of Perl to be like natural language--inconsistent and full of dwim and special cases--makes it impossible to know it all without simply memorizing the documentation (which is not complete or totally correct anyway).
--John M. Dlugosz
