PerlMonks  

Re: Looking for ways to speed up the parsing of a file...

by starbolin (Hermit)
on May 18, 2008 at 00:47 UTC ( [id://687165]=note )


in reply to Looking for ways to speed up the parsing of a file...

This:

if (($TotalNets == 50000) || ($TotalNets == 100000) || ($TotalNets == 250000) || ($TotalNets == 500000) || ($TotalNets == 1000000) || ($TotalNets == 1500000) || ($TotalNets == 2000000) || ($TotalNets == 3000000)) {
should be done in parallel, i.e. by writing the current net to a FIFO or shared memory and displaying the totals from another process. Inside the read loop, do only the work that is strictly necessary to process the net records. Alternatively, read the file N lines at a time:
do {
    for (1 .. $N) {
        if (defined(my $line = <FH>)) {
            # ... do stuff here ...
        }
        else {
            last;
        }
    }
    print "$Some_Total\n";
} until eof(FH);
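A slightly fuller sketch of the batch idea, in case it helps. The helper name process_in_batches and its callback arguments are made up for illustration, and an in-memory filehandle stands in for the real input file:

```perl
use strict;
use warnings;

# Hypothetical helper: read up to $batch_size lines per pass and call
# $report once per batch, so the per-line loop carries no progress checks.
sub process_in_batches {
    my ($fh, $batch_size, $per_line, $report) = @_;
    my $total = 0;
    until (eof $fh) {
        for (1 .. $batch_size) {
            defined(my $line = <$fh>) or last;   # stop the batch at end of file
            $per_line->($line);
            $total++;
        }
        $report->($total);
    }
    return $total;
}

# Usage: 25 fake net lines, reported every 10 lines.
my $data = join '', map { "net$_\n" } 1 .. 25;
open my $fh, '<', \$data or die "open: $!";
my @reports;
my $seen = process_in_batches($fh, 10, sub { }, sub { push @reports, $_[0] });
print "reported at: @reports\n";
```

The point is only that the long chain of == comparisons drops out of the inner loop; the report callback runs once per batch instead of once per line.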


You're processing every token three times here:

if ($_ =~ /wire capacitance/) {
    if ($_ =~ /^\s+wire capacitance\:\s+\d.*\d\s*$/) {
        ($NetCapRaw) = $_ =~ /^\s+wire capacitance\:\s+(\d.*\d)\s*$/;
Replace the token on the first pass or capture the remainder of the string and pass it to another regex.
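For instance, the three matches can probably collapse into one capturing match. This is a minimal sketch: the sub name wire_capacitance and the sample line are assumptions, and the pattern is the one quoted above, precompiled with qr//:

```perl
use strict;
use warnings;

# Compiled once; same pattern as the quoted code, with the capture group kept.
my $WIRE_CAP = qr/^\s+wire capacitance:\s+(\d.*\d)\s*$/;

# Returns the captured value, or undef when the line is not a
# wire-capacitance line, so the match doubles as the test.
sub wire_capacitance {
    my ($line) = @_;
    return $line =~ $WIRE_CAP ? $1 : undef;
}

# Hypothetical input line, for illustration only.
my $cap = wire_capacitance("   wire capacitance:  0.0123\n");
print "captured: $cap\n" if defined $cap;
```

Like the original pattern, this still requires the value to start and end with a digit, so a single-digit value would not match.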


Actually, I like the idea of tokenizing the whole file in a multi-pass interpreter: first tokenize the file, replacing each token with a code-ref and each constant with an object that returns a constant, then execute the result.
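A rough sketch of that two-pass idea, simplified so that constants are passed as plain values rather than objects. The "key: value" line format, the keyword table, and the names compile_lines and run_program are all assumptions, not taken from the original script:

```perl
use strict;
use warnings;

my %state;   # accumulates results while the compiled program runs

# Hypothetical keyword table: each recognized token maps to a handler.
my %handler = (
    'net'              => sub { $state{nets}++ },
    'wire capacitance' => sub { $state{cap_total} += $_[0] },
);

# Pass 1: turn each line into a [code-ref, value] pair; skip unknown lines.
sub compile_lines {
    my @program;
    for my $line (@_) {
        my ($key, $value) = $line =~ /^\s*([^:]+?)\s*:\s*(.*?)\s*$/ or next;
        my $code = $handler{$key} or next;
        push @program, [ $code, $value ];
    }
    return @program;
}

# Pass 2: execute the compiled program.
sub run_program { $_->[0]->($_->[1]) for @_ }

my @lines = (
    "net: n1", "  wire capacitance: 0.5",
    "driver: u1",
    "net: n2", "  wire capacitance: 0.25",
);
run_program(compile_lines(@lines));
print "nets=$state{nets} cap=$state{cap_total}\n";
```

Whether this beats the straight regex loop would need measuring; the attraction is that all the pattern matching happens in one place.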


What does this do?

if (($DriverForwardSlashCount == 0) && ($NetNameForwardSlashCount == 0)) {
    $AddToCustomTable = 1;
There are four copies of this and they all just set $AddToCustomTable to the same value. Isn't the following the same thing?
# Parenthesize the bitwise OR: <= binds more tightly than | in Perl.
$AddToCustomTable = 1 if (($DriverForwardSlashCount | $NetNameForwardSlashCount) <= 1);
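Grouping matters in that one-liner because <= binds more tightly than | in Perl. A quick check of the combined test, assuming the four copies cover each count being 0 or 1 (I have not verified that against the original script; the sub name is made up):

```perl
use strict;
use warnings;

# Assumption: the four original branches cover each count being 0 or 1.
sub add_to_custom_table {
    my ($driver_slashes, $net_slashes) = @_;
    # The bitwise OR of two counts is <= 1 only when both are 0 or 1.
    return (($driver_slashes | $net_slashes) <= 1) ? 1 : 0;
}

# Print the result for every combination of counts from 0 to 2.
for my $d (0 .. 2) {
    for my $n (0 .. 2) {
        printf "%d %d -> %d\n", $d, $n, add_to_custom_table($d, $n);
    }
}
```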


There are two time eaters in the code: reading the file and executing the regexes. I would try to separate them. Read the file in and split the fields, generating a hash of tokens and data (this is similar to the parsing idea above), then process the hash for your data. This may seem like extra work, but when you refactor code this way you often spot optimizations you wouldn't see with everything in one mashup as it is now.
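Something along these lines, as a sketch only. It assumes "key: value" lines and that a "net:" line starts a new record, neither of which I have checked against the real file, and the sub names are invented:

```perl
use strict;
use warnings;

# Stage 1: read and split only. One hash per net record.
sub read_records {
    my ($fh) = @_;
    my (@records, $current);
    while (defined(my $line = <$fh>)) {
        my ($key, $value) = $line =~ /^\s*([^:]+?)\s*:\s*(.*?)\s*$/ or next;
        if ($key eq 'net') {
            $current = { net => $value };   # a "net:" line opens a new record
            push @records, $current;
        }
        elsif ($current) {
            $current->{$key} = $value;      # other fields attach to the open record
        }
    }
    return \@records;
}

# Stage 2: work on the hashes; no I/O and no regexes here.
sub total_capacitance {
    my ($records) = @_;
    my $total = 0;
    $total += ($_->{'wire capacitance'} || 0) for @$records;
    return $total;
}

# Usage with an in-memory filehandle standing in for the real file.
my $text = "net: n1\n  wire capacitance: 0.5\nnet: n2\n  wire capacitance: 0.25\n";
open my $fh, '<', \$text or die "open: $!";
my $records = read_records($fh);
printf "%d nets, total capacitance %s\n",
    scalar @$records, total_capacitance($records);
```

With the stages split like this, each one can be timed on its own to see which is the real bottleneck.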


s//----->\t/;$~="JAPH";s//\r<$~~/;{s|~$~-|-~$~|||s |-$~~|$~~-|||s,<$~~,<~$~,,s,~$~>,$~~>,, $|=1,select$,,$,,$,,1e-1;print;redo}
