I have to search many files that can each be upwards of 1M lines. In each file I need to find about 10 fields and store their values in a hash.
The method I am using works fine, but I think it is very fragile: if any field appears out of the order I expect, my method fails.
What I have done is this:
I store my fields in an array, in the order I am guessing they appear:
my %SEARCH_FIELDS = (
    '1.X' => [qw!RESULT_T LOTID DEVICEID SETUPID STEPID OMARK DIEPITCH WAFERID SLOTID CENTER DEFECTS!],
    '1.8' => [qw!LOTID WAFERID RESULT_T CENTER SLOTID DEFECTS DEVICEID DIEPITCH OMARK SETUPID STEPID!],
);
I store my regex in a hash for searching:
my %KLARF_REGEXP = (
    '1.X' => {
        'LOTID'    => qr/LotID "(.+)";/i,
        'DEVICEID' => qr/DeviceID "(\w+)";/i,
        'STEPID'   => qr/StepID "(.+)";/i,
        'SLOTID'   => qr/Slot (\d+);/i,
        'DEFECTS'  => qr/DefectList/i,
        'RESULT_T' => qr/ResultTimestamp (.+);/i,
        'WAFERID'  => qr/WaferID "(.+)";/i,
        'SETUPID'  => qr/SetupID (.+);/i,
        'OMARK'    => qr/OrientationMarkLocation (.+);/i,
        'DIEPITCH' => qr/DiePitch (.+);/i,
        'CENTER'   => qr/SampleCenterLocation (.+);/i,
    },
    '1.8' => {
        'LOTID'    => qr/LotRecord "(.+)"/i,
        'DEVICEID' => qr/DeviceID 1 \{"(\w+)"\}/i,
        'STEPID'   => qr/StepID 1 \{"(.+)"\}/i,
        'SLOTID'   => qr/Field SlotNumber 1 \{(\d+)\}/i,
        'DEFECTS'  => qr/DefectList/i,
        'WAFERID'  => qr/WaferRecord "(.+)"/i,
        'RESULT_T' => qr/Field ResultTimestamp \d \{(.+)\}/i,
        'SETUPID'  => qr/Field RecipeID 3 \{(.+)\}/i,
        'OMARK'    => qr/Field OrientationMarkLocation 1 \{(.+)\}/i,
        'DIEPITCH' => qr/Field DiePitch \d \{(.+)\}/i,
        'CENTER'   => qr/Field SampleCenterLocation \d \{(.+)\}/i,
    },
);
I then shift the first field name off that array into $current_state and read lines until its regex matches. On a successful match, I shift off the next field name.
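while ( <FILE> ) {
    if ( $_ =~ $KLARF_REGEXP{$KLARF_VERSION}{$current_state} ) {
        # DEFECTS has no capture group, and a captured "0" is still a valid value
        $summary->{$current_state} = $1 if defined $1;
        LogMsg( "Found $current_state $summary->{$current_state}" )
            if defined $summary->{$current_state};
        # Advance to the next expected field; stop reading once none remain
        $current_state = shift @{ $SEARCH_FIELDS{$KLARF_VERSION} };
        last unless defined $current_state;
    }
}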
Is there a better, faster, more robust way to do what I am doing without hard-coding the search order? I thought about testing every line against every possible regex, but I don't know if that is the best method. The machine that runs this process is already heavily utilized, so I am looking for a memory- and processor-efficient way to do this.
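For reference, here is a rough sketch of the "every line against every regex" idea, so it is clear what I mean. It only tests the regexes that have not matched yet, and it stops reading once every field has been found. The name scan_fields, the trimmed-down %regex_for table, and the sample lines are made up for illustration; it also assumes each line holds at most one field and that the first match for a field is the one to keep.

```perl
use strict;
use warnings;

# Hypothetical helper (not my production code): scan a filehandle and collect
# every field in %$regex_for, in whatever order the fields appear.
sub scan_fields {
    my ( $fh, $regex_for ) = @_;

    my %pending = %$regex_for;    # shallow copy, so the caller's table is untouched
    my %summary;

    LINE:
    while ( my $line = <$fh> ) {
        for my $field ( keys %pending ) {
            next unless $line =~ $pending{$field};

            # Patterns without a capture group (e.g. DEFECTS) just record a hit
            $summary{$field} = defined $1 ? $1 : 1;

            delete $pending{$field};      # never test this regex again
            last LINE unless %pending;    # every field found: stop reading
            next LINE;                    # assumption: at most one field per line
        }
    }
    return \%summary;
}

# Trimmed-down table using three of the '1.X' patterns
my %regex_for = (
    LOTID   => qr/LotID "(.+)";/i,
    SLOTID  => qr/Slot (\d+);/i,
    DEFECTS => qr/DefectList/i,
);

# Made-up in-memory sample, with the fields deliberately out of my guessed order
my $sample = qq{Slot 0;\nsome unrelated line\nLotID "ABC123";\nDefectList\n1 2 3\n};
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $summary = scan_fields( $fh, \%regex_for );
close $fh;

print "$_ = $summary->{$_}\n" for sort keys %$summary;
```

The worst case is still (number of lines) x (number of unmatched regexes) tests, but the inner list shrinks as fields are found, and if all the fields sit near the top of the file the loop exits long before the end. Memory use stays at one line plus the small hashes.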