http://qs321.pair.com?node_id=11145629


in reply to Optimization tips

On top of what hippo says, you are reading and parsing the mapping files anew for every line you process, which is an awful lot of wasted work. Similarly, you are splitting the line into @source repeatedly for each mapping record.

As a first step: separate out the reading and parsing of the mapping files into data structures, do that once, then walk through the data structures in the loop over @lines. That might look something like this:

    # Direct cost mapping
    my @mapping1 = map {
        my @mapping = split /\t/, $_;
        # account, mapped account, mapped source
        [ $mapping[0], $mapping[2], $mapping[3] ];
    } <MAPPINGFILE1>;

    # Indirect cost mapping
    my @mapping2 = map {
        my @mapping = split /\t/, $_;
        # account, mapped account, mapped source
        [ $mapping[0], $mapping[4], $mapping[2] ];
    } <MAPPINGFILE2>;

    LINE: for my $line (@lines) {
        my $source = (split /\t/, $line)[2];

        # Direct cost mapping
        for my $mapping1 (@mapping1) {
            my($account, $mapped_account, $mapped_source) = @$mapping1;

            # Account is matching with source
            if ($line =~ /$account/) {
                # Account substitution
                if ($mapped_account eq "") {
                    $line =~ s/$account/"Compte cible non défini !"/;
                } else {
                    $line =~ s/$account/$mapped_account/;
                }

                # Mapping = target Unit, Alloc_ + Unit source
                if ($mapped_source eq 'Unit source') {
                    # Mapping : source Unit, Alloc_ + source Unit
                    $line =~ s/$source/$source\tALLOC_$source/;
                } elsif ($mapped_source eq "") {
                    $line =~ s/$source/"Unit cible non définie !"\tALLOC_$source/;
                } else {
                    # Unit substitution
                    $line =~ s/$source/$mapped_source\tALLOC_$source/;
                }
                push @lines2, $line;
                next LINE;
            }
        }

        # Indirect cost mapping
        for my $mapping2 (@mapping2) {
            my($account, $mapped_account, $mapped_source) = @$mapping2;
            if ($line =~ /$account/) {
                $line =~ s/$account/$mapped_account/;
                $line =~ s/$source/$mapped_source\tALLOC_$source/;
                push @lines2, $line;
                next LINE;
            }
        }

        push @rejects, "Lignes non mappées (Account): \t$line";
    }

However I suspect that even bigger savings are possible: for example, if the account name appears in a specific column in $line, you could probably turn the whole thing into a single hash lookup.
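A minimal sketch of that hash-lookup idea, under assumptions not confirmed in the thread: the account is taken to be an exact value in one tab-separated column, and the column positions ($ACCOUNT_COL, $SOURCE_COL) and helper names (build_lookup, map_line) are hypothetical. Note that this swaps the regex match anywhere in the line for an exact match on one column, so it only applies if the data really has that shape.

```perl
use strict;
use warnings;

# Hypothetical column positions; adjust to the real file layout.
my $ACCOUNT_COL = 0;
my $SOURCE_COL  = 2;

# Build the lookup once: account => [ mapped account, mapped source ].
sub build_lookup {
    my (@records) = @_;
    my %lookup;
    for my $record (@records) {
        my ($account, $mapped_account, $mapped_source) = @$record;
        # Keep the first record per account, mirroring "first match wins".
        $lookup{$account} //= [ $mapped_account, $mapped_source ];
    }
    return \%lookup;
}

# Rewrite one tab-separated line; return nothing if the account is unmapped.
sub map_line {
    my ($line, $lookup) = @_;
    my @fields = split /\t/, $line, -1;
    my $entry  = $lookup->{ $fields[$ACCOUNT_COL] // '' } or return;
    my ($mapped_account, $mapped_source) = @$entry;
    my $source = $fields[$SOURCE_COL];
    $fields[$ACCOUNT_COL] = $mapped_account;
    $fields[$SOURCE_COL]  = "$mapped_source\tALLOC_$source";
    return join "\t", @fields;
}

# Made-up sample data, for illustration only.
my $lookup = build_lookup([ '6000', '7000', 'U9' ]);
my $mapped = map_line("6000\tfoo\tU1\t42", $lookup);
```

With this shape the per-line cost is one split and one hash access, independent of the number of mapping records. Lines for which map_line returns nothing would go to @rejects.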

Re^2: Optimization tips
by Marshall (Canon) on Jul 24, 2022 at 17:24 UTC
    I see that you fixed one "unreachable code" problem...
    The OP should be aware of this in their code:
    Being inside the while (<MAPPINGFILE2>) { loop means that you are not at eof yet. When this while loop finishes, MAPPINGFILE2 will be at eof, but not before.

    UPDATE: I decided that this is not right. The read of the very last line will consume all characters and hence reach eof. So you can reach eof before a read of <MAPPINGFILE2> would return an undef (the normal way to detect eof).

    while (<MAPPINGFILE2>) {
        chomp();

        # Skip blank lines and comments
        next if /^(\s*(#.*)?)?$/;

        # Split mapping columns (tab)
        @mapping = split /\t/, $_;

        # Mapping is matching
        if ($line =~ m/$mapping[0]/) {
            $line =~ s/$mapping[0]/$mapping[4]/;
            @source = split /\t/, $line;
            $line =~ s/$source[2]/$mapping[2]\tALLOC_$source[2]/;
            push @lines2, $line;
            last;
        }
        elsif (eof(MAPPINGFILE2)) {    #correction: ###### Can happen
            push @rejects, "Lignes non mappées (Account): "."\t".$line;
        }
    }
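The eof behaviour described in the UPDATE can be checked with a small sketch. It uses an in-memory filehandle as a stand-in for MAPPINGFILE2; the sample data and the variable names ($data, $fh, @seen) are made up for illustration.

```perl
use strict;
use warnings;

# Two-record stand-in for a mapping file, read from a scalar.
my $data = "first\nlast\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# Record, for each line read, whether eof() is already true inside the loop body.
my @seen;
while (my $record = <$fh>) {
    chomp $record;
    push @seen, [ $record, eof($fh) ? 1 : 0 ];
}
close $fh;

printf "%-5s eof=%d\n", @$_ for @seen;
```

Inside the loop body, eof is false after reading the first record and true after reading the last one, so the elsif branch above can fire, but only while the final record is being processed.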