Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Shorten script

by Anonymous Monk
on Jul 22, 2003 at 18:35 UTC ( [id://276865]=perlquestion: print w/replies, xml ) Need Help??

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have some if conditions where I push the data into an array. Anyway I can shorten because I repeat some stuff in here and would like to make it look better:
while($line = <F>) { for $hit ($line =~ /matchPattern/gi) { if($line =~ /(patternOne)/gi) { print "PAGE ->\t$name\ndata ->\t\t$1\nMATCHED - >\t$hit\n"; push (@files, $name); $ct++; } elsif($line =~ /(patternTwo)/gi) { print "PAGE ->\t$name\ndata ->\t\t$1\nMATCHED - >\t$hit\n"; push (@files, $name); $ct++ } else { print "PAGE ->\t$name\ndata -> TEXT INFO HERE.\n"; push (@files, $name); $ct++ } } } close F;

Replies are listed 'Best First'.
Re: Shorten script
by BrowserUk (Patriarch) on Jul 22, 2003 at 19:19 UTC

    Here's my go.

    while( <F>){ for $hit ( /matchPattern/gi) { print "PAGE ->\t$name\ndata ->", /(patternOne)/gi or /(patternTwo)/gi ? "\t\t$1\nMATCHED - >\t$hit\n" : "TEXT INFO HERE.\n"; push (@files, $name); $ct++ } } close F;

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller

        ++ Well picked up on the or thing. I really should have added an "untested" rider to that code. It was purely an editor exercise and completely untested. I should have spotted the error, but I tend to use or rather than || as a matter of course and let the compiler inform me when I need to change it. Hey, that's what compilers are good at :)

        With respect to the non-localisation of $_ in while loops. Your right to point it out. It's one of those things that I take on a case by case basis, but it is so useful, that I wouldn't want to never use it.

        Does anyone know if there is a good reason why $_ isn't localised in while loops as it is in for loops?


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller

      Thanks for ALL your replies!!! I now see it looking better.
Re: Shorten script
by sauoq (Abbot) on Jul 22, 2003 at 18:58 UTC
    while($line = <F>) { for my $hit ($line =~ /matchPattern/gi) { $ct++; push @files, $name; print "PAGE ->\t$name\ndata ->"; if( $line =~ /(patternOne)/i ) { print "\t\t$1\nMATCHED - >\t$hit\n"; } elsif( $line =~ /(patternTwo)/i ) { print "\t\t$1\nMATCHED - >\t$hit\n"; } else { print " TEXT INFO HERE.\n"; } } }
    -sauoq
    "My two cents aren't worth a dime.";
    
      Since you're printing more or less the same thing, see if you can combine the patterns you're matching against thusly:
      while($line = <F>) { for my $hit ($line =~ /matchPattern/gi) { $ct++; push @files, $name; print "PAGE ->\t$name\ndata ->"; if( $line =~ /(patternOne|patternTwo)/i ) { print "\t\t$1\nMATCHED - >\t$hit\n"; } else { print " TEXT INFO HERE.\n"; } } }

        In general, it's more efficient to elevate the "ors" out of the pattern. In other words,

        $line =~ /pat1/ or $line =~ /pat2/
        is usually better than a single more complex regular expression.

        -sauoq
        "My two cents aren't worth a dime.";
        
Re: Shorten script
by l2kashe (Deacon) on Jul 22, 2003 at 19:07 UTC
    You could also use something along the lines of a dispatch table.
    my %match = ( 'pattern_one' => 1, 'pattern_two' => 1, ); while (<F>) { next unless ( m/(match_pattern)/ ); print "PAGE ->\t$name\ndata ->"; print $match{$1} ? "\t\t$1\nMatched ->\t$hit\n" : " TEXT INFO HERE\n"; push(@files, $name); $ct++; } close(F);

    In this case the %match is a little useless, as your blocks aren't really doing anything all that different aside from what to print. You are also using variables in the block that aren't being created in the block so its difficult to grasp exactly what you are trying to shorten. In a more complex case, the value of $match{pattern} could be a code ref, or any other data structure which could shorten the main loop a touch.. Something like

    my %match = ( 'pattern_one' => [\&sub_one, $results_one], 'pattern_two' => [\&sub_two, $results_two], ); while (<F>) { chomp; next unless ( m/(match_pattern)/ ); unless ( $match{$1} ) { #something basic goes here next; } # get the right function from our dispatch table ($func, $results) = [ $match{$1} ]; # now call it, pass it the text to process, and store the # results. $results = &$func($1); } # END while <F> close(F);

    In my own code, I tend to either make $results either an array or hash reference, depending on the situation, and the call may be altered to be say push(@$results, &$func($1)); or some such.

    HTH, regards

    use perl;

Re: Shorten script
by roju (Friar) on Jul 22, 2003 at 18:45 UTC
    Roll the inner three checks into a loop. So you have
    my @patterns = (qr/(patternOne)/, qr/(patternTwo)/); my $match; print "PAGE ->\t$name\ndata ->"; for my $pat (@patterns) { if ($line =~ $pat and not $match++) { print "\t\t$1\nMATCHED - >\t$hit\n"; push (@files, $name); $ct++; last; } } $print " TEXT INFO HERE.\n" if not $match;

    Mmmmm refactoring. See also Once and Only Once.

    Update: Fixed bugs pointed out by Re: Re: Shorten script. Big error, thanks for the heads-up. Hrm. Could probably still be cleaned up more.. let's call that an exercise to the reader..

    Update: stole idea (break up prints) from sauoq's response. If you are unlikely to add more conditions, his way is probably better. Mine is pretty easy to add new conditions to though, just add to the @patterns = (...);

      That dosen't do the same thing as the original code did. It would only use the first matching pattern and then stop, and it also had an action for when none of them matched.
Re: Shorten script
by ihb (Deacon) on Jul 23, 2003 at 08:32 UTC

    There has been other good replies already, so I'll leave it with that. Except there is (at least) one thing that everyone seem to have overlooked.

    The $ct variable is increased for every hit, or more interestingly, for each time @files is increased. So that can be replaced by

    $ct -= @files; while (my $line = <F>) { ... } # Old code without $ct $ct += @files;
    HTH,
    ihb
Re: Shorten script
by chunlou (Curate) on Jul 22, 2003 at 21:10 UTC
    More like different than short perhaps:
    @files; $name = 1; sub dothis { ($name, $m, $hit) = @_; print "PAGE ->\t$name\ndata ->" , $m ? "\t\t$m\nMATCHED - >\t$hit\ +n" : " TEXT INFO HERE.\n"; return $name; } while ($line = <DATA>){ push @files, map { $line =~ /(patternOne)/i || $line =~ /(patternTwo)/i ? dothis( +$name, $1, $_) : dothis($name); } ($line =~ /matchPattern/gi); } print "\nfiles: @files\nct: ", scalar @files, "\n"; # for debugging +purpose __DATA__ matchPattern matchPattern patternTwo matchPattern la la la patternTwo matchPattern blablablamatchPattern huh? patternOne patternOnepatternOnepatternOnematchPattern patternOne matchPattern oh dear patternTwo la la la patternTwo matchPattern here we go again matchPattern done yet? machPattrn over over over

    _____________________
    Update: There were only two replies when I started to reply... by the time I posted, BrowserUk already posted a better reply. Oopsie.
Re: Shorten script
by hangmanto (Monk) on Jul 23, 2003 at 20:07 UTC
    Since $name is not changed within the while loop, why keep pushing it into @files?

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://276865]
Approved by sauoq
Front-paged by hsmyers
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others romping around the Monastery: (1)
As of 2024-04-25 01:09 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found