PerlMonks  

Iteration condition...

by abachus (Monk)
on Mar 31, 2008 at 14:47 UTC ( [id://677543]=perlquestion )

abachus has asked for the wisdom of the Perl Monks concerning the following question:

Good day fellow monks,

I'm trying to find better ways to do what the following code does. This could be applied to many situations, one of which could be to read the contents of an HTML page and process certain tags. Does anyone know a shorter, more economical method?

for(@Input) {
    if(/$Start/) {
        $State = 1;
    }
    elsif(/$Finish/) {
        $State = 0;
    }
    else {
        $State = $State;
    }
    ($State) ? do_this() : do_that;
}
thanks ever so much,

Isaac Close.

Replies are listed 'Best First'.
Re: Iteration condition...
by rhesa (Vicar) on Mar 31, 2008 at 15:00 UTC
    You might have use for the "flip-flop" operator .., documented in perlop under Range Operators:
    for( @input ) {
        if( /$start/ .. /$finish/ ) {
            do_this();
        }
        else {
            do_that();
        }
    }

      The trouble with using the flip flop directly is that it's still true when /$Finish/ matches, but the OP wants $State to be false when /$Finish/ matches. To account for this, my solution says ( ( /$Start/ .. /$Finish/ ) && ! /$Finish/ ). The inner parentheses are necessary because && binds tighter than the flip flop. Without them, it means ( /$Start/ .. ( /$Finish/ && ! /$Finish/ ) ), which is true at /$Start/ and then never turns false.

        You're quite right. Gotta love boundary conditions ;-)

        As an alternative to the parentheses, you could replace && with and, which has low enough precedence: /$start/ .. /$finish/ and !/$finish/.
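The boundary behaviour discussed above can be checked with a small script. This is only a sketch: the patterns `BEGIN` and `END`, the sample lines, and the array names `@plain`, `@parens` and `@low` are made up for illustration and are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical patterns and input, chosen only for illustration.
my ( $Start, $Finish ) = ( qr/BEGIN/, qr/END/ );
my @Input = ( 'a', 'BEGIN', 'b', 'c', 'END', 'd' );

my ( @plain, @parens, @low );

# Plain flip-flop: still true on the line that matches $Finish.
for (@Input) {
    push @plain, $_ if /$Start/ .. /$Finish/;
}

# Parenthesised flip-flop combined with && to drop the finish line.
for (@Input) {
    push @parens, $_ if ( /$Start/ .. /$Finish/ ) && !/$Finish/;
}

# Same idea with the low-precedence "and", so no inner parentheses.
for (@Input) {
    push @low, $_ if /$Start/ .. /$Finish/ and !/$Finish/;
}

print "plain:  @plain\n";
print "parens: @parens\n";
print "and:    @low\n";
```

With this input, the plain version collects `BEGIN b c END`, while both corrected versions collect `BEGIN b c`, which matches what the original loop does with `$State`. Each flip-flop in the source keeps its own state, which is why the three variants are run in separate loops.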

Re: Iteration condition...
by CountZero (Bishop) on Mar 31, 2008 at 16:48 UTC
    What you are trying to build is a state machine.

    There are some modules available doing exactly that, such as Class::StateMachine or FSA::Rules. Have a look at them, they may very well inspire you.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      You sir, are the middle and two ends of a fine gentleman -- and a scholar to wit. This is by far the best answer I've seen on this thread. I wish I could ++ more than once.

      FWIW to the OP: Desaware offers a pretty decent introduction to state machines. It seems to do a better job explaining them than I ever do, in any case.

      <radiant.matrix>
      Ramblings and references
      The Code that can be seen is not the true Code
      I haven't found a problem yet that can't be solved by a well-placed trebuchet
      I don't know, I think a state machine may be overkill for something with only two states ("started" and "finished" from the looks of the OP).
        Maybe, but as the OP referred to HTML, it will not end with two states, and then having a look at the code of those modules can do no harm.

        CountZero

        "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: Iteration condition...
by moritz (Cardinal) on Mar 31, 2008 at 14:58 UTC
    You can make it fancier by storing function references:
    my $action = \&initial_action;
    for (@input){
        if (/$Start/){
            $action = \&do_this;
        }
        elsif (/$Finish/){
            $action = \&do_that;
        }
        # perform your action
        $action->();
    }

    Whether that's really simpler depends on your idea of "simple".

Re: Iteration condition...
by kyle (Abbot) on Mar 31, 2008 at 15:05 UTC
    for ( @Input ) {
        $State = ( ( /$Start/ .. /$Finish/ ) && ! /$Finish/ );
        ($State) ? do_this() : do_that;
    }
Re: Iteration condition...
by radiantmatrix (Parson) on Mar 31, 2008 at 19:50 UTC

    I don't know about "shorter, more economical", but perhaps easier to maintain:

    ## a dispatch table in the form state => CODEref
    my %operation = (
        1 => \&do_this,
        0 => \&do_that,
        9 => \&do_another_thing,
    );

    ## a table of transitions in the form pattern => state
    my @transition = (
        [ $Start  => 1 ],
        [ $Finish => 0 ],
        [ $Break  => 9 ],
    );

    my $State = 0;    # or whatever your initial state is.

    for my $item (@Input) {
        # check transition conditions in order; if met, change state
        for my $trans (@transition) {
            next unless $item =~ /$trans->[0]/;
            $State = $trans->[1];
            last;
        }
        $operation{$State}->();    # look up and execute op for this State
    }

    That's a very simple state machine that uses a dispatch table. It's pretty easy to expand: add transitions to the @transition structure, write a sub for each operation and add it to the %operation dispatch table. You don't need to change any other code.

    However, if you have a large number of states and/or transitions, using one of the state-machine modules that CountZero references above could be a better choice.

    <radiant.matrix>
    Ramblings and references
    The Code that can be seen is not the true Code
    I haven't found a problem yet that can't be solved by a well-placed trebuchet
Re: Iteration condition...
by runrig (Abbot) on Mar 31, 2008 at 16:47 UTC
    As others point out, the flip-flop operator can work, except that it'll return true on your finish condition, which you seem to want to exclude (update: also, why the useless "$State = $State" statement?). Also, I'd not use the ternary "?:" operator in this case, since its purpose is to return a value, which you don't use. So here's my go at it:
    for (@input) {
        my $status = /$start/.../$finish/;
        if ( $status and $status !~ /E/ ) {
            do_this();
        }
        else {
            do_that();
        }
    }
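The `$status !~ /E/` test relies on the value the flip-flop returns in scalar context. As perlop describes it, the value is the empty string while the range is false and a sequence number starting at 1 while it is true, with "E0" appended on the last line of the range. A minimal sketch that prints those values follows; the patterns, the sample input and the `@seen` array are assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical patterns and input, chosen only for illustration.
my ( $start, $finish ) = ( qr/BEGIN/, qr/END/ );
my @input = ( 'a', 'BEGIN', 'b', 'END', 'c' );

my @seen;
for (@input) {
    # Scalar-context flip-flop: '' when false, a sequence number when true.
    my $status = /$start/ ... /$finish/;
    push @seen, "$_=" . ( $status eq '' ? 'false' : $status );
}

print "@seen\n";
```

This prints `a=false BEGIN=1 b=2 END=3E0 c=false`, so a status containing "E" marks exactly the line on which the finish pattern closed the range.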
Re: Iteration condition...
by wade (Pilgrim) on Mar 31, 2008 at 15:31 UTC
    I'm probably pounding a thumbtack with a sledgehammer, here, but what about something like (and this assumes that $Start and $Finish don't change through the code operation):
    my $machine = { $Start => \&do_this, $Finish => \&do_that };
    my $doit = \&do_this;
    foreach my $line (@Input) {
        my $func = $machine->{$line};
        $doit = $func if (defined $func);
        $doit->();
    }
    Now, this is totally untested but something like this should work...
    --
    Wade
      Note that $Start and $Finish are regexes in the OP, so you can't just use them as if they were strings (as hash keys), and you can't assume that the whole string in $line matches. (It's enough if a substring matches, unless the regexes contain anchors.)
        You're exactly right, of course -- I didn't fully caveat the code (hey, this guy used caveat as a verb -- can he do that?). I was just trying to provide an alternative line of thought. 8o)
        --
        Wade
Re: Iteration condition...
by repellent (Priest) on Apr 01, 2008 at 17:56 UTC
    Regarding the flip-flop operator in scalar context, there is a subtle difference between the double-dot and triple-dot version.

    The double-dot version evaluates the right-expression immediately after the left-expression unlocks the operator as true.
    The triple-dot version waits until the next re-evaluation of the flip-flop operator before it evaluates the right-expression.

    For example, try these on the shell:
    printf '123\n2\n3\nxxx\n1\n2\n3\n' | perl -lne 'if (/1/.../3/) { print $_ }'
    printf '123\n2\n3\nxxx\n1\n2\n3\n' | perl -lne 'if (/1/../3/) { print $_ }'
    In the case of the OP, the triple-dot version is preferred.
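For readers without a shell at hand, the same comparison can be run as a standalone script. This is a sketch using the same seven input lines as the one-liners; the array names `@double` and `@triple` are made up for illustration.

```perl
use strict;
use warnings;

# Same data as the shell example above.
my @lines = ( '123', '2', '3', 'xxx', '1', '2', '3' );

my ( @double, @triple );

# Double-dot: the right operand is tested on the same line that
# turned the range on, so "123" opens and closes the range at once.
for (@lines) {
    push @double, $_ if /1/ .. /3/;
}

# Triple-dot: the right operand is first tested on the following line.
for (@lines) {
    push @triple, $_ if /1/ ... /3/;
}

print "double-dot: @double\n";
print "triple-dot: @triple\n";
```

The double-dot loop collects `123 1 2 3`, because the first line matches both patterns and the range ends immediately. The triple-dot loop collects `123 2 3 1 2 3`, since the range opened by `123` stays open until the later line `3`.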
Re: Iteration condition...
by Anonymous Monk on Apr 02, 2008 at 00:33 UTC
    I may be misunderstanding the post, but here is my opinion: if you are trying to parse HTML, use one of the many modules that do it for you, such as HTML::TreeBuilder, or, if you are working with XML, XML::Simple. Modules such as these can parse markup into a convenient data structure. Just my 2 cents.
Re: Iteration condition...
by mobiusinversion (Beadle) on Apr 01, 2008 at 18:26 UTC
    I would reevaluate your strategy considering those seemingly simple regexes could actually be quite nasty.

    For example, what is the start of an html tag? Is it '<'? What is the close? Is it '>'? The answer to both questions is not always:
    $html_tag = '<img src="img.png" alt="peg leg > with a kickstand">';
    Is a single valid html tag. To accurately describe and manipulate html with regular expressions, one would want something like:
    my $x = SOME_HTML;
    while($x =~ /\G(.*?)(<(?:"[^"]*"|'[^']*'|[^'">])*>)/gcs){
        do_something_with_text($1);
        do_something_with_tag($2);
    }
    Which could be packaged as an iterator.
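One way the iterator idea might look is a closure around the same regex. This is a sketch, not a replacement for a real HTML parser: `make_tag_iterator` is a hypothetical helper name and the sample markup is invented for illustration.

```perl
use strict;
use warnings;

# make_tag_iterator is a hypothetical helper, not from the thread.
# It returns a closure; each call yields [ text_before_tag, tag ],
# or undef (in scalar context) once no further tag is found.
sub make_tag_iterator {
    my ($html) = @_;
    return sub {
        # Scalar-context //g resumes from pos($html) on every call.
        if ( $html =~ /\G(.*?)(<(?:"[^"]*"|'[^']*'|[^'">])*>)/gcs ) {
            return [ $1, $2 ];
        }
        return;
    };
}

my $next = make_tag_iterator(
    q{<p>See <img src="img.png" alt="peg leg > with a kickstand"> here</p>}
);

while ( my $pair = $next->() ) {
    my ( $text, $tag ) = @$pair;
    print "text: [$text] tag: [$tag]\n";
}
```

On the sample string this yields three pairs, and the `<img ...>` tag comes back whole despite the `>` inside its `alt` attribute. Any text after the last tag is not returned by this sketch, so a caller that needs it would have to handle the remainder separately.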
Re: Iteration condition...
by stiller (Friar) on Mar 31, 2008 at 14:55 UTC
    Update: Moritz is right, it does something else entirely. rhesa's flipflop is a good solution
    for(@Input) {
        do_this($_) and next if /$Start/;
        do_that($_) if /$Finish/;
    }
      That's different from the original in that it doesn't do the same as the previous iteration if there is no match at all.
