http://qs321.pair.com?node_id=1233692


in reply to processing file content as string vs array

You only have one @user_info block in the whole file, right? Otherwise your regex will give the wrong result: everything from the first @user_info_start to the last @user_info_end. This is because of the '.*' in your regex: since * is 'greedy', it tries to match as much as possible. After @user_info_start has been found, the regex engine basically jumps to the end of the file and then moves backward one character at a time (this is called backtracking) until it finds @user_info_end.

To get the reverse behaviour (going forward one character at a time right after finding @user_info_start), you could use (.*?), where .*? starts by matching nothing and only consumes an extra character when necessary.
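
To make the difference concrete, here is a minimal sketch comparing the two quantifiers. The input string is made up for illustration (it deliberately contains two blocks), and I am assuming the /s modifier so that . also matches newlines; your actual regex may differ.

```perl
use strict;
use warnings;

# Hypothetical input with two @user_info blocks, to show where the matches differ.
my $text = join "\n",
    'header',
    '@user_info_start', 'Title : Mr', '@user_info_end',
    'filler',
    '@user_info_start', 'Title : Ms', '@user_info_end',
    'footer';

# /s lets . match newlines, so a match can span several lines.
my ($greedy) = $text =~ /\@user_info_start(.*)\@user_info_end/s;
my ($lazy)   = $text =~ /\@user_info_start(.*?)\@user_info_end/s;

# Greedy runs to the LAST end tag, so it swallows the filler and the second block.
print "greedy spans both blocks\n" if $greedy =~ /filler/;

# Non-greedy stops at the FIRST end tag.
print "lazy stops at the first end tag\n" if $lazy !~ /filler/;
```

With a single block in the file both patterns capture the same text; the non-greedy one just gets there by moving forward instead of backtracking from the end.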

That being said, I really like the idiom presented by haukex here, which is quite intuitive once you know that the .. operator reads as "FROM .. TO", so in haukex's code that would be FROM @user_info_start TO @user_info_end. One thing you can add to his code, if you only have one occurrence of @user_info in the whole file, is an exit from the loop as soon as you have found your data:

use warnings;
use strict;

my @userinfo;
LINE: while (<DATA>) {
    chomp;
    if ( /\@user_info_start/ ... /\@user_info_end/ ) {
        push @userinfo, $_;
    }
    elsif (@userinfo) {
        last LINE; # stop looking
    }
}

use Data::Dumper;
print Dumper(\@userinfo);

__DATA__
xxxxxxxxxxx
xxxx*@user_info_start
xxxx*@Title : Mr
xxxx*@Username : xxxxx
xxxx*@Filetype : txt
xxxx*@Version : 0001
xxxx*@Create_Date : 20190407
xxxx*@Product : xxxx
xxxx*@user_info_end
xxxxxxxxxxxxxxxxxxxxxxxxxxxx

Replies are listed 'Best First'.
Re^2: processing file content as string vs array
by haukex (Archbishop) on May 13, 2019 at 13:41 UTC
    One thing you can add to his code if you only have one occurrence of @user_info in the whole file is an exit from the loop as soon as you have found your data

    That's a very good point! Here are two more variants: the first if the start and end tags should be captured, the second if they shouldn't (either one replaces the if/elsif):

    if ( my $flag = /\@user_info_start/ ... /\@user_info_end/ ) {
        push @userinfo, $_;
        last LINE if $flag=~/E0/;
    }

    # - or -

    if ( my $flag = /\@user_info_start/ ... /\@user_info_end/ ) {
        last LINE if $flag=~/E0/;
        push @userinfo, $_ unless $flag==1;
    }

    See also Behavior of Flip-Flop Operators and Flipin good, or a total flop?

      ++ in the spirit of TIMTOWTDI, but I personally don't like that version because /E0/ is too much of a magic value for me.

        /E0/ is too much of a magic value for me

        I see what you mean*, but I don't mind it as much - it's explicitly documented in perlop, which I interpret as a guarantee of this API:

        The final sequence number in a range has the string "E0" appended to it, which doesn't affect its numeric value, but gives you something to search for if you want to exclude the endpoint. You can exclude the beginning point by waiting for the sequence number to be greater than 1.

        * Update: It would probably make sense to document the /E0/ with a code comment to demystify this magic!
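
        To see the sequence numbers that perlop describes, here is a small sketch that prints what the flip-flop returns for each line. The sample lines and the @seen array are made up for illustration; only the two tags come from the thread.

        ```perl
        use strict;
        use warnings;

        # Hypothetical sample lines; only the two tags matter.
        my @lines = (
            'before',
            '@user_info_start',
            'Title : Mr',
            'Version : 0001',
            '@user_info_end',
            'after',
        );

        my @seen;    # value returned by the flip-flop for each line
        for (@lines) {
            # In scalar context the flip-flop returns "" outside the range
            # and a sequence number (starting at 1) inside it.
            my $flag = /\@user_info_start/ ... /\@user_info_end/;
            push @seen, $flag;

            # The last line of the range has "E0" appended to its number.
            my $note = $flag =~ /E0/ ? ' <- end of range' : '';
            printf "%-4s %s%s\n", ( $flag eq '' ? '-' : $flag ), $_, $note;
        }
        ```

        The start tag gets 1, the end tag gets a value ending in E0, and lines outside the range get the empty string, which is why testing $flag==1 and $flag=~/E0/ is enough to skip both tags.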

      Thank you haukex it works awesome!!!


      All is well. I learn by answering your questions...