PerlMonks  

Regular Expression problem when Extracting Start\ VALUE \End

by gasho (Beadle)
on Sep 30, 2005 at 14:19 UTC ( [id://496428]=perlquestion )

gasho has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to come up with universal code :) that will extract the value between $StartTag and $EndTag from a single line. I run into problems when special characters are involved.

my $line = <DATA>;
my @wanted_substrings=();

#No Problem
#my $StartTag='START';
#my $EndTag='END';

#Error Unmatched ) in regex; marked by <-- HERE in m/TRicky\(.*?)
#my $StartTag="TRicky\\";
#my $EndTag="\\endTricky";

#No Error but no value VALUE
#my $StartTag="Next\$";
#my $EndTag="\^Next";

#No Error but no value VALUE
my $StartTag="Last\+";
my $EndTag="\+some";

if ($line=~/$StartTag(.*?)$EndTag/g) {
    push(@wanted_substrings,$1) ;
}
print join "\n", @wanted_substrings;

__DATA__
CharSTARTanotherENDCharTRicky\VALUEE\endTrickyNext$VALUE^NextLast+VALUE+some

#Forgot to mention: if I do not use $StartTag or $EndTag
#and instead use the actual string, it works.
#Instead of
if ($line=~/$StartTag(.*?)$EndTag/g)
#this one works
if ($line=~/TRicky\\(.*?)\\endTricky/g)
#Problem is that I have to use a variable because
#I am passing it as an argument to my sub

sub getInfoFromSingleLineMultiLineFile {
    #$stag,$etag are used as arguments
    my ($InputFile,$stag,$etag)=@_;
    my ($line,@wanted_substrings);
    #Opening file for reading
    open(IFH,"$InputFile") || die "Can't open file: $InputFile\n";
    while($line=<IFH>) {
        if ($line =~ m/$stag(.*?)$etag/g) {
            push(@wanted_substrings,$1);
        }
    }
    return @wanted_substrings;
}

Thanks in advance,
Gasho

Replies are listed 'Best First'.
Re: Regular Expression problem when Extracting Start\ VALUE \End
by japhy (Canon) on Sep 30, 2005 at 14:35 UTC
    Backslashes are a pain in the back. Slash. The problem is that your regex ends up being /TRicky\(.*?)\endTricky/ because your variables interpolate. When that gets compiled as a regex, it's a problem because the trailing backslash of "TRicky\" has escaped the opening parenthesis. I would suggest using

        my $StartTag = qr/TRicky\\/;
        my $EndTag = qr/\\endTricky/;

    The qr// operator will keep things properly backslashed later, because the content is treated like a regex.

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
      Thank you all for the quick responses.

      #Works fine
      my $StartTag = qr/TRicky\\/;
      my $EndTag = qr/\\endTricky/;
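
For reference, japhy's qr// suggestion can be sketched as a complete script. This is only an illustration, not code from the thread: the sample line is copied from the question's __DATA__ section, and the @tag_pairs list and the extra Next/Last pairs are assumptions added to show that the same approach covers the other tricky tags.

```perl
use strict;
use warnings;

# Sample line from the question. Single quotes keep $, ^ and + literal;
# "\\" inside single quotes is one backslash.
my $line = 'CharSTARTanotherENDCharTRicky\\VALUEE\\endTrickyNext$VALUE^NextLast+VALUE+some';

# Each tag is written as a regex with qr//, so the escaping is done once,
# at the point where the tag is defined (hypothetical list for illustration).
my @tag_pairs = (
    [ qr/TRicky\\/, qr/\\endTricky/ ],
    [ qr/Next\$/,   qr/\^Next/      ],
    [ qr/Last\+/,   qr/\+some/      ],
);

my @wanted_substrings;
for my $pair (@tag_pairs) {
    my ($StartTag, $EndTag) = @$pair;
    # The compiled patterns interpolate without being re-parsed as text.
    push @wanted_substrings, $1 if $line =~ /$StartTag(.*?)$EndTag/;
}

print join("\n", @wanted_substrings), "\n";
```

The trade-off is that the caller has to write each tag as a regex, so this fits best when the tags are fixed in the source rather than supplied as plain strings.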
Re: Regular Expression problem when Extracting Start\ VALUE \End
by philcrow (Priest) on Sep 30, 2005 at 14:31 UTC
    Why not just make sure the string is a single line (say, with split), then use extract_tagged from Text::Balanced? Unless you are just trying to teach yourself regexes, this module is ideal.

    Phil

      I got an error when I tried to use Text::Balanced. I verified that I have Balanced.pm under /lib/Text. Thanks.

      use Text::Balanced;
      $text='blabla<Else><LogEntry message="FAIL TESTCASE "/><FailTestCase/></Else>blabla';
      ($extracted, $remainder) = extract_tagged($text);
      print $extracted;
      #Error
      #Undefined subroutine &main::extract_bracketed called at C:\InstallV3\Test.pl
        Text::Balanced does not export functions into the main namespace by default. This means you have two options. First, you could ask for the function by name:

        use Text::Balanced qw( extract_tagged );
        # The rest of your code from above here.

        This will bring extract_tagged into your module's namespace.

        Alternatively, you could fully qualify the name:

        use Text::Balanced;
        my $text = 'sometexthere';
        ($extracted, $remainder) = Text::Balanced::extract_tagged($text);

        Phil
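
A minimal sketch of the first option (importing extract_tagged by name) is shown here. The input string is hypothetical and deliberately simple: with no extra arguments, extract_tagged expects the tagged text at the start of the string (after optional whitespace) and uses XML-style tags, so this is not a drop-in answer for the arbitrary START/END delimiters in the original question.

```perl
use strict;
use warnings;
use Text::Balanced qw( extract_tagged );

# Hypothetical input: the tagged text sits at the very start of the string.
my $text = '<b>VALUE</b> and the rest';

# In list context extract_tagged leaves $text untouched and returns, in order:
# the tagged substring, the remainder, the prefix, the opening tag,
# the text between the tags, and the closing tag.
my ($extracted, $remainder, $prefix, $open_tag, $contents, $close_tag)
    = extract_tagged($text);

if (defined $extracted) {
    print "extracted: $extracted\n";
    print "contents:  $contents\n";
    print "remainder: $remainder\n";
}
else {
    print "no tagged text found at the start of the string\n";
}
```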
Re: Regular Expression problem when Extracting Start\ VALUE \End
by salva (Canon) on Sep 30, 2005 at 14:41 UTC
    Use quotemeta to escape special regex chars in the start and end strings:

    my $StartTag = quotemeta("Last+");
    my $EndTag = quotemeta("+some");
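
As a sketch of how salva's quotemeta suggestion might apply to every tag pair from the question, the script below keeps the tags as plain literal strings and escapes them just before matching. The @tag_pairs list is an assumption made for illustration; the sample line is the one from the question's __DATA__ section.

```perl
use strict;
use warnings;

# Sample line from the question. Single quotes keep $, ^ and + literal;
# "\\" inside single quotes is one backslash.
my $line = 'CharSTARTanotherENDCharTRicky\\VALUEE\\endTrickyNext$VALUE^NextLast+VALUE+some';

# Literal tag pairs, written exactly as they appear in the data
# (hypothetical list for illustration).
my @tag_pairs = (
    [ 'START',    'END'         ],
    [ 'TRicky\\', '\\endTricky' ],
    [ 'Next$',    '^Next'       ],
    [ 'Last+',    '+some'       ],
);

my @wanted_substrings;
for my $pair (@tag_pairs) {
    # quotemeta backslash-escapes every non-word character, so the tags
    # are matched literally once they are interpolated into the regex.
    my ($start, $end) = map { quotemeta($_) } @$pair;
    push @wanted_substrings, $1 if $line =~ /$start(.*?)$end/;
}

print join("\n", @wanted_substrings), "\n";
```

Writing the tags in single quotes avoids the double-quote pitfall from the question, where "Last\+" already loses its backslash before the regex engine ever sees it.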
Re: Regular Expression problem when Extracting Start\ VALUE \End
by injunjoel (Priest) on Sep 30, 2005 at 17:15 UTC
    Greetings all,
    In the spirit of TIMTOWTDI my suggestion is to use \Q\E.

    sub getInfoFromSingleLineMultiLineFile {
        #args and file opening...
        while($line = <IFH>){
            if($line =~ /\Q$stag\E(.*?)\Q$etag\E/){
                push(@wanted_substrings,$1);
            }
        }
        return @wanted_substrings;
    }

    That should get you what you want.

    -InjunJoel
    "I do not feel obliged to believe that the same God who endowed us with sense, reason and intellect has intended us to forego their use." -Galileo
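
Filling in the elided parts of injunjoel's sub gives something like the sketch below. The sub name get_info_between_tags, the lexical filehandle, and the in-memory demo data are assumptions made for illustration; only the \Q...\E match itself comes from the reply above.

```perl
use strict;
use warnings;

# Hypothetical helper modelled on the sub in the question: returns the first
# value found between $stag and $etag on each line of $input_file.
sub get_info_between_tags {
    my ($input_file, $stag, $etag) = @_;
    my @wanted_substrings;

    open(my $ifh, '<', $input_file)
        or die "Can't open file: $input_file: $!\n";
    while (my $line = <$ifh>) {
        # \Q...\E quotes regex metacharacters in the interpolated tags.
        if ($line =~ /\Q$stag\E(.*?)\Q$etag\E/) {
            push @wanted_substrings, $1;
        }
    }
    close($ifh);
    return @wanted_substrings;
}

# Demo: an in-memory "file" (a scalar reference passed to open) holding the
# sample line from the question. A real caller would pass a file name.
my $data = 'CharTRicky\\VALUEE\\endTrickyNext$VALUE^NextLast+VALUE+some' . "\n";

my @found = (
    get_info_between_tags(\$data, 'TRicky\\', '\\endTricky'),
    get_info_between_tags(\$data, 'Next$',    '^Next'),
    get_info_between_tags(\$data, 'Last+',    '+some'),
);
print join("\n", @found), "\n";
```

Because the escaping happens inside the sub, callers can pass the tags as ordinary strings without knowing they end up in a regex.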
