PerlMonks  

Negative Lookbehind question

by crusty_collins (Friar)
on Nov 11, 2015 at 17:40 UTC ( [id://1147482]=perlquestion )

crusty_collins has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to get my mind around negative lookbehinds. This is something that I am working on but did not write.

I think that it looks at the number of spaces to decide which right-hand text to remove. If I use the script as written, it does NOT grab the STATE ZIPCODE section.

If I change

$addr =~ s/(?<![|\s])\s{25,}[^|]+//g; # extra right-text
to
$addr =~ s/(?<![|\s])\s{27,}[^|]+//g; # extra right-text
then I get the STATE and ZIPCODE.

This is the example script:

use strict;
use warnings;

my $addr = '| + Note 00 +001| FIRST LAST NAME| ADDRESS 1 + Interest Rate + 5.450000| CITY STATE ZIPCODE| + YTD Interest $4,442.64| Total Payment Amount + $886.00| + Escrow Portion $ +344.49|';
print "address to parse : \n $addr \n";
$addr =~ s/\|\s{25,}[^|]+//g;            # rm spaces left
$addr =~ s/(?<![|\s])\s{27,}[^|]+//g;    # extra right-text
$addr =~ s/\s*\|\s*/|/g;
$addr =~ s/\|{2,}/|/g;
$addr =~ s/\s+\|$//;
$addr =~ s/\s+/ /g;
# use up to 6 of last lines for addr
$addr = join('|', (split('\|', "||||||$addr"))[-6..-1]);
$addr =~ s/^\|+//;
print "Result addr : \n $addr \n";
Output:

c:\Users\collinsc\dev>perl lookBehind.pl
Address to parse :
 | + Note 00001| +FIRST LAST NAME| ADDRESS 1 Interest Rat +e 5.450000| CITY + STATE ZIPCODE| + YTD Interest $4,442.64| Total Payment Amount + $886.00| + Escrow Portion $ +344.49|
Result addr :
 FIRST LAST NAME|ADDRESS 1|CITY STATE ZIPCODE

Question: If the spaces are outside of the negative lookbehind section, how is this working?

Replies are listed 'Best First'.
Re: Negative Lookbehind question
by blindluke (Hermit) on Nov 11, 2015 at 21:39 UTC

    The code line that you are analyzing removes the match. As you may have already found out, s/left/right/g; matches left and replaces it with right. In your case, the right-side expression is an empty string, so the match is effectively removed.

    The tricky thing here is that the lookbehind is not part of the match. So, this line:

    $addr =~ s/(?<![|\s])\s{25,}[^|]+//g; # extra right-text

    finds a block of at least 25 whitespace characters in a row, followed by one or more characters that are not | (this is our match), but only if the block is not immediately preceded by a | or by whitespace. Then this block is removed.
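
A minimal sketch of that point, using two hypothetical test strings (the spacing in them is an assumption, not taken from the real data). It shows that the character checked by the lookbehind is not part of the matched text, and that a run of spaces directly after a | is never matched when the lookbehind is present:

```perl
use strict;
use warnings;

# Hypothetical samples: 25 spaces after CITY, 30 spaces right after a '|'.
my $mid_field  = 'CITY' . (' ' x 25) . 'STATE ZIPCODE|';
my $after_pipe = '|' . (' ' x 30) . 'Note 00001|';

# The run of spaces follows 'Y', so the lookbehind succeeds.
# The capture holds the spaces plus the trailing text, but not the 'Y'.
my ($matched) = $mid_field =~ /((?<![|\s])\s{25,}[^|]+)/;
my $start = $-[0];    # offset where the match begins

# The run follows '|'. Every possible starting point inside the run is
# preceded by '|' or by another space, so the guarded pattern fails.
my $guarded   = ($after_pipe =~ /(?<![|\s])\s{25,}[^|]+/) ? 1 : 0;
my $unguarded = ($after_pipe =~ /\s{25,}[^|]+/)           ? 1 : 0;

print "match starts at offset $start, length ", length($matched), "\n";
print "after a pipe, with lookbehind:    $guarded\n";
print "after a pipe, without lookbehind: $unguarded\n";
```

The match starts at offset 4, right after CITY, which is why the Y survives the substitution even though the lookbehind had to inspect it.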

    In the sample input you provided, there are 25 spaces between CITY and STATE ZIPCODE. So we have a match, and this line removes the whole match: those 25 spaces right after CITY, together with STATE ZIPCODE, up until the | character. That is why you don't "grab the STATE ZIPCODE section": this nasty little line grabs it earlier and removes it. When you raise the minimum to 27 spaces, the 25-space gap no longer satisfies \s{27,}, so nothing matches there and STATE ZIPCODE is not removed.
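
To see the threshold effect in isolation, here is a small sketch. The test line is hypothetical; the only detail taken from the post is the gap of exactly 25 spaces between CITY and STATE ZIPCODE:

```perl
use strict;
use warnings;

# Assumed input: exactly 25 spaces between CITY and STATE ZIPCODE.
my $line = '| CITY' . (' ' x 25) . 'STATE ZIPCODE|';

# Copy the line, then apply each version of the substitution to its copy.
(my $with_25 = $line) =~ s/(?<![|\s])\s{25,}[^|]+//g;
(my $with_27 = $line) =~ s/(?<![|\s])\s{27,}[^|]+//g;

print "threshold 25: [$with_25]\n";   # STATE ZIPCODE is gone
print "threshold 27: [$with_27]\n";   # line is unchanged
```

With {25,} the result is "| CITY|"; with {27,} the line comes back untouched, because a 25-space run cannot satisfy a minimum of 27.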

    You can try playing around with an online regex tester - http://regex101.com. It could make the way those expressions work a bit more clear. Good luck!

    - Luke

Re: Negative Lookbehind question
by ExReg (Priest) on Nov 11, 2015 at 21:31 UTC

    The first expression deletes any group of spaces that are at least 25 spaces long that are preceded by a |. It also continues deleting anything until it reaches the next |. After the first expression you are left with

    '| FIRST LAST NAME| ADDRESS 1 + Interest Rate + 5.450000| CITY STATE ZIPCODE| + Total Payment Amount +$886.00|';

    The next expression will remove any run of 25 (or 27) or more spaces that is not immediately preceded by a | or by another space. It will continue deleting spaces and anything else until it reaches another |. It will delete the spaces after ADDRESS 1, since there are more than 25 of them, and keep deleting up to and including 5.450000. Then it will look for another match (/g). There are 25 spaces between CITY and STATE. Those spaces are not immediately preceded by a |. So it will delete those spaces and keep deleting STATE and ZIPCODE until it finds another |. By increasing the 25 to 27, you keep it from matching in this case.
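
The two passes described above can be traced on a shortened, hypothetical input. The 30-space runs are an assumption (the post only says "more than 25"); the 25-space gap after CITY comes from the replies:

```perl
use strict;
use warnings;

# Hypothetical, shortened input with assumed spacing.
my $addr = '|' . (' ' x 30) . 'Note 00001'
         . '| ADDRESS 1' . (' ' x 30) . 'Interest Rate 5.450000'
         . '| CITY' . (' ' x 25) . 'STATE ZIPCODE|';

# Pass 1: drop fields that start with a '|' followed by 25+ spaces.
(my $after_first = $addr) =~ s/\|\s{25,}[^|]+//g;

# Pass 2 (original threshold of 25): drop right-hand text that follows
# a long run of spaces in the middle of a field.
(my $after_second = $after_first) =~ s/(?<![|\s])\s{25,}[^|]+//g;

print "after pass 1: [$after_first]\n";
print "after pass 2: [$after_second]\n";
```

Pass 1 removes only the Note field. Pass 2 then strips "Interest Rate 5.450000" and, because of the /g flag, goes on to strip "STATE ZIPCODE" as well, leaving "| ADDRESS 1| CITY|".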

Re: Negative Lookbehind question
by crusty_collins (Friar) on Nov 12, 2015 at 17:41 UTC
    Thanks for the answers! I think I have it now.
