
string matching

by amitvalia (Novice)
on Jan 26, 2017 at 21:55 UTC ( [id://1180402] : perlquestion )

amitvalia has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to match a description field from an external file and pick out the records whose description matches certain patterns. The piece that is baffling me is this: ...
if ($desc =~ /;FTP*:*/) { $txn_type = 4 ; print "Desc before changing: $desc \n"; .... }
I expected this to match fields like:

;FTP : Foreign Tax Withheld

;FTP : File Transfer to server etc.

But it is also picking up descriptions such as ;FTR :Foreign Tax Reclaim

Here's the output of the print statement in my code above:

Desc before changing: ;FTR :Foreign Tax Reclaim (BASFY)

Desc before changing: ;FTR :Foreign Tax Reclaim (SIEGY)

Desc before changing: ;FTR :Foreign Tax Reclaim (SIEGY)

Desc before changing: ;FTR :Foreign Tax Reclaim (DDAIY)

How is /;FTP*:*/ matching the lines above? What am I doing wrong here?

Re: string matching
by Paladin (Vicar) on Jan 26, 2017 at 22:18 UTC
    In a regex, * means to match 0 or more of the character right before it, so the regex /;FTP*:*/ means the following:
    Match ;
    Match F
    Match T
    Match 0 or more of P
    Match 0 or more of :
    ;FTR :Foreign Tax Reclaim does indeed match: it has ;FT followed by zero Ps followed by zero colons. If you meant to match 0 or more of any character, you want .* instead. So your regex becomes /;FTP.*:.*/.

    To be more precise, though, you probably don't mean "match any amount of any character"; you mean "match any amount of anything that isn't a :", which would be /;FTP[^:]*:.*/
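    A minimal Perl sketch comparing the original pattern with the two fixes above, run against descriptions copied from the question. The variable names ($original, $any_chars, $non_colon) are only illustrative:

```perl
use strict;
use warnings;

# The pattern from the question and the two fixes suggested above.
my $original  = qr/;FTP*:*/;      # * quantifies P and :, so both are optional
my $any_chars = qr/;FTP.*:.*/;    # .* means "any characters", as intended
my $non_colon = qr/;FTP[^:]*:.*/; # [^:]* cannot run past the first colon

my @descs = (
    ';FTP : Foreign Tax Withheld',
    ';FTR :Foreign Tax Reclaim (BASFY)',
);

for my $desc (@descs) {
    printf "%-34s original=%s any_chars=%s non_colon=%s\n",
        $desc,
        ($desc =~ $original  ? 'yes' : 'no'),
        ($desc =~ $any_chars ? 'yes' : 'no'),
        ($desc =~ $non_colon ? 'yes' : 'no');
}
```

    The ;FTR line only matches the original pattern, because ;FT with zero Ps and zero colons is enough to satisfy it.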

Re: string matching
by hippo (Bishop) on Jan 26, 2017 at 23:31 UTC

    The short answer is that you have an asterisk where you should have a space. So your if test might become:

    if ($desc =~ /;FTP : /)

    The longer answer is that if you are searching for a fixed string like this, it is usually both more efficient and less error-prone to use index instead of a regex match in the first place, e.g.:

    if (index ($desc, ';FTP : ') > -1)

    If you only want to trigger when $desc starts with that string (rather than containing it anywhere) then look for the index value being precisely zero.
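    A small sketch of both index checks. contains_ftp and starts_with_ftp are hypothetical helper names, and the XX;FTP sample is made up to show a match that is not at the start of the string:

```perl
use strict;
use warnings;

# Hypothetical helpers: index() returns -1 when the substring is absent,
# otherwise the 0-based position where it first occurs.
sub contains_ftp    { my ($desc) = @_; return index($desc, ';FTP : ') > -1 }
sub starts_with_ftp { my ($desc) = @_; return index($desc, ';FTP : ') == 0 }

for my $desc (';FTP : Foreign Tax Withheld',
              'XX;FTP : File Transfer',
              ';FTR :Foreign Tax Reclaim (SIEGY)') {
    printf "%-34s contains=%s starts=%s\n", $desc,
        contains_ftp($desc)    ? 'yes' : 'no',
        starts_with_ftp($desc) ? 'yes' : 'no';
}
```

    Keep in mind that the fixed string ';FTP : ' only matches when there is exactly one space on each side of the colon.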

Re: string matching
by stevieb (Canon) on Jan 26, 2017 at 22:49 UTC

    Because this:

        /;FTP*:*/

    ...says match any line that has a ; followed by FT, followed by zero or more Ps, followed by zero or more colons. Essentially, you're being too greedy.

    Try something like:

        /;FTP.*?:.*/

    ...which says match ;FTP anywhere in the line, followed by anything non-greedily (.*?) up to the first : (the .*? also absorbs the whitespace before the colon), followed by anything, greedily.
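    Whether a line matches /;FTP.*?:.*/ or /;FTP.*:.*/ comes out the same either way; greediness only changes what a capture group ends up holding. A sketch with a made-up description that contains two colons:

```perl
use strict;
use warnings;

# Invented sample string with two colons (not from the question).
my $desc = ';FTP XYZ : File Transfer: server 2';

my ($lazy_head)   = $desc =~ /;FTP(.*?):/;  # .*? stops at the FIRST colon
my ($greedy_head) = $desc =~ /;FTP(.*):/;   # .* backtracks to the LAST colon

print "non-greedy: [$lazy_head]\n";    # prints "non-greedy: [ XYZ ]"
print "greedy:     [$greedy_head]\n";  # prints "greedy:     [ XYZ : File Transfer]"
```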

Re: string matching
by Laurent_R (Canon) on Jan 27, 2017 at 07:14 UTC
Re: string matching
by amitvalia (Novice) on Jan 27, 2017 at 21:11 UTC
    The reason I had to be greedy is that the only consistent feature of the strings I'm picking up is that they contain ;FTP, followed by some characters that depend on the trade, followed by a : and then a few more words.

    So my goal was to match ;FTP<anything else>:<anything else>.
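    A sketch of matching that shape while also pulling out the two variable parts. parse_ftp is a hypothetical helper, and the ;FTPX1 sample is invented to stand in for a trade-specific middle part:

```perl
use strict;
use warnings;

# Hypothetical helper: returns (middle, rest) for ;FTP<middle>:<rest>,
# or an empty list when the description does not have that shape.
sub parse_ftp {
    my ($desc) = @_;
    # ([^:]*) = trade-specific text up to the first colon;
    # \s* skips spaces after the colon; (.*) = the remaining words.
    return unless $desc =~ /;FTP([^:]*):\s*(.*)/;
    return ($1, $2);
}

for my $desc (';FTP : Foreign Tax Withheld',
              ';FTPX1 :File Transfer to server',
              ';FTR :Foreign Tax Reclaim (DDAIY)') {
    if (my ($middle, $rest) = parse_ftp($desc)) {
        print "FTP record: middle=[$middle] rest=[$rest]\n";
    } else {
        print "skipped: $desc\n";
    }
}
```

    The list assignment inside the if is true only when parse_ftp returns its two values, so the ;FTR line falls through to the else branch.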

    I think I know what changes to make to my code now.

    Thank you all.