PerlMonks  

Re: Case insensitive string comparison

by Marshall (Canon)
on Jun 27, 2020 at 16:45 UTC ([id://11118610])


in reply to Case insensitive string comparison

I have a few comments for you. I will leave the dereferencing code out of my response, because my main points concern the regex matching, and I also don't really understand what you are doing with your dereference of a scalar reference.

First, I would not match against the whole comma-separated line; I would narrow the focus to the field that you are interested in. Below I use split to get field [1]. Another poster suggested using a boundary assertion in the regex for the same purpose (making sure you are matching against what you think you are). We don't know what the other names or IDs in the line look like; perhaps one server is "sms1Master" or whatever.
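To illustrate the point, here is a minimal sketch built on a made-up record (the server name "sms1Master" is hypothetical, borrowed from the example above). A match against the whole line is fooled by the server-name field, while a match against field [1] alone is not.

```perl
use strict;
use warnings;

# Hypothetical record: field [1] is 'SMSx', but a later server-name
# field happens to contain 'sms1Master'.
my $line = "SMS,SMSx,20190811,084500,sms1Master,servid,servname1,s1,500\n";

# Matching against the whole line finds 'sms1' inside 'sms1Master'.
my $whole_line_match = ($line =~ /SMS[1HI]/i) ? 1 : 0;

# Matching only the field of interest avoids that false positive.
my $field       = (split(',', $line))[1];
my $field_match = ($field =~ /SMS[1HI]/i) ? 1 : 0;

print "whole line: $whole_line_match, field [1] only: $field_match\n";
```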

Instead of multiple "or" terms, I would use a character set in this case. That makes it easier for me to see what will or will not match. Of course, your mileage may vary.

```perl
use strict;
use warnings;

while (<DATA>) {
    # Narrow the focus to the second comma-separated field (index 1)
    my $SMSfield = (split(',', $_))[1];
    if ($SMSfield =~ /SMS[1HI]/i) {    # character set plus the /i flag
        print "Match $SMSfield\n";
    }
    else {
        print "No Match $SMSfield\n";
    }
}

=prints
Match SMS1
Match SMSh
Match SMSH
Match SMSi
Match SMSI
Match SmsI    **Note this match** I think in your case, this is fine.
No Match SMSx
=cut

__DATA__
SMS,SMS1,20190811,084500,servname,servid,servname1,s1,400,300,300,300,300,300
SMS,SMSh,20190811,084500,servname,servid,servname1,s1,700,300,300,300,300,300
SMS,SMSH,20190811,084500,servname,servid,servname1,s1,600,300,300,300,300,300
SMS,SMSi,20190811,084500,servname,servid,servname1,s1,800,300,300,300,300,300
SMS,SMSI,20190811,084500,servname,servid,servname1,s1,500,300,300,300,300,300
SMS,SmsI,20190811,084500,servname,servid,servname1,s1,500,300,300,300,300,300
SMS,SMSx,20190811,084500,servname,servid,servname1,s1,500,300,300,300,300,300
```

Re^2: Case insensitive string comparison
by AnomalousMonk (Archbishop) on Jun 28, 2020 at 05:21 UTC

    I agree with matching against a particular field rather than against the entire string, and with using a character class rather than several regex matches combined with ||.

    I have some comments regarding implementation details. I'm forced to admit, however, that because I don't really know DAN0207's requirements, these comments may be meaningless. That said, I forge ahead.

    Firstly, the /SMS[1HI]/i match against the extracted $SMSfield field accepts a field like 'xSMSIx'. This match could benefit from anchor assertions: / \A SMS [1HI] \z /xi rejects this field.
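    A small sketch comparing the two patterns from the paragraph above on 'SMSI' and 'xSMSIx' (the sub names unanchored and anchored are just for illustration). Both accept 'SMSI'; only the anchored version rejects 'xSMSIx'. Note that \z also rejects a trailing newline, so if the field of interest were the last one on the line, the line would need a chomp first.

```perl
use strict;
use warnings;

# Unanchored and anchored versions of the pattern discussed above
sub unanchored { return $_[0] =~ /SMS[1HI]/i ? 1 : 0 }
sub anchored   { return $_[0] =~ / \A SMS [1HI] \z /xi ? 1 : 0 }

for my $field ('SMSI', 'xSMSIx') {
    printf "%-7s unanchored=%d anchored=%d\n",
        $field, unanchored($field), anchored($field);
}
```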

    Secondly, I find the use of the global /i flag problematic. In the OP's code statement
        $$blk_ref = 'SMSblk' if $$blk_ref =~ /SMSi/i || ... || $$blk_ref =~ /SMS1/;
    the /i modifier is only present in the matches with an i, I, h or H suffix, not in the match with the numeric suffix. This suggests (and again, I'm only guessing) that the 'SMS' subfield of the field in question should not be matched case-insensitively. If that's so, a match of
        / \A SMS [1hHiI] \z /x
    (which I personally prefer) or
        / \A SMS (?i) [1HI] \z /x
    will reject the 'SmsI' field and all like it.
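    To check that the two case-sensitive-prefix forms above behave the same way, here is a minimal harness (the sub names are illustrative) run over the field values from the earlier example. Both accept SMS1, SMSh, SMSH, SMSi and SMSI, and both reject SmsI and SMSx. The (?i) form works because an inline modifier stays in effect until the end of the enclosing group, which here is the rest of the pattern.

```perl
use strict;
use warnings;

# 'SMS' is matched case-sensitively; only the suffix may vary in case.
sub explicit_class { return $_[0] =~ / \A SMS [1hHiI] \z /x ? 1 : 0 }
sub inline_flag    { return $_[0] =~ / \A SMS (?i) [1HI] \z /x ? 1 : 0 }

for my $field (qw(SMS1 SMSh SMSH SMSi SMSI SmsI SMSx)) {
    printf "%-5s explicit=%d inline=%d\n",
        $field, explicit_class($field), inline_flag($field);
}
```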



      Good points.
      I didn't read too much into the OP's use of the /i modifier, because when I saw $$blk_ref =~ /SMSh/i || $$blk_ref =~ /SMSH/i, that led me to believe that perhaps the OP doesn't really understand what /i does. So I gave an example where you have to rely upon /i working. Having said that, in my own code I probably would have used your character set [1hHiI], which explicitly enumerates the possibilities, because there are only two letters here, H and I. If there were, say, 10 options, all with lowercase and uppercase versions, I'd do it more like the example I showed, to avoid missing one of the possibilities.

      I actually did consider the use of anchors, but I thought that narrowing the focus to the field of interest would be "good enough". We don't know where this CSV data comes from. It could come from a spreadsheet or some other program that puts double quotes around a field even where they are not required (but are allowed). In that case, something like /^SMS/ would fail.
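      If quoting really were a concern, one option is to let the anchored pattern accept an optional pair of double quotes. This is a sketch assuming the quotes, if present, wrap the whole field; is_sms_field is a hypothetical helper name. For genuinely quoted CSV, a real parser such as the CPAN module Text::CSV is the more robust choice.

```perl
use strict;
use warnings;

# Accepts the field with or without a surrounding pair of double quotes.
# The capture ("?) grabs either '"' or the empty string, and the
# backreference \1 requires the closing quote to mirror the opening one.
sub is_sms_field {
    my ($field) = @_;
    return $field =~ / \A ("?) SMS [1hHiI] \1 \z /x ? 1 : 0;
}

for my $field ('SMSI', '"SMSI"', '"SMSI', 'SmsI') {
    printf "%-8s %s\n", $field, is_sms_field($field) ? 'accepted' : 'rejected';
}
```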

      I think it is highly likely that this data comes from another program rather than from user input. In cases like that, I often write regexes that allow more matches than a very rigid interpretation would, because the computer won't "fumble finger" an extraneous character. All of these decisions come down to the exact application, which we just don't know.

      Overall I think this is a good thread, although I do wish that the OP had provided more code to put the problem into a wider context. The Monks demonstrated some new points for the OP to consider, along with adequate explanations. I hope that the OP reads all of this and decides what is right for his application.

        Thank you very much for all the replies. To be clearer: I am now working on the code below to get the output. I am trying to write an if condition so that SGWa-i is renamed to SGWa-i_LOWCASE and SGWA-I is renamed to SGWA-I_UPCASE. Sample data is provided below. Earlier I provided the data file with values, but here I am providing the basic sketch file, which is the one that is supposed to be referred to.

        Sample data

```
sgw sketch sgwpts1 format EMS,SGW1,%date%,%time%,%sgw-vpnname%,%sgw-vpnid%,%sgw-servname%,%sgw-servid%
sgw sketch sgwptsh format EMS,SGWh,%date%,%time%,%sgw-vpnname%,%sgw-vpnid%,%sgw-servname%,%sgw-servid%
sgw sketch sgwptsi format EMS,SGWi,%date%,%time%,%sgw-vpnname%,%sgw-vpnid%,%sgw-servname%,%sgw-servid%
sgw sketch, sgwptsH format, EMS,SGWH,%date%,%time%,%sgw-vpnname%,%sgw-vpnid%,%sgw-servname%,%sgw-servid%
sgw sketch sgwptsI format EMS,SGWI,%date%,%time%,%sgw-vpnname%,%sgw-vpnid%,%sgw-servname%,%sgw-servid%
```
        Lines of code I am writing:
```perl
sub load_sketch {
    my ($sketch_file) = @_;
    my %all_sketches = ();
    open(my $fh, '<', $sketch_file)
        or die("Could not open file $sketch_file: $!");
    my @lines = <$fh>;
    close($fh);
    foreach (@lines) {
        s/[\r\n\s%]+//g;    # strip line endings, all whitespace and % signs
        my @all_columns = split(',', $_);
        # Output file will have SGWSTAT in its name
        $all_sketches{ uc($all_columns[1]) . 'STAT' } = [@all_columns];
    }
    return %all_sketches;
}

1;    # a module file must end by returning a true value
```
        Please help me with how to do the renaming in the above code, or suggest a better way to handle it.
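        One thing worth noting in the code above: uc($all_columns[1]) folds 'SGWh' and 'SGWH' to the same string, so two such lines both get the key 'SGWHSTAT' and whichever is read later overwrites the earlier one. Below is a minimal sketch of the requested renaming, assuming the rule is "a trailing letter a-i gets _LOWCASE, a trailing letter A-I gets _UPCASE, anything else is left alone". sketch_name is a hypothetical helper, not part of the code above. Because the suffix keeps the keys distinct, uc() could still be applied afterwards if the output file names must be uppercase.

```perl
use strict;
use warnings;

# Hypothetical helper: tag the name by the case of its trailing letter,
# following the a-i / A-I ranges described in the question.
sub sketch_name {
    my ($name) = @_;
    return "${name}_LOWCASE" if $name =~ /[a-i]\z/;
    return "${name}_UPCASE"  if $name =~ /[A-I]\z/;
    return $name;    # e.g. 'SGW1': no letter suffix, leave unchanged
}

my %all_sketches;
for my $name (qw(SGW1 SGWh SGWi SGWH SGWI)) {
    # Build the key from the renamed field instead of uc() alone, so that
    # 'SGWh' and 'SGWH' no longer collapse onto the same key.
    $all_sketches{ sketch_name($name) . 'STAT' } = $name;
}
print "$_\n" for sort keys %all_sketches;
```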
