http://qs321.pair.com?node_id=11118617


in reply to Re: Case insensitive string comparison
in thread Case insensitive string comparison

I agree with matching against a particular field rather than against the entire string, and with using a character class rather than several regex matches in tandem.

I have some comments regarding implementation details. I'm forced to admit, however, that because I don't really know DAN0207's requirements, these comments may be meaningless. That said, I forge ahead.

Firstly, the /SMS[1HI]/i match against the extracted $SMSfield field allows a field like 'xSMSIx' to be accepted. This match could benefit from anchor assertions: / \A SMS [1HI] \z /xi rejects this field.
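
To make the difference concrete, here is a minimal sketch comparing the two patterns. The sub names and most of the test fields are made up for illustration; only 'xSMSIx' comes from the point above.

```perl
use strict;
use warnings;

# Unanchored: matches if the pattern occurs anywhere in the field.
sub unanchored { my ($f) = @_; return $f =~ /SMS[1HI]/i ? 1 : 0 }

# Anchored: the whole field must be SMS plus exactly one suffix character.
sub anchored { my ($f) = @_; return $f =~ / \A SMS [1HI] \z /xi ? 1 : 0 }

# Hypothetical field values for illustration.
for my $field ('SMSI', 'smsh', 'SMS1', 'xSMSIx', 'SMS12') {
    printf "%-8s unanchored=%d anchored=%d\n",
        $field, unanchored($field), anchored($field);
}
```

The first three fields pass both tests; 'xSMSIx' and 'SMS12' pass only the unanchored one.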

Secondly, I find the use of the global /i flag problematic. In the OPed code statement
    $$blk_ref = 'SMSblk' if $$blk_ref =~ /SMSi/i || ... || $$blk_ref =~ /SMS1/;
the /i modifier is only present in the matches with an i, I, h or H suffix, not in the match with the numeric suffix. This suggests (and again, I'm only guessing) that the 'SMS' subfield of the field in question should not be matched case-insensitively. If that's so, a match of
    / \A SMS [1hHiI] \z /x
(which I personally prefer) or
    / \A SMS (?i) [1HI] \z /x
will reject the 'SmsI' field and all like it.
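
A small sketch of how the three variants behave, assuming (as guessed above) that 'SMS' must be uppercase while the letter suffix may be either case. The sub names and sample fields are invented for the demonstration.

```perl
use strict;
use warnings;

# /i on the whole pattern: 'SMS' is matched case-insensitively too.
sub global_i { my ($f) = @_; return $f =~ / \A SMS [1HI] \z /xi ? 1 : 0 }

# No /i at all: both cases of each letter suffix are listed explicitly.
sub explicit { my ($f) = @_; return $f =~ / \A SMS [1hHiI] \z /x ? 1 : 0 }

# Inline (?i): case-insensitivity starts at that point in the pattern,
# so it covers the suffix class but not the literal 'SMS' before it.
sub scoped_i { my ($f) = @_; return $f =~ / \A SMS (?i) [1HI] \z /x ? 1 : 0 }

# Hypothetical field values for illustration.
for my $field ('SMSI', 'SMSi', 'SMS1', 'SmsI', 'smsh') {
    printf "%-5s global_i=%d explicit=%d scoped_i=%d\n",
        $field, global_i($field), explicit($field), scoped_i($field);
}
```

'SmsI' and 'smsh' are accepted only by the pattern with the global /i; the other two patterns agree with each other on every field shown.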


Give a man a fish:  <%-{-{-{-<

Re^3: Case insensitive string comparison
by Marshall (Canon) on Jun 29, 2020 at 06:59 UTC
    Good points.
    I didn't read too much into the OP's use of the /i modifier because when I saw $$blk_ref =~ /SMSh/i || $$blk_ref =~ /SMSH/i, that led me to believe that perhaps the OP doesn't really understand what /i does. So I gave an example where you have to rely upon /i working. Having said that, in my own code I probably would have used your character class [1hHiI], which explicitly enumerates the possibilities, because there are only H and I to cover. If there were, say, 10 options, all with lowercase and uppercase versions, I'd do it more like I showed in my example, to avoid missing a possibility.

    I actually did consider the use of anchors. I thought that narrowing the focus to the field of interest would be "good enough". We don't know where this CSV data comes from. I suppose that it could potentially come from some spreadsheet or other program which might add "" marks even where they are not required (but allowed). In that case, something like /^SMS/ would fail.
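
    A quick sketch of that failure mode, and of one possible workaround: stripping an optional pair of surrounding double quotes before the anchored test. The strip_quotes() helper is hypothetical and is not a full CSV parser; it ignores embedded or escaped quotes.

```perl
use strict;
use warnings;

# Anchored test from earlier in the thread.
sub anchored { my ($f) = @_; return $f =~ / \A SMS [1HI] \z /xi ? 1 : 0 }

# Hypothetical helper: remove one pair of enclosing double quotes, if present.
sub strip_quotes {
    my ($f) = @_;
    $f =~ s/ \A " (.*) " \z /$1/xs;
    return $f;
}

# Hypothetical field values: one bare, one quoted as a spreadsheet might emit it.
for my $field ('SMSI', '"SMSI"') {
    printf "%-8s as-is=%d stripped=%d\n",
        $field, anchored($field), anchored(strip_quotes($field));
}
```

    The quoted field fails the anchored match as-is and passes once the quotes are removed. Whether that extra step is worth it depends on where the data really comes from.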

    I think it is highly likely that this data comes from another program rather than from user input. In cases like that, I often write regexes that allow more matches than a very rigid interpretation would, because the computer won't "fumble finger" an extraneous character. All of these decisions come down to the exact application, which we just don't know.

    Overall I think this is a good thread, although I do wish that the OP had provided more code to put his problem into a wider context. The Monks demonstrated some new points for the OP to consider, along with adequate explanations. I hope that the OP reads all this and decides what is right for his application.

      Thank you very much for all the replies. To be clearer: I am now working on the code below to get the output. I am trying to write an if condition so that SGWa-i is renamed to SGWa-i_LOWCASE and SGWA-I is renamed to SGWA-I_UPCASE. Sample data is provided below. Earlier I provided the data file with values; here I am providing the basic sketch file that is supposed to be referred to.

      Sample data

          sgw sketch sgwpts1 format EMS,SGW1,%date%,%time%,%sgw-vpnname%,%sgw-vpnid%,%sgw-servname%,%sgw-servid%
          sgw sketch sgwptsh format EMS,SGWh,%date%,%time%,%sgw-vpnname%,%sgw-vpnid%,%sgw-servname%,%sgw-servid%
          sgw sketch sgwptsi format EMS,SGWi,%date%,%time%,%sgw-vpnname%,%sgw-vpnid%,%sgw-servname%,%sgw-servid%
          sgw sketch, sgwptsH format, EMS,SGWH,%date%,%time%,%sgw-vpnname%,%sgw-vpnid%,%sgw-servname%,%sgw-servid%
          sgw sketch sgwptsI format EMS,SGWI,%date%,%time%,%sgw-vpnname%,%sgw-vpnid%,%sgw-servname%,%sgw-servid%

      Lines of code I am writing

          sub load_sketch {
              my ($sketch_file) = @_;
              my %all_sketches = ();
              open(DAT, $sketch_file) || die("Could not open file $sketch_file!");
              my @lines = <DAT>;
              close(DAT);
              foreach (@lines) {
                  s/[\r\n\s%]+//g;
                  my @all_columns = split(',', $_);
                  $all_sketches{ uc($all_columns[1]).'STAT' } = [ @all_columns ];   # Output file will have SGWSTAT in its name
              }
              return %all_sketches;
          }
          1;   # tells perl that the package is ready to run

      Please help me with how to do the renaming in the above code, or please suggest whether there are better ways to handle it here.

        Your sample data seems inconsistent and the code is therefore confusing (update: to me :).

        Firstly, it seems as if the name of the sample data file is being passed to load_sketch() as the $sketch_file string. Is this true?

        Secondly, most lines of the sample data are in the format
            sgw sketch sgwpts1 format EMS,SGW1,%date%,%time%,%sgw-vpnname%,%sgw-vpnid%,%sgw-servname%,%sgw-servid%
        whereas line 4 is
            sgw sketch,sgwptsH,format EMS,SGWH,%date%,%time%,%sgw-vpnname%,%sgw-vpnid%,%sgw-servname%,%sgw-servid%
        (commas are in different columns). Executing the statement
            my @all_columns = split(',', $_);
        on the sample data will produce substrings with very different formats in $all_columns[1].
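
        To illustrate, here is a minimal sketch that applies the same per-line steps as load_sketch() to the two line formats quoted above. The second_column() helper is made up for this demonstration.

```perl
use strict;
use warnings;

# Mirrors the per-line processing in load_sketch(): strip whitespace and '%',
# split on commas, then return the element used to build the hash key.
sub second_column {
    my ($line) = @_;
    $line =~ s/[\r\n\s%]+//g;
    my @all_columns = split(',', $line);
    return $all_columns[1];
}

# The two formats as quoted in this reply.
my $typical = 'sgw sketch sgwpts1 format EMS,SGW1,%date%,%time%,%sgw-vpnname%,%sgw-vpnid%,%sgw-servname%,%sgw-servid%';
my $line4   = 'sgw sketch,sgwptsH,format EMS,SGWH,%date%,%time%,%sgw-vpnname%,%sgw-vpnid%,%sgw-servname%,%sgw-servid%';

for my $line ($typical, $line4) {
    my $col = second_column($line);
    print "second column: $col   hash key: ", uc($col) . 'STAT', "\n";
}
```

        With the typical format the key is built from 'SGW1'; with the line 4 format it is built from 'sgwptsH'. Note also that because the key is passed through uc(), 'SGWh' and 'SGWH' would produce the same key, so one entry would overwrite the other in the hash.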

        Please add an update (see How do I change/delete my post?) to either confirm that the originally posted sample data is in fact correct, or else fix the sample data. (Of course, please cite any updates, changes, additions or corrections.)


        Give a man a fish:  <%-{-{-{-<