Re: Case insensitive string comparison

I have a few comments for you. I will leave the dereferencing code out of my response, both because my main points concern the regex matching and because I don't really understand what you are doing with your dereference of a reference to a scalar.

First, I would not match against the whole comma-separated line; I would narrow the focus to the field that you are interested in. Below I use a split to get field[1]. Another poster suggested using a boundary condition in the regex for the same purpose (making sure you are matching against what you think you are). We don't know what the other names or IDs in the line look like; perhaps one server is "sms1Master" or whatever.

Instead of multiple "or" terms, I would use a character set in this case. This makes it easier for me to see what is going to match or not match. Of course mileage varies.

use strict;
use warnings;

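while (<DATA>)
{
    # Look only at the second comma-separated field, not the whole line.
    my $SMSfield = (split(',',$_))[1];

    # Character class: SMS followed by 1, H or I; /i ignores case throughout.
    if ($SMSfield =~ /SMS[1HI]/i)
    {
       print "Match $SMSfield\n";
    }
    else
    {
       print "No Match $SMSfield\n";
    }
}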

=prints
Match SMS1
Match SMSh
Match SMSH
Match SMSi
Match SMSI
Match SmsI  **Note this match** I think in your case, this is fine.
No Match SMSx
=cut

__DATA__
SMS,SMS1,20190811,084500,servname,servid,servname1,s1,400,300,300,300,300,300
SMS,SMSh,20190811,084500,servname,servid,servname1,s1,700,300,300,300,300,300
SMS,SMSH,20190811,084500,servname,servid,servname1,s1,600,300,300,300,300,300
SMS,SMSi,20190811,084500,servname,servid,servname1,s1,800,300,300,300,300,300
SMS,SMSI,20190811,084500,servname,servid,servname1,s1,500,300,300,300,300,300
SMS,SmsI,20190811,084500,servname,servid,servname1,s1,500,300,300,300,300,300
SMS,SMSx,20190811,084500,servname,servid,servname1,s1,500,300,300,300,300,300

Re^2: Case insensitive string comparison by AnomalousMonk (Archbishop) on Jun 28, 2020 at 05:21 UTC
I agree with matching against a particular field rather than against the entire string, and with using a character class rather than several "or"-ed matches in tandem. I have some comments regarding implementation details. I'm forced to admit, however, that because I don't really know DAN0207's requirements, these comments may be meaningless. That said, I forge ahead.

Firstly, the `/SMS[1HI]/i` match against the extracted `$SMSfield` field allows a field like `'xSMSIx'` to be accepted. This match could benefit from anchor assertions: `/ \A SMS [1HI] \z /xi` rejects this field.

Secondly, I find the use of the global `/i` flag problematic. In the OP's code statement `$$blk_ref = 'SMSblk' if $$blk_ref =~ /SMSi/i || ... || $$blk_ref =~ /SMS1/;` the `/i` modifier is only present in matches with an `i I h H` suffix, not with the numeric suffix. This suggests (and again, I'm only guessing) that the `'SMS'` subfield of the field in question should not be matched case-insensitively. If that's so, a match of `/ \A SMS [1hHiI] \z /x` (which I personally prefer) or `/ \A SMS (?i) [1HI] \z /x` will reject the `'SmsI'` field and all like it.

Give a man a fish: `<%-{-{-{-<`
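The difference between the three patterns discussed above can be checked with a short script. This is only a sketch: the helper names `loose_match`, `strict_match` and `scoped_match` are made up for the illustration, and the test fields are taken from this thread.

```perl
use strict;
use warnings;

# Hypothetical helpers, introduced only for this illustration.

# Unanchored, whole pattern case-insensitive (the original suggestion).
sub loose_match  { return $_[0] =~ /SMS[1HI]/i ? 1 : 0 }

# Anchored, 'SMS' case-sensitive, suffix cases listed explicitly.
sub strict_match { return $_[0] =~ / \A SMS [1hHiI] \z /x ? 1 : 0 }

# Anchored, case-insensitivity switched on only after (?i).
sub scoped_match { return $_[0] =~ / \A SMS (?i) [1HI] \z /x ? 1 : 0 }

for my $field (qw(SMS1 SMSh SMSI SmsI xSMSIx SMSx)) {
    printf "%-7s loose=%d strict=%d scoped=%d\n",
        $field, loose_match($field), strict_match($field), scoped_match($field);
}
```

Only the unanchored `/i` version accepts `'SmsI'` and `'xSMSIx'`; the two anchored versions accept and reject exactly the same fields here.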
Re^3: Case insensitive string comparison by Marshall (Canon) on Jun 29, 2020 at 06:59 UTC
Good points. I didn't read too much into the OP's use of the /i modifier because when I saw `$$blk_ref =~ /SMSh/i || $$blk_ref =~ /SMSH/i`, that led me to believe that perhaps the OP doesn't really understand what /i does. So I gave an example where you have to rely upon the /i operation working. Having said that, in my own code I probably would have used your character set `[1hHiI]`, which explicitly enumerates the possibilities, because this is just H and I. If there were, say, 10 options, all with lowercase and uppercase versions, I'd do it more like I showed in my example in an attempt to avoid missing one possibility.

I actually did consider the use of anchors. I thought that narrowing the focus to the field of interest would be "good enough". We don't know where this CSV data comes from. I suppose that it could come from some spreadsheet or other program which might add "" marks even where not required (but allowed). In that case, something like /^SMS/ would fail. I think it is highly likely that this data comes from another program rather than from user input. In cases like that, I often write regexes that allow more matches than a very rigid interpretation would, because the computer won't "fumble finger" in an extraneous character. All of these types of decisions come down to the exact application, which we just don't know.

Overall I think this is a good thread, although I do wish that the OP had provided more code to put his problem into a wider context. The Monks demonstrated some new points for the OP to consider along with adequate explanations. I hope that the OP reads all this stuff and decides what is right for his application.
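If optional quote marks around fields are a real possibility, one way to keep the anchored match is to strip them first. The following is a minimal sketch only: `second_field` is a hypothetical helper, and it assumes fields contain no embedded commas or escaped quotes (a proper CSV parser would be the safer choice for those).

```perl
use strict;
use warnings;

# Hypothetical helper: return field 1 of a simple comma-separated line,
# with one pair of surrounding double quotes removed if present.
# Assumption: no embedded commas or escaped quotes inside fields.
sub second_field {
    my ($line) = @_;
    chomp $line;
    my $field = (split /,/, $line)[1];
    return unless defined $field;         # line had fewer than two fields
    $field =~ s/ \A " (.*) " \z /$1/x;    # "SMSh" becomes SMSh
    return $field;
}

for my $line ('SMS,SMS1,20190811', 'SMS,"SMSh",20190811', 'SMS,SMSx,20190811') {
    my $field   = second_field($line);
    my $verdict = $field =~ / \A SMS [1hHiI] \z /x ? 'Match' : 'No Match';
    print "$verdict $field\n";
}
```

With the quotes removed up front, the strict anchored pattern still accepts the quoted `"SMSh"` field.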
Re^4: Case insensitive string comparison by DAN0207 (Acolyte) on Jun 30, 2020 at 12:09 UTC
Thank you very much for all the replies. To be more clear, I am now working on the lines of code below to get the output. I am trying to write an if condition here so that SGWa-i is renamed to SGWa-i_LOWCASE and SGWA-I is renamed to SGWA-I_UPCASE. Earlier I provided the data file with values, but here I am providing the basic sketch file that is supposed to be referenced.

Sample data:

    sgw sketch sgwpts1 format EMS,SGW1,%date%,%time%,%sgw-vpnname%,%sgw-vpnid%,%sgw-servname%,%sgw-servid%
    sgw sketch sgwptsh format EMS,SGWh,%date%,%time%,%sgw-vpnname%,%sgw-vpnid%,%sgw-servname%,%sgw-servid%
    sgw sketch sgwptsi format EMS,SGWi,%date%,%time%,%sgw-vpnname%,%sgw-vpnid%,%sgw-servname%,%sgw-servid%
    sgw sketch, sgwptsH format, EMS,SGWH,%date%,%time%,%sgw-vpnname%,%sgw-vpnid%,%sgw-servname%,%sgw-servid%
    sgw sketch sgwptsI format EMS,SGWI,%date%,%time%,%sgw-vpnname%,%sgw-vpnid%,%sgw-servname%,%sgw-servid%

Lines of code I am writing:

    sub load_sketch {
        my ($sketch_file) = @_;
        my %all_sketches = ();
        open(DAT, $sketch_file) || die("Could not open file $sketch_file!");
        my @lines = <DAT>;
        close(DAT);
        foreach (@lines) {
            s/[\r\n\s%]+//g;
            my @all_columns = split(',', $_);
            $all_sketches{ uc($all_columns[1]).'STAT' } = [ @all_columns ]; # Output file will have SGWSTAT in its name
        }
        return %all_sketches;
    }
    1; # tells perl that the package is ready to run

Please help me with how to do the renaming in the above code, or suggest better ways to handle it here.
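One possible reading of this request can be sketched as follows. Everything here is an assumption: that "SGWa-i" means the names SGWa through SGWi, that "SGWA-I" means SGWA through SGWI, and that other names such as SGW1 stay unchanged. The helper name `sketch_key` is hypothetical and not part of the posted code.

```perl
use strict;
use warnings;

# Hypothetical helper: append a case marker to the sketch name so that
# names differing only in the case of the last letter stay distinct.
# Assumes the suffix letter range is a..i / A..I, as guessed from the post.
sub sketch_key {
    my ($name) = @_;
    return $name . '_LOWCASE' if $name =~ / \A SGW [a-i] \z /x;
    return $name . '_UPCASE'  if $name =~ / \A SGW [A-I] \z /x;
    return $name;    # anything else, e.g. SGW1, is left unchanged
}

print "$_ => ", sketch_key($_), "\n" for qw(SGW1 SGWh SGWH SGWi SGWI);
```

The posted `uc($all_columns[1])` maps both `SGWh` and `SGWH` to the same hash key, so the later line overwrites the earlier one; building the key from something like `sketch_key($all_columns[1])` instead would keep the two entries apart.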
Re^5: Case insensitive string comparison by AnomalousMonk (Archbishop) on Jun 30, 2020 at 17:25 UTC
Re^6: Case insensitive string comparison by DAN0207 (Acolyte) on Jul 01, 2020 at 06:16 UTC


	PerlMonks