http://qs321.pair.com?node_id=11116867


in reply to Re: Regex with Backslashes
in thread Regex with Backslashes

Thanks for your reply. The data that is received consists of a single-quoted string of characters. This '1,//Text//Text,C,150' is 22 characters long, as shown when I use print length($cmdValues) . "\n";, so a double backslash really is two characters in the data I am processing. Later in the script, a backslash followed by another backslash is replaced with a single character that is displayed on an LCD screen.

By "hex code" I meant two hex digits in the form 0xFF, rather than literally two characters. Sorry for any confusion.

I will use qr for all further work, as suggested.

As to

"1,Something\\,\\\\text\\\\text\\0x2B\\\\,X,99"

, no, the input data is

'1,Something\,\\text\\text\\0x2B\\,X,99'

The expected outcome is an array containing the following:

1 Something\,\\text\\text\\0x2B\\ X 99

The string should be split on every comma and every comma preceded by two backslash characters, but not on a comma preceded by a single backslash.

I have looked at the quote-like operators and had already been through an interesting discussion starting at Ways of quoting:

..."in my $regex = '(?<!\\\),';, the string actually only contains two backslashes because '\\' becomes \ but '\)' remains as \)" ... Using Data::Dumper, I can now see (I think) why my original regex worked:

The string I was splitting looks like this

my $text = '1,This\, is a problem->\\,B,99';

but when printed with Dumper it looks like this

$VAR1 = "1,This\\, is a problem->\\,B,99";

So both '\,' and '\\,' appear the same during processing. Is there a way I can stop '\,' being processed as '\\,'?

I may have to go down the route of a custom parser, as you have suggested.

Re^3: Regex with Backslashes
by AnomalousMonk (Archbishop) on May 17, 2020 at 21:03 UTC
    The string I was splitting looks like this

    my $text = '1,This\, is a problem->\\,B,99';

    but when printed with Dumper it looks like this

    $VAR1 = "1,This\\, is a problem->\\,B,99";

    So both '\,' and '\\,' appear the same during processing. Is there a way I can stop '\,' being processed as '\\,'?

    Both Data::Dumper (which is core) and Data::Dump (which is not core, but which I prefer) represent a string in the form of the double-quote constructor needed to reproduce that string, not as the "actual" string. I think this is one source of your confusion.
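The difference between the dumped form and the actual string can be checked with a small script. This is only a sketch: setting $Data::Dumper::Useqq is an assumption made here to get double-quoted output like that shown in the question, and the variable names are illustrative.

```perl
use strict;
use warnings;
use Data::Dumper;

# Assumption: request double-quoted output, matching the form in the question.
$Data::Dumper::Useqq = 1;

my $text = '1,This\, is a problem->\\,B,99';

# print shows the characters the string really contains.
print "printed: $text\n";
print "length:  ", length($text), "\n";

# Dumper shows source code that would rebuild the string, so each
# real backslash is written as \\ in its output.
print Dumper($text);

# In single-quoted source, '\,' and '\\,' build the same two characters.
print "same string\n" if '\,' eq '\\,';
```

The two sequences look alike here because the single-quoted constructor in the source already collapsed '\\,' to a backslash and a comma before any processing took place.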

    I think the critical point you're missing is that there is a fundamental difference between a single- or double-quoted string constructor, e.g., '...' or "...", and the string that is constructed.

    So both '\,' and '\\,' appear the same during processing.

    No. A string may have one or two or any number of uniquely distinguishable sequential backslashes. The question is how to construct the desired string. Consider

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $sq = '\ \\ \\\ \\\\ \\\\\ \\\\\\';
     print qq{<$sq> \n};
     ;;
     my $dq = qq{\\ \\\\ \\\\\\};
     print qq{>$dq< \n};
     "
    <\ \ \\ \\ \\\ \\\>
    >\ \\ \\\<

    In a single-quoted string constructor, \ and \\ are different representations of the same constructed character. This peculiarity of single-quoted string constructors allows a '...'-quoted string to contain a single-quote character (update), or to end in a single-quote or backslash character:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $sqsq = 'abc\'';
     print qq{<$sqsq> \n};
     ;;
     my $sqbs = 'abc\\';
     print qq{>$sqbs< \n};
     "
    <abc'>
    >abc\<

    (Note that in my code examples, I use  qq{...} as the double-quote "constructor," as I will call it in this reply, due to peculiarities of the Windoze command line interpreter.)
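If the test data has to be written in source code exactly as it arrives, one option is a here-document with a single-quoted terminator, in which backslashes have no special meaning. The sketch below assumes the input string from the question; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# A single-quoted literal collapses each \\ into one backslash ...
my $literal = '1,Something\,\\text\\text\\0x2B\\,X,99';

# ... while a <<'END' here-document keeps every backslash as typed.
chomp(my $heredoc = <<'END');
1,Something\,\\text\\text\\0x2B\\,X,99
END

printf "%-8s %2d chars: %s\n", 'literal', length($literal), $literal;
printf "%-8s %2d chars: %s\n", 'heredoc', length($heredoc), $heredoc;
```

Data read from a file, socket or serial port never passes through a string constructor at all, so its backslashes arrive unchanged.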

    It's possible to (fairly easily) split the double-quoted string you give as an example and get your desired result:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $s = qq{1,Something\\,\\\\text\\\\text\\0x2B\\\\,X,99};
     print qq{<$s> \n};
     ;;
     my @ra = split qr{ (?<! (?<! \\) \\) , }xms, $s;
     print qq{[$_]} for @ra;
     "
    <1,Something\,\\text\\text\0x2B\\,X,99>
    [1]
    [Something\,\\text\\text\0x2B\\]
    [X]
    [99]

    The string is split on the pattern "comma that is not preceded by a backslash that is not preceded by a backslash." This sort of tricksy, double-negative logic is part of the reason that a well-tested module like Text::CSV is so often and enthusiastically recommended for this seemingly-simple parsing application. (I hope this module or one like it is what you're referring to when you write about going "the route of a custom parser.")
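For comparison, here is a minimal sketch of what a hand-rolled scanner might look like. It is not Text::CSV and not anyone's tested code: split_escaped is a hypothetical helper that assumes a backslash always escapes the character after it, and the input is the string given in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas, treating a backslash plus the
# following character as one unit that can never act as a separator.
sub split_escaped {
    my ($s) = @_;
    my @fields;
    while ($s =~ m{ \G ( (?: \\ .? | [^\\,] )* ) ( , | \z ) }xmsg) {
        push @fields, $1;
        last if $2 eq '';    # reached the end of the string
    }
    return @fields;
}

# A <<'END' here-document keeps the backslashes exactly as typed.
chomp(my $input = <<'END');
1,Something\,\\text\\text\\0x2B\\,X,99
END

my @fields = split_escaped($input);
print "[$_]\n" for @fields;
```

On this input the scanner produces the four fields listed in the question. Unlike the lookbehind pattern, it treats backslashes in pairs, so the two approaches should differ on odd runs such as three backslashes before a comma: the lookbehind splits there and the scanner does not. Which behaviour is right depends on the data format.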


    Give a man a fish:  <%-{-{-{-<