http://qs321.pair.com?node_id=11116910


in reply to Re: Regex with Backslashes (updated)
in thread Regex with Backslashes

Thanks for taking the time to point out the issues with my presentation of strings, which caused confusion.

If I post again I will take your advice on the presentation and the use of quoting.

Having reconsidered the problem I originally posted, I have decided to take a slightly different approach, which I touched on in a response to another monk: my data will use two commas where a literal (non-splitting) comma is required, and two backslashes where a literal backslash is required. This changes the regex requirements substantially.

My data would look like this:

    1,Text,,with,,commas,X,99

and my regex is:

    my $regex = qr/(?<!,),(?!,)|(?<=,,),/;

This is working in my script, with this output:

    1 Text,,with,,commas X 99
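The output above still contains the doubled commas, so presumably they get collapsed somewhere after the split. A minimal sketch of that split-then-collapse step, assuming the un-doubling happens per field after splitting (the post does not show this part); `parse_doubled` is a hypothetical name:

```perl
use strict;
use warnings;

# The regex from the post: split on a lone comma, or on a comma that
# immediately follows a ",," pair (a literal comma at the end of a field).
my $regex = qr/(?<!,),(?!,)|(?<=,,),/;

# Hypothetical helper: split one line into fields, then turn each doubled
# escape back into a single literal character.
sub parse_doubled {
    my ($line) = @_;
    my @fields = split /$regex/, $line, -1;   # -1 keeps trailing empty fields
    for my $field (@fields) {                 # $field aliases each element
        $field =~ s/,,/,/g;                   # ",,"  -> ","
        $field =~ s/\\\\/\\/g;                # "\\"  -> "\"
    }
    return @fields;
}

print join(' | ', parse_doubled('1,Text,,with,,commas,X,99')), "\n";
```

For the sample line this prints `1 | Text,with,commas | X | 99`. Because the collapsing runs only after the split, the doubled commas never reach `split` as anything other than "not a separator".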

Thank you to all who responded.

Maybe I will take the plunge and post my 'lcd daemon with battery meter script' once it is completed. Not exactly an Earth-shattering piece of work, but quite fun.

Re^3: Regex with Backslashes
by haukex (Archbishop) on May 18, 2020 at 19:51 UTC

    If you have control over the format the string is generated in, why not use a well-established format like CSV? With Text::CSV's defaults, fields are separated by commas; a field that contains commas (or whitespace) is surrounded by double quotes; and a double quote inside a field is escaped by doubling it. For example:

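    use warnings;
    use strict;
    use Text::CSV;

    # Sample input held in memory: one record with a quoted field
    my $data = <<'END';
    1,"Text,with,commas and ""quotes""",X,99
    END

    # Read the string through an in-memory filehandle
    open my $fh, '<', \$data or die $!;
    # binary => 1 allows special characters in fields; auto_diag => 2 dies on parse errors
    my $csv = Text::CSV->new({ binary => 1, auto_diag => 2 });
    while ( my $row = $csv->getline($fh) ) {
        print "<<$_>>\n" for @$row;
    }
    # Report any error other than reaching end of file
    $csv->eof or $csv->error_diag;
    close $fh;

    __END__
    <<1>>
    <<Text,with,commas and "quotes">>
    <<X>>
    <<99>>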
    Maybe I will take the plunge and post my 'lcd daemon with battery meter script' once it is completed.

    Yes, that'd be interesting!

      The comma separated data is entered by a user and I want to keep it as simple as possible, so extra quoting is something I want to avoid.

      I felt that escaped commas and backslashes were just about OK, as were doubled commas and doubled backslashes, but the more complex the format gets, the harder it is for the user. I am happy to put extra load on the script to make things easier for the user.

      I have included some code to handle simple input errors such as a space inserted in a command: '-- text' instead of '--text'.
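A minimal sketch of that kind of clean-up, assuming the whole command arrives as a single string (if the words arrive as separate `@ARGV` elements, the fix would look different). `fix_option_spacing` is a hypothetical name, not the code from the post:

```perl
use strict;
use warnings;

# Hypothetical helper: glue a "--" to the word after it when the user
# typed a space in between, so "-- text" is read as "--text".
# Assumes the input is one string, not a pre-split @ARGV.
sub fix_option_spacing {
    my ($input) = @_;
    # "--" at the start or after whitespace, then spaces, then a non-space
    $input =~ s/(^|\s)--\s+(?=\S)/$1--/g;
    return $input;
}

print fix_option_spacing('-- text'), "\n";        # --text
print fix_option_spacing('show -- text'), "\n";   # show --text
```

Input that is already correct, such as `--text`, passes through unchanged.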

        The comma separated data is entered by a user and I want to keep it as simple as possible, so extra quoting is something I want to avoid.

        OK, I see, although there are of course other alternatives. For example, requiring every field to be quoted may not be so hard on the user, since that's one less rule to remember. In the end, it's up to you to decide what is easiest for the user and for the implementation. I agree with AnomalousMonk's point that doubling up the commas leads to ambiguity, so if you really don't like quoting, perhaps the backslashes are not such a bad idea (the only real issue was the misunderstanding about the format). The parser I showed earlier in the thread has some pretty simple rules: commas are field separators, backslashes and commas can be escaped with backslashes, plus there is support for the \x... sequence.
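For comparison, a minimal Perl sketch of a parser following those escaping rules. It is not the parser from the earlier post: `parse_escaped` is a hypothetical name, the \x... support is left out, and any other backslash sequence is treated as an error.

```perl
use strict;
use warnings;

# Hypothetical helper (\x... sequences are deliberately not handled).
# Rules: "," separates fields, "\," is a literal comma, "\\" is a literal
# backslash, and any other backslash is rejected.
sub parse_escaped {
    my ($line) = @_;
    my @fields = ('');
    pos($line) = 0;
    while ( pos($line) < length($line) ) {
        if    ( $line =~ /\G([^\\,]+)/gc ) { $fields[-1] .= $1 }   # plain text
        elsif ( $line =~ /\G\\([\\,])/gc ) { $fields[-1] .= $1 }   # escaped char
        elsif ( $line =~ /\G,/gc )         { push @fields, '' }    # separator
        else  { die "Bad escape at position " . pos($line) . "\n" }
    }
    return @fields;
}

print join(' | ', parse_escaped('1,Text\,with\,commas,X,99')), "\n";
```

This prints `1 | Text,with,commas | X | 99`. Under these rules `a,,b` is unambiguously three fields with an empty middle one, which is exactly the case the doubled-comma format cannot express.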

Re^3: Regex with Backslashes (updated)
by AnomalousMonk (Archbishop) on May 18, 2020 at 19:08 UTC

    How does that work if you have a null (absolutely empty) comma-separated field? Can you have such fields in your application? Why not just split on the original, non-escaped commas, à la the example I linked, or use some similar approach if you do not want to use a module?
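A quick way to see the ambiguity is to feed the doubled-comma regex from earlier in the thread a line where the empty field is intentional. The sample strings below are made up for illustration:

```perl
use strict;
use warnings;

# The doubled-comma regex from earlier in the thread.
my $regex = qr/(?<!,),(?!,)|(?<=,,),/;

# Meant as three fields ("a", "", "b"), but ",," is read as a literal comma.
my @one = split /$regex/, 'a,,b', -1;
print scalar(@one), " field(s): ", join('|', @one), "\n";   # 1 field(s): a,,b

# Four commas in a row: two literal commas in one field, or something else?
my @run = split /$regex/, 'x,,,,y', -1;
print scalar(@run), " field(s): ", join('|', @run), "\n";   # 3 field(s): x,,||y
```

In both cases the regex has to pick one reading, and the writer of the data has no way to ask for the other.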


    Give a man a fish:  <%-{-{-{-<

      I have code already that handles absolutely empty comma-separated fields, even if the number of fields does not match the anticipated number of fields, so I should be OK with that. But thanks for pointing out a potential point of failure.
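One Perl detail worth checking in that kind of code: `split` discards trailing empty fields unless it is given a negative limit, so a blank last field can silently disappear before any field-count check runs. A small illustration using the regex from the post (the sample line is made up):

```perl
use strict;
use warnings;

my $regex = qr/(?<!,),(?!,)|(?<=,,),/;
my $line  = '1,Text,,with,,commas,X,';   # last field intentionally empty

my @dropped = split /$regex/, $line;       # trailing empty field removed
my @kept    = split /$regex/, $line, -1;   # negative limit keeps it

print scalar(@dropped), " vs ", scalar(@kept), " fields\n";   # 3 vs 4 fields
```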