Re: Regex with Backslashes (updated)

by haukex (Archbishop)
on May 17, 2020 at 21:54 UTC


in reply to Regex with Backslashes

To avoid further confusion, I suggest we take a step back and agree on how to communicate the strings appropriately. I think what is causing confusion here is that you are using single quotes to show strings*, and we, being Perl programmers, are assuming that Perl's rules for single-quoted string literals apply, but based on what you've written I don't think that's the definition you're using. So:

  1. When you write 'foo \x \\ \' \ bar', due to Perl's rules for single-quoted strings (Quote Like Operators: "A backslash represents a backslash unless followed by the delimiter or another backslash, in which case the delimiter or backslash is interpolated."), this string is actually the 16-character string «foo \x \ ' \ bar», as you can see when you execute the Perl code print 'foo \x \\ \' \ bar', "\n";.
    • Note: I'm using these special quoting characters here to make it clear that I don't mean Perl's quotes. In PerlMonks' HTML, what I've written is &laquo;<c>my string here</c>&raquo;. This is not an established standard, just something I'm doing in this node to differentiate between "", '', and "the characters the string literals actually represent".
  2. When you write "foo \x22 \\ \' \" bar", due to Perl's rules (same link as above), this string is actually the 15-character string «foo " \ ' " bar» (try print "foo \x22 \\ \' \" bar", "\n";). This is the format that tools like Data::Dump and Data::Dumper (with $Data::Dumper::Useqq=1; turned on, which I always recommend) will output. Because of this, I suggested you use this format to show us what strings you're working with.
  3. When you want to show us a string without any quoting/escaping/interpolation, don't use single or double quotes at all. Just show us the string in PerlMonks' <code> tags, as in: My input is the 14-character string <code>my string here</code>. Optionally, add some special quotes like I showed above, and tell us the actual length of the string so we can verify.
    • * Update: Another option is heredocs, as tybalt89 showed here; just make sure to put the heredoc marker into single quotes, as in my $str = <<'END'; ... END, to disable interpolation inside the heredoc. This might be useful because from your reply here, I seem to understand the single quotes are actually part of the string, which would also help explain the confusion we've been having. (Note the other quoting methods still work too, as in '\'...\'' and "'...'".)
  4. When you want to show us a regex, show us the Perl code and use a qr// operator, don't use quotes (and don't use qr'' either). Again, this is the least ambiguous format. (See also Regexp Quote Like Operators.)
  5. If you wanted to be really, really thorough, or there is some real confusion as to what your inputs are, then you could also show us the output of Devel::Peek's Dump(), or, for files, show us a hex dump of the file: On Linux, either hexdump -C filename or od -tx1c filename (see also).
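The quoting rules in the list above can be checked with a short script. This is only a minimal sketch: the two string literals come from the post, while the heredoc contents and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Useqq = 1;    # dump strings in double-quoted form
$Data::Dumper::Terse = 1;    # omit the "$VAR1 = " prefix

# Single-quoted: only \\ and \' are processed; \x and "\ " stay as typed.
my $single = 'foo \x \\ \' \ bar';

# Double-quoted: \x22 and \" both become ", \\ becomes \, \' becomes '.
my $double = "foo \x22 \\ \' \" bar";

# Heredoc with a single-quoted marker: nothing is processed, so \\ stays
# two backslashes and the quotes are part of the string (illustrative data).
my $heredoc = <<'END';
'foo \x \\ bar'
END
chomp $heredoc;

# Show the real length and an unambiguous dump of each string.
for my $str ( $single, $double, $heredoc ) {
    printf "%2d chars: %s", length $str, Dumper($str);
}

# A regex is clearest as qr//: this one matches one literal backslash.
my $backslash = qr/\\/;
my $count = () = $single =~ /$backslash/g;
print "backslashes in \$single: $count\n";
```

The lengths printed for the first two strings should agree with the 16 and 15 characters stated in items 1 and 2.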

I think once we've got that cleared up and we understand what your actual strings are, we'll be able to help much more effectively :-)

Re^2: Regex with Backslashes (updated)
by anita2R (Scribe) on May 18, 2020 at 18:52 UTC

    Thanks for taking the time to point out the issues with my presentation of strings, which has caused confusion.

    If I post again I will take your advice on the presentation and the use of quoting.

    Having considered the problem I originally posted, I have decided that I should take a slightly different approach which I touched on in a response to another monk, and my data will use two commas where a non-splitting comma is required and two backslashes where a backslash is required. This changes the regex requirements substantially.

    My data would look like this: 1,Text,,with,,commas,X,99 and my regex is: my $regex = qr /(?<!,),(?!,)|(?<=,,),/;

    This is working in my script with this output:

    1 Text,,with,,commas X 99
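For readers who want to reproduce that output, here is a minimal sketch using the regex from the post with split. The helper name split_fields and the final step that collapses doubled commas are assumptions; the post does not show either.

```perl
use strict;
use warnings;

# The separator regex from the post: a lone comma, or a comma that
# directly follows a doubled (escaped) comma.
my $regex = qr/(?<!,),(?!,)|(?<=,,),/;

# Hypothetical helper: split one line of user data into raw fields.
sub split_fields {
    my ($line) = @_;
    return split $regex, $line;
}

my @fields = split_fields('1,Text,,with,,commas,X,99');
print join( ' ', @fields ), "\n";    # 1 Text,,with,,commas X 99

# Assumed follow-up step (not in the post): turn each doubled comma
# back into a single literal comma.
my @clean = map { ( my $f = $_ ) =~ s/,,/,/g; $f } @fields;
print "<<$_>>\n" for @clean;
```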

    Thank you to all who responded.

    Maybe I will take the plunge and post my 'lcd daemon with battery meter script' once it is completed. Not exactly an Earth-shattering piece of work, but quite fun.

      If you have control over the format the string is generated in, then why not use a well-established format like CSV? The defaults of Text::CSV are that fields are separated by commas, if a field contains commas (or whitespace), it is surrounded by double quotes, and if a double quote needs to be escaped, then it is doubled up. For example:

        use warnings;
        use strict;
        use Text::CSV;

        my $data = <<'END';
        1,"Text,with,commas and ""quotes""",X,99
        END
        open my $fh, '<', \$data or die $!;
        my $csv = Text::CSV->new({ binary=>1, auto_diag=>2 });
        while ( my $row = $csv->getline($fh) ) {
            print "<<$_>>\n" for @$row;
        }
        $csv->eof or $csv->error_diag;
        close $fh;

        __END__
        <<1>>
        <<Text,with,commas and "quotes">>
        <<X>>
        <<99>>

      > Maybe I will take the plunge and post my 'lcd daemon with battery meter script' once it is completed.

      Yes, that'd be interesting!

        The comma separated data is entered by a user and I want to keep it as simple as possible, so extra quoting is something I want to avoid.

        I felt that escaped commas and backslashes were just about OK, and two commas and two backslashes likewise, but the more complex it gets, the harder it is for the user. I am happy to add extra load to the script to help the user.

        I have included some code to handle simple input errors such as a space inserted in a command: '-- text' instead of '--text'.
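The actual error-handling code is not shown in the thread. As a rough sketch of what such a fix-up could look like, the hypothetical helper below (name and regex are assumptions) removes whitespace typed between the leading "--" and the command name.

```perl
use strict;
use warnings;

# Hypothetical helper: remove whitespace wrongly typed between the
# leading "--" and the command name, so "-- text" becomes "--text".
sub normalize_command {
    my ($input) = @_;
    ( my $fixed = $input ) =~ s/^(\s*--)\s+(?=\w)/$1/;
    return $fixed;
}

print normalize_command('-- text'), "\n";    # --text
print normalize_command('--text'),  "\n";    # --text (unchanged)
```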

      How does that work if you have a null (absolutely empty) comma-separated field? Can you have such fields in your application? Why not just split the original non-escaped commas a la this or some similar approach if you do not want to use a module?
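To make the empty-field question concrete, here is a minimal sketch. The first part applies the doubled-comma regex from earlier in the thread to a line with an empty middle field. The second part is one possible way to split on non-escaped commas without a module; it is not the approach behind the link above, and the helper name split_escaped and the backslash-escape convention are assumptions.

```perl
use strict;
use warnings;

# The doubled-comma separator regex from earlier in the thread.
my $doubled = qr/(?<!,),(?!,)|(?<=,,),/;

# With that scheme, an empty middle field looks like an escaped comma,
# so the regex alone does not split this line at all.
my @a = split $doubled, '1,,X';
print scalar(@a), " field(s): <<$a[0]>>\n";    # 1 field(s): <<1,,X>>

# Hypothetical helper for a backslash-escape scheme: \, is a literal
# comma and \\ is a literal backslash. Empty fields are kept.
sub split_escaped {
    my ($line) = @_;
    my @fields;
    while ( $line =~ /\G((?:\\.|[^\\,])*)(,|\z)/gs ) {
        my ( $field, $sep ) = ( $1, $2 );
        $field =~ s/\\(.)/$1/gs;    # drop the escaping backslashes
        push @fields, $field;
        last if $sep eq '';         # reached the end of the line
    }
    return @fields;
}

# In single quotes, \\\\ is two backslashes and \, stays as typed.
my @b = split_escaped('1,Text\,with\\\\,,X');
print "<<$_>>\n" for @b;
```

The sketch returns no fields for malformed input such as a line ending in a lone backslash, so real code would want to detect and report that case.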


      Give a man a fish:  <%-{-{-{-<

        I have code already that handles absolutely empty comma-separated fields, even if the number of fields does not match the anticipated number of fields, so I should be OK with that. But thanks for pointing out a potential point of failure.
