
Re: dumb regex question

by Nkuvu (Priest)
on Apr 07, 2009 at 00:10 UTC

in reply to dumb regex question

I'd change the regex to exclude quotes, rather than match everything: m,"?(/[^"]*)"?,

Test script (including some lines where I tried to break the match):

#!/usr/bin/perl
use strict;
use warnings;

while (my $line = <DATA>) {
    chomp $line;
    if ($line =~ m,"?(/[^"]*)"?,) {
        print "Line matched: $line ($1)\n";
    }
    else {
        print "Line didn't match: $line\n";
    }
}
__DATA__
"/moreIters 10"
"/bootMe any text here"
/fewIter
/some stuff here
"/albatross" foo bar baz
monkeys
leprechauns /not monkeys
/gnomes "not leprechauns though"


Line matched: "/moreIters 10" (/moreIters 10)
Line matched: "/bootMe any text here" (/bootMe any text here)
Line matched: /fewIter (/fewIter)
Line matched: /some stuff here (/some stuff here)
Line matched: "/albatross" foo bar baz (/albatross)
Line didn't match: monkeys
Line matched: leprechauns /not monkeys (/not monkeys)
Line matched: /gnomes "not leprechauns though" (/gnomes )

Re^2: dumb regex question
by linuxfan (Beadle) on Apr 07, 2009 at 00:38 UTC
    I just noticed that this regex fails for the following input:
    /gnomes more data here
    My expected string is only /gnomes, whereas it matches everything up to the end of the line. Any idea how to fix this?


      if (m{"(/[^"]+)"|(/\S+)}) { my $match = defined $1 ? $1 : $2; ... }
      Or whatever's appropriate instead of \S.

      Update: Fixed slashes
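
As a rough illustration of the alternation above, here is a minimal sketch run against a few of the sample lines from this thread. The helper name extract_command is hypothetical (it does not appear in the original posts), and \S is kept as the stand-in for whatever character class suits the real data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the slash-prefixed
# token found in a line, or undef when there is none.
sub extract_command {
    my ($line) = @_;
    # First alternative: a quoted string that starts with a slash;
    # the capture runs to the closing quote, spaces included.
    # Second alternative: a bare slash followed by non-whitespace.
    if ($line =~ m{"(/[^"]+)"|(/\S+)}) {
        return defined $1 ? $1 : $2;
    }
    return;
}

for my $line ('"/bootMe any text here"', '/gnomes more data here', 'monkeys') {
    my $match = extract_command($line);
    if (defined $match) {
        print "Captured ($match) from $line\n";
    }
    else {
        print "No match in $line\n";
    }
}
```

Because neither alternative is anchored, a slash in the middle of a word (for example foo/bar) would also be picked up by the second branch, so the pattern may need tightening for other inputs.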

        ...yeah. Or that. Although the regex as given needs a tweak, with embedded slashes in there.

        If it wasn't late in the day on a Monday, I might have come up with a regex that would work. Maybe. But at least the Text::CSV_XS solution is not totally wrong.

      With that additional qualification, it will get a bit more tricky. My first thought was to add a space to the character class: m,"?(/[^" ]*)"?,

      But that doesn't work, because the character class can't tell whether a space is inside or outside the quotes: the match stops at the first space either way. So it would capture just "/bootMe" from the line "/bootMe any text here".
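
To make that failure concrete, here is a small sketch that applies the space-excluding pattern to one quoted and one unquoted line. The helper name capture_without_spaces is made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper for illustration: applies the pattern with a space
# added to the negated character class and returns the capture, or undef.
sub capture_without_spaces {
    my ($line) = @_;
    if ($line =~ m,"?(/[^" ]*)"?,) {
        return $1;
    }
    return;
}

# The unquoted line now stops at the first space, as wanted,
# but the quoted line is cut short at its first space too.
for my $line ('/gnomes more data here', '"/bootMe any text here"') {
    my $match = capture_without_spaces($line);
    print "Captured ($match) from $line\n" if defined $match;
}
```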

      I'd suggest looking into a module like Text::xSV or Text::CSV_XS and setting the delimiter to spaces. Then reject any entry that doesn't have a leading slash. This means dropping the regex entirely.

      Something like:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Text::CSV_XS;

      my $csv = Text::CSV_XS->new ({sep_char => ' '});

      while (my $line = <DATA>) {
          chomp $line;
          # See perldoc Text::CSV_XS for warnings
          # about this approach with possible embedded
          # newlines:
          my $status = $csv->parse($line);
          my @fields;
          if ($status) {
              @fields = $csv->fields();
          }
          else {
              warn "Problem parsing $line\n";
          }
          for my $field (@fields) {
              print "Captured ($field) from $line\n" if $field =~ m!^/!;
          }
      }
      __DATA__
      "/moreIters 10"
      "/bootMe any text here"
      /fewIter
      /some stuff here
      "/albatross" foo bar baz
      monkeys
      leprechauns /not monkeys
      /gnomes "not leprechauns though"
      /gnomes more data here

      Which gives the output:

      Captured (/moreIters 10) from "/moreIters 10"
      Captured (/bootMe any text here) from "/bootMe any text here"
      Captured (/fewIter) from /fewIter
      Captured (/some) from /some stuff here
      Captured (/albatross) from "/albatross" foo bar baz
      Captured (/not) from leprechauns /not monkeys
      Captured (/gnomes) from /gnomes "not leprechauns though"
      Captured (/gnomes) from /gnomes more data here

Re^2: dumb regex question
by linuxfan (Beadle) on Apr 07, 2009 at 00:24 UTC
    Thank you so much. This is exactly what I wanted.
