
I just noticed that this regex fails for the following input:
/gnomes more data here
I expected it to capture only /gnomes, but it matches everything up to the end of the line. Any idea how to fix this?


Re^3: dumb regex question
by ikegami (Patriarch) on Apr 07, 2009 at 01:15 UTC
    if (m{"(/[^"]+)"|(/\S+)}) { my $match = defined $1 ? $1 : $2; ... }
    Or whatever's appropriate instead of \S.

    Update: Fixed slashes
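
    For anyone who wants to try the alternation above, here is a minimal sketch. The first_path helper and the three sample lines are only for illustration; the pattern itself is unchanged. It takes the leftmost match, so a quoted path is captured whole and an unquoted one stops at the first whitespace.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of the original post: returns the
# leftmost /path on a line, or undef if the line has none.
sub first_path {
    my ($line) = @_;
    # First alternative: a double-quoted path (spaces allowed inside).
    # Second alternative: a bare path that ends at the first whitespace.
    if ($line =~ m{"(/[^"]+)"|(/\S+)}) {
        return defined $1 ? $1 : $2;
    }
    return;
}

for my $line ('/gnomes more data here', '"/bootMe any text here"', 'monkeys') {
    my $match = first_path($line);
    print defined $match ? "Captured ($match)\n" : "No match in $line\n";
}
```

    This should capture /gnomes from the first line and the full /bootMe any text here from the second, and report no match for the third.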

      ...yeah. Or that. Although the regex as given needs a tweak, with embedded slashes in there.

      If it wasn't late in the day on a Monday, I might have come up with a regex that would work. Maybe. But at least the Text::CSV_XS solution is not totally wrong.

Re^3: dumb regex question
by Nkuvu (Priest) on Apr 07, 2009 at 01:01 UTC

    With that additional qualification, it will get a bit more tricky. My first thought was to add a space to the character class: m,"?(/[^" ]*)"?,

    But that doesn't work: the character class can't tell whether a space is inside or outside the quotes, so the match stops at the first space either way. It would capture just "/bootMe" from the line "/bootMe any text here".
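
    A quick way to see the problem is the sketch below. The capture helper is a made-up name for illustration, wrapped around the same pattern:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: applies the pattern with
# the space added to the character class and returns what it captured.
sub capture {
    my ($line) = @_;
    my ($match) = $line =~ m,"?(/[^" ]*)"?,;
    return $match;
}

for my $line ('/gnomes more data here', '"/bootMe any text here"') {
    my $match = capture($line);
    print "Captured ($match) from $line\n";
}
```

    The unquoted line gives /gnomes as wanted, but the quoted line gives only /bootMe, because the space ends the character class before the closing quote is ever reached.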

    I'd suggest looking into a module like Text::xSV or Text::CSV_XS and setting the delimiter to spaces. Then reject any entry that doesn't have a leading slash. This means dropping the regex entirely.

    Something like:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::CSV_XS;

    my $csv = Text::CSV_XS->new ({sep_char => ' '});

    while (my $line = <DATA>) {
        chomp $line;
        # See perldoc Text::CSV_XS for warnings
        # about this approach with possible embedded
        # newlines:
        my $status = $csv->parse($line);
        my @fields;
        if ($status) {
            @fields = $csv->fields();
        }
        else {
            warn "Problem parsing $line\n";
        }
        for my $field (@fields) {
            print "Captured ($field) from $line\n" if $field =~ m!^/!;
        }
    }

    __DATA__
    "/moreIters 10"
    "/bootMe any text here"
    /fewIter
    /some stuff here
    "/albatross" foo bar baz
    monkeys
    leprechauns /not monkeys
    /gnomes "not leprechauns though"
    /gnomes more data here

    Which gives the output:

    Captured (/moreIters 10) from "/moreIters 10"
    Captured (/bootMe any text here) from "/bootMe any text here"
    Captured (/fewIter) from /fewIter
    Captured (/some) from /some stuff here
    Captured (/albatross) from "/albatross" foo bar baz
    Captured (/not) from leprechauns /not monkeys
    Captured (/gnomes) from /gnomes "not leprechauns though"
    Captured (/gnomes) from /gnomes more data here