http://qs321.pair.com?node_id=986587


in reply to Re^2: Regex Extraction Help
in thread Regex Extraction Help

You make a good point about splitting on a record separator within possibly malformed records. Based upon the OP's regex, it appears that the pattern's stable--with one space after the semi-colon. However, we can ask split to 'test' the format of the input, like this:

my $info = (split /\s*;\s*/, $dat)[1];

This will return the info the OP wants, whether there are spaces before or after the semi-colon, or not.
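
As a quick check of that claim, here is a minimal sketch. The helper name `second_field` and the two extra sample lines are made up for illustration; only the first line comes from the OP's data.

```perl
use strict;
use warnings;

# Hypothetical helper: return the second semicolon-separated field,
# ignoring any whitespace around the separators.
sub second_field {
    my ($line) = @_;
    my $field = (split /\s*;\s*/, $line)[1];
    return $field;
}

# Same record three times, varying only the spacing around the semicolons.
for my $line ('DR Pfam; PF00070; Pyr_redox; 2.',
              'DR Pfam;PF00070;Pyr_redox;2.',
              'DR Pfam ;  PF00070 ; Pyr_redox ; 2.') {
    print second_field($line), "\n";    # PF00070 each time
}
```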

And within a regex on the OP's data:

use Modern::Perl;
my $dat = 'DR Pfam; PF00070; Pyr_redox; 2.';
$dat =~ /;\s*(\w+)\s*;.+;/ and say $1;    # prints PF00070

It was a good call to address this issue...

Re^4: Regex Extraction Help
by Flexx (Pilgrim) on Aug 09, 2012 at 22:08 UTC
    « Based upon the OP's regex, it appears that the pattern's stable--with one space after the semi-colon »

    Oh indeed, my "warning" was meant more as a general tip; I didn't just mean this particular example. I just meant to say that split and if(m//) with a rather "strict" regexp typically result in different levels of defensiveness in the code. Again, I mean just typically. I mean hey, "just use split" would've been my first answer, too. But you wrote that already, so I had to come up with something nitpicky. ;)

    « However, we can ask split to 'test' the format of the input »

    Umm... ok, you wrote 'test' in quotes, so alright... ;)

    Sure, you can combine the split and trim operation, but still, this split would happily work on any input you throw at it (including undef, with a warning, though). It won't tell you (by not even matching) that your input looks a bit strange there.
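
    To make the difference concrete, here is a rough sketch. The sub names by_split and by_match are hypothetical, the anchored pattern is only a guess at the record layout based on the single sample line in this thread, and the second input is invented junk.

```perl
use strict;
use warnings;

# Lenient: returns whatever sits in the second field and never reports a mismatch.
sub by_split {
    my ($line) = @_;
    my $field = (split /\s*;\s*/, $line)[1];
    return $field;
}

# Strict: returns the ID only if the whole line has the assumed layout, else undef.
sub by_match {
    my ($line) = @_;
    return $line =~ /^DR Pfam; (\w+); \w+; \d+\.$/ ? $1 : undef;
}

for my $line ('DR Pfam; PF00070; Pyr_redox; 2.', 'garbage; <html>; x') {
    my $s = by_split($line);
    my $m = by_match($line);
    printf "split: %-10s match: %s\n", $s // 'undef', $m // 'no match';
}
```

    For the well-formed line both subs return PF00070. For the junk line the split version hands back <html> as if nothing were wrong, while the match version returns undef, so the caller can notice and react.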

    Now, again, I am not so much talking about the OP's concrete problem, but was trying to educate a bit on which method to use when, since his use of \d\d\d\d\d instead of \d{5} suggested that regexen aren't something he has worked with for years (no offence meant).
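
    For anyone unsure about the quantifier form, a small sketch showing that the two spellings behave the same. The PF prefix and the test strings are assumptions taken from the sample line in this thread, not from the OP's actual regex.

```perl
use strict;
use warnings;

my $by_hand    = qr/^PF(\d\d\d\d\d)$/;   # digit class repeated five times
my $quantified = qr/^PF(\d{5})$/;        # same pattern with a {5} quantifier

for my $id ('PF00070', 'PF0007') {
    my $hand  = $id =~ $by_hand    ? 1 : 0;
    my $quant = $id =~ $quantified ? 1 : 0;
    print "$id: by_hand=$hand quantified=$quant\n";
}
# PF00070: by_hand=1 quantified=1
# PF0007: by_hand=0 quantified=0
```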

    So long,
    Flexx

      You make more good points, and I am glad you offered the "general tip," as it helps with developing good programming practices. Anticipating and coding for exceptions can (and does) save many headaches...

Re^4: Regex Extraction Help
by invaderzard (Acolyte) on Aug 10, 2012 at 14:45 UTC
    Cheers, Kenosis, Flexx and Ratazong for your help!

    Kenosis, your method really worked like a charm for me, but kudos to Flexx and Ratazong for giving me better insight into how to handle regexes in Perl.

    Thanks again!

      Glad it worked for you, invaderzard!