
Re^3: Regex Extraction Help

by Kenosis (Priest)
on Aug 09, 2012 at 19:05 UTC (#986587)

in reply to Re^2: Regex Extraction Help
in thread Regex Extraction Help

You make a good point about splitting on a record separator within possibly malformed records. Based upon the OP's regex, it appears that the pattern is stable, with one space after the semi-colon. However, we can ask split to 'test' the format of the input, like this:

my $info = (split /\s*;\s*/, $dat)[1];

This will return the info the OP wants, whether there are spaces before or after the semi-colon, or not.
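To illustrate the point, here is a minimal sketch that runs the same split over three spacing variants of the OP's sample line. The helper name second_field and the two extra variants are made up for this example; only the first string comes from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return the second semicolon-separated field,
# ignoring any whitespace around the separators.
sub second_field {
    my ($dat) = @_;
    my $field = (split /\s*;\s*/, $dat)[1];
    return $field;
}

# The first line is the sample from the thread; the other two are
# invented variants with no spaces and with extra spaces.
my @lines = (
    'DR Pfam; PF00070; Pyr_redox; 2.',
    'DR Pfam;PF00070;Pyr_redox;2.',
    'DR Pfam ;  PF00070 ; Pyr_redox ; 2.',
);

print second_field($_), "\n" for @lines;    # prints PF00070 three times
```

Because the separator pattern consumes the surrounding whitespace, no separate trim step is needed.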

And within a regex on the OP's data:

use Modern::Perl;
my $dat = 'DR Pfam; PF00070; Pyr_redox; 2.';
$dat =~ /;\s*(\w+)\s*;.+;/ and say $1;    # prints PF00070

It was a good call to address this issue...

Re^4: Regex Extraction Help
by Flexx (Pilgrim) on Aug 09, 2012 at 22:08 UTC
    « Based upon the OP's regex, it appears that the pattern's stable--with one space after the semi-colon »

    Oh indeed, my "warning" was meant more as a general tip; I didn't mean just this particular example. I only meant to say that split and if(m//) with a rather "strict" regexp typically result in different levels of defensiveness in the code. Again, I mean just typically. I mean hey, "just use split" would've been my first answer, too. But you wrote that already, so I had to come up with something nitpicky. ;)

    « However, we can ask split to 'test' the format of the input »

    Umm... ok, you wrote 'test' in quotes, so alright... ;)

    Sure, you can combine the split and trim operation, but still, this split would happily work on any input you throw at it (including undef, with a warning, though). It won't tell you (by not even matching) that your input looks a bit strange there.
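A rough sketch of that difference, under some assumptions: the function names are invented here, and the strict pattern (a "DR Pfam;" prefix followed by PF and five digits) is only a guess at the shape of the OP's records, not the OP's actual regex.

```perl
use strict;
use warnings;

# Lenient: split hands back whatever lands in slot 1.
sub field_by_split {
    my ($dat) = @_;
    my $field = (split /\s*;\s*/, $dat)[1];
    return $field;
}

# Strict: returns undef unless the line has the assumed shape.
# The "PF plus five digits" format is an assumption for this sketch.
sub field_by_regex {
    my ($dat) = @_;
    return $dat =~ /^DR\s+Pfam;\s*(PF\d{5});/ ? $1 : undef;
}

for my $dat ('DR Pfam; PF00070; Pyr_redox; 2.', 'garbage; not an id; more') {
    my $by_split = field_by_split($dat);
    my $by_regex = field_by_regex($dat);
    printf "split: %-10s  regex: %s\n",
        $by_split // '(undef)', $by_regex // '(no match)';
}
```

On the well-formed line both approaches agree; on the malformed one, split still returns a value while the strict regex reports the problem by not matching.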

    Now, again, I am not so much talking about the OP's concrete problem; I was trying to educate a bit on which method to use when, since his use of \d\d\d\d\d instead of \d{5} suggested that regexen aren't something he has worked with for years. (No offence meant.)
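For what it's worth, the two spellings match exactly the same strings; \d{5} is simply the shorter way to write it. A small sketch, where the PF prefix, the anchors, and the test IDs are assumptions made for illustration:

```perl
use strict;
use warnings;

# Two spellings of "exactly five digits"; anchored so the whole ID must fit.
my $spelled_out = qr/^PF\d\d\d\d\d$/;
my $quantified  = qr/^PF\d{5}$/;

# Hypothetical helper: true if both patterns give the same verdict.
sub patterns_agree {
    my ($id) = @_;
    my $long  = $id =~ $spelled_out ? 1 : 0;
    my $short = $id =~ $quantified  ? 1 : 0;
    return $long == $short;
}

for my $id ('PF00070', 'PF0007', 'PF000700') {
    my $verdict = $id =~ $quantified ? 'match' : 'no match';
    print "$id: $verdict, patterns agree: ",
        (patterns_agree($id) ? 'yes' : 'no'), "\n";
}
```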

    So long,

      You make more good points, and I'm glad you offered the "general tip," as it helps with developing good programming practices. Anticipating and coding for exceptions can (and does) save many headaches...

Re^4: Regex Extraction Help
by invaderzard (Acolyte) on Aug 10, 2012 at 14:45 UTC
    Cheers, Kenosis, Flexx and Ratazong for your help!

    Kenosis, your method really worked like a charm for me, but kudos to Flexx and Ratazong for giving me better insight into how to handle regexes in Perl.

    Thanks again!

      Glad it worked for you, invaderzard!
