http://qs321.pair.com?node_id=1077941


in reply to Re^2: Help to build a REGEXP (BioPerl)
in thread Help to build a REGEXP

It's supposed to be for an assignment and we must use REGEXPS...

That's akin to being asked to do a gainer off a diving board when you're just learning to swim, especially if you're in bioinformatics. In my experience, it is more pedagogically sound to first learn to wield the (BioPerl) tools proficiently, and only then learn how to forge such tools...

If you must, however, use a regex in your script, perhaps the following will be helpful:

```perl
use strict;
use warnings;
use Bio::SeqIO;

my $filename = 'sequences.gen';
my $stream   = Bio::SeqIO->new( -file => $filename, -format => 'GenBank' );

# Read each GenBank record and print its translated (protein) sequence
while ( my $seq = $stream->next_seq() ) {
    my $trans = $seq->translate();
    print $trans->seq(), "\n";
}

# The obligatory regex
my $string = 'This script uses a regex.';
$string =~ s/uses/doesn't use/;
print $string, "\n";
```

Re^4: Help to build a REGEXP (BioPerl)
by erix (Prior) on Mar 12, 2014 at 00:20 UTC

    Nice, but that doesn't work (because the text used by the OP does not constitute a valid genbank record).

    That could be worked around by getting the complete record, I guess. But the wrath of the teacher needs to be deflected too. Perhaps make the regex a (quoted) multiline capture? :)

      Yes, I said that; it's easily fixed.

      Still, it might be a good idea to do the multiline regex capture the teacher obviously intended, especially as you already included a regex line.

      (oops, replied to myself... ah well, you get the idea)
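      A minimal sketch of such a multiline capture follows. The OP's text isn't quoted in this part of the thread, so the snippet is hypothetical: it assumes the target is a multiline qualifier such as /translation="..." inside a CDS feature, and the protein letters are made up. Because the character class [^"]* also matches newlines, the capture runs across line breaks up to the closing quote; the whitespace is then stripped out.

```perl
use strict;
use warnings;

# Hypothetical GenBank-style feature snippet (not the OP's actual data)
my $record = <<'END';
     CDS             1..60
                     /gene="demo"
                     /translation="MKVLAAGIVA
                     LLLAAGCSSA"
END

# [^"]* matches newlines too, so the capture spans several lines
# and stops at the closing double quote of the qualifier.
my $protein;
if ( $record =~ m{/translation="([^"]*)"} ) {
    ( $protein = $1 ) =~ s/\s+//g;    # drop line breaks and indentation
    print "$protein\n";
}
```

      Running it prints the joined sequence on a single line. A non-greedy .*? with the /s modifier would work as well; the negated character class just avoids needing /s.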

        Yes, I do think you've made a good point.

      Nice, but that doesn't work (because the text used by the OP does not constitute a valid genbank record).

      It's a snippet from a valid GenBank record, and it parses beautifully when simply pasted into a full record. The OP said, "I have this part of a file that I want to match..."; the "part" is the snippet provided. The assignment is pedagogically problematic as it is (IMO), but requiring raw parsing of an incomplete GenBank record would be even worse.
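      To make the snippet-versus-full-record distinction concrete, here is a rough heuristic. It is an assumption for illustration, not BioPerl's own validation: a complete GenBank flat-file record begins with a LOCUS line and ends with a "//" line, and a bare feature snip has neither. The LOCUS name, date and sequence below are hypothetical.

```perl
use strict;
use warnings;

# Rough heuristic only (an assumption, not what Bio::SeqIO actually checks):
# a complete record starts with "LOCUS" and its last line is "//".
sub looks_like_full_record {
    my ($text) = @_;
    return $text =~ /\ALOCUS\s/ && $text =~ m{^//\s*\z}m ? 1 : 0;
}

# A bare feature snip, like the part of a file quoted by the OP
my $snip = <<'END';
     CDS             1..27
                     /gene="demo"
END

# The same snip pasted into a minimal, hypothetical full record
my $full = <<'END';
LOCUS       DEMO0001                  27 bp    DNA     linear   SYN 01-JAN-2014
FEATURES             Location/Qualifiers
     CDS             1..27
                     /gene="demo"
ORIGIN
        1 atgaaagtgc tggcggcggg cattgtg
//
END

printf "snip: %d\nfull: %d\n",
    looks_like_full_record($snip), looks_like_full_record($full);
```

      The /m modifier lets ^ match at the start of the final "//" line rather than only at the start of the string.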