Previous replies have given you a working solution, but in case it helps to know how the OP code went wrong:
    while ($line=<FILE>){
        $hit1= $line=~ /^(>data_\d+\s+GENEID_\d+.*\n.*)/s;
        print OUT1 "$hit1\n";
        $hit2= $line=~ /^(>data_\d+\s+PROTID_\d+.*\n.*)/s;
        print OUT2 "$hit2\n";
    }
The problems are:
  • The while loop is reading one line at a time, and printing to both output files on every iteration.

  • The input is structured as multi-line records, and the criterion for selecting the correct output file is present only on the first line of each record. You would therefore need to maintain a "state" variable (or use a variable for the output file handle, and assign it on reading the first line of each multi-line record) -- but your loop treats every line as if it contained the criterion for deciding which output to use.

  • You are using capturing parens in your regex match, but assigning the result to a scalar variable in scalar context, which means the value assigned is only the success or failure of the match (1 if the line just read matched, the empty string if it did not), never the captured text. Note the following difference between assigning the match return in scalar context ($c) versus list context (@m, or $m in parens):

    $str = "text with some pattern in it";
    $c = $str =~ / (some pattern) /;      # sets $c to the numeric value "1"
    @m = $str =~ / (some pattern) /;      # assigns "some pattern" as sole element of @m
    ( $m ) = $str =~ / (some pattern) /;  # sets $m to "some pattern"
The result of those points taken together is that both your output files had the same line count as your input file, and each of those lines is either "1" (the match succeeded) or empty (it failed). (When you said your "results are giving only the headers...", I suspect that you were looking at data that was not created by the code you posted.)
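For what it's worth, here is a minimal sketch of the "variable output file handle" idea from the second point. It is not the solution given in the earlier replies. It assumes each record starts with a header line of the form ">data_N GENEID_N ..." or ">data_N PROTID_N ..." (inferred from the OP regexes) and runs until the next ">" line. The sub name split_records, the sample data, and the choice to skip records with an unrecognized header are all made up for illustration.

```perl
use strict;
use warnings;

# Copy each multi-line record to the GENEID or PROTID output, deciding
# only when a header line is read. Handles are passed in, so the same
# routine works with real files or in-memory handles.
sub split_records {
    my ($in, $gene_out, $prot_out) = @_;
    my $out;    # current output handle; undef until a known header is seen
    while (my $line = <$in>) {
        if ($line =~ /^>data_\d+\s+(GENEID|PROTID)_\d+/) {
            # The match is used as a condition; the captured tag is in $1.
            $out = $1 eq 'GENEID' ? $gene_out : $prot_out;
        }
        elsif ($line =~ /^>/) {
            $out = undef;    # assumption: skip records with an unknown header
        }
        print {$out} $line if defined $out;
    }
    return;
}

# Demonstration with in-memory file handles and hypothetical sample data.
my $input = join '',
    ">data_1 GENEID_10 sample\n", "ACGT\n",
    ">data_2 PROTID_20 sample\n", "MKV\n", "LLA\n";
my ($gene_text, $prot_text) = ('', '');

open my $in,      '<', \$input     or die "Cannot open input: $!";
open my $gene_fh, '>', \$gene_text or die "Cannot open gene output: $!";
open my $prot_fh, '>', \$prot_text or die "Cannot open protein output: $!";

split_records($in, $gene_fh, $prot_fh);
close $_ for $in, $gene_fh, $prot_fh;

print "GENEID records:\n$gene_text";
print "PROTID records:\n$prot_text";
```

With real data you would open the input and the two output files by name and pass those handles in instead. The point is only that the output handle changes when a header line is read, and every following line of the record goes to whichever handle is current.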

In reply to Re: pattern matching to separate data by graff
in thread pattern matching to separate data by patric
