Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

Re^8: How to write to a file?

by davi54 (Sexton)
on Oct 15, 2019 at 17:32 UTC ( [id://11107495]=note: print w/replies, xml ) Need Help??


in reply to Re^7: How to write to a file?
in thread How to count the length of a sequence of alphabets and number of occurence of a particular alphabet in the sequence?

Following is the script that I tried with open:
#!/usr/bin/perl use strict; use warnings; print 'Please enter protein sequence filename: '; chomp( my $prot_filename = <STDIN> ); open my $PROTFILE, '<', $prot_filename or die "Cannot open '$prot_filename' because: $!"; my $report_name = $prot_filename.'_alanine_report'; open my $out_file, '>', $report_name or die "Cannot open '$report_name' because: $!"; my $rx_fasta_header = qr{ \A > }xms; # header pattern - very naive my $rx_sequence = qr{ \A [ACDEFGHIKLMNPQRSTVWY_]+ \z }xms; # plea +se check my $sequence; # sequence accumulator - initially undefined RECORD: while (my $record = <$PROTFILE>) { chomp $record; # get rid of newline, if any # fasta header - process any sequence accumulated so far if ($record =~ $rx_fasta_header) { # do not process if no sequence accumulated so far (i.e., firs +t header) next RECORD if not defined $sequence; my $seq_len = length $sequence; my $seq_n_A = $sequence =~ tr/A//;; do_something_with($sequence, $seq_len, $seq_n_A); undef $sequence; # prepare to process next sequence next RECORD; } # part of sequence - accumulate if ($record =~ $rx_sequence) { $sequence .= $record; # accumulate sub-sequences (records) next RECORD; } die "bad record: '$record'"; # don't know what this is } # show all results print $out_file; close $out_file;
But, it gave me the error: bad record: '' at ../alanine_vs_sequence_length_2.pl line 46, <$PROTFILE> line 42. And the outfile is empty. What am I doing wrong here?

Replies are listed 'Best First'.
Re^9: How to write to a file?
by soonix (Canon) on Oct 15, 2019 at 19:27 UTC

    Your input file seems to contain an empty line, but the script does not provide for it.

    If empty lines are ok, you could define something like
    my $rx_empty = qr{ \A \s* \z }xms; # empty line (or whitespace on +ly)
    and after the chomp
    if ($record =~ $rx_empty) { # ignore next RECORD }
Re^9: How to write to a file?
by AnomalousMonk (Archbishop) on Oct 15, 2019 at 20:35 UTC

    All my coding suggestions are untested.

    ... the error: bad record: '' at ../alanine_vs_sequence_length_2.pl line 46, <$PROTFILE> line 42.

    (soonix has already addressed this.)

    ... the outfile is empty.

    Almost all processing of a sequence is done in the  do_something_with() subroutine, which you do not show. I assume this subroutine exists in your code because the code would not compile if it did not (because you're enabling strictures with strict). Because I cannot see the code, I cannot say why it produces no output.

    Here's another version of your posted code, including do_something_with(). It's not complete because I still don't know exactly what you want to do with each sequence. I've also done some re-organization, and handled the case of a sequence resulting from a final FASTA record (which I had not considered before). I'm using autodie to simplify file I/O error reporting. Again, this is untested.


    Give a man a fish:  <%-{-{-{-<

      I'm sorry I did not include my subroutines in the previous message. But, I am counting the length of the sequence in the subroutine but it's not working yet. I tried your suggestion as follows:
      #!/usr/bin/perl use strict; use warnings; use autodie; print 'Please enter protein sequence filename: '; chomp( my $prot_filename = <STDIN> ); open my $PROTFILE, '<', $prot_filename; my $report_name = "${prot_filename}_alanine_report"; open my $out_file, '>', $report_name; my $rx_fasta_header = qr{ \A > }xms; # header pattern - very naive my $rx_sequence = qr{ \A [ACDEFGHIKLMNPQRSTVWY_]+ \z }xms; # plea +se check my $rx_blank_line = qr{ \A \s* \z }xms; my $sequence; # sequence accumulator - initially undefined RECORD: while (my $record = <$PROTFILE>) { # handle any records to be ignored here next RECORD if $record =~ $rx_blank_line; chomp $record; # get rid of newline, if any # fasta header - process any sequence accumulated so far if ($record =~ $rx_fasta_header) { # do not process if no sequence accumulated so far (i.e., firs +t header) next RECORD if not defined $sequence; do_something_with($sequence, $out_file); undef $sequence; # prepare to process next sequence next RECORD; } # part of sequence - accumulate if ($record =~ $rx_sequence) { $sequence .= $record; # accumulate sub-sequences (records) next RECORD; } die "bad record: '$record'"; # don't know what this is } # end while RECORD loop # handle any sequence accumulated from final FASTA record do_something_with($sequence, $out_file) if defined $sequence; # clean up close $PROTFILE; close $out_file; exit; # subroutines # sub do_something_with { my ($sequence, # accumulated sequence $fh, # output filehandle ) = @_; my $seq_len = length $sequence; }
      But I'm getting the error - bad record: 'SRALGMLAVDNQARVUHGPTVASLAPTFGRGAMTNHWVDIKNANLVVVMGGNAAEAHPVG' at ../alanine_vs_sequence_length_3.pl line 47, <$_...> line 1370. And I get this error for both your and my script. Does it have something to do with the "die" command on line 47 as it is pointing out?

        The
            my $rx_sequence     = qr{ \A [ACDEFGHIKLMNPQRSTVWY_]+ \z }xms;  # please check
        statement defines a regex to match a correct sequence. The attached comment reflects the fact, noted elsewhere in the post | in a previous post, that I'm not a bio-monk and am not at all certain of the correct amino or protein (if that's what they are) code letters to use.

        Is there a code in the
            SRALGMLAVDNQARVUHGPTVASLAPTFGRGAMTNHWVDIKNANLVVVMGGNAAEAHPVG
        sequence that is not included in the
            [ACDEFGHIKLMNPQRSTVWY_]
        character class that is used to recognize a correct sequence? If there is, is the sequence still correct? If the sequence is correct, should the regex be changed to include the unrecognized code? As I said, I'm not expert on biological topics and I cannot say. Please take a look at the sequence and the regex and, based on your training and experience, decide on the proper course of action.

        AFAICT, the regex, as defined, does not match the sequence in question and the sequence is properly (according to the code as it now stands!) rejected as being a bad record.

        BTW: The subroutine

        sub do_something_with { my ($sequence, # accumulated sequence $fh, # output filehandle ) = @_; my $seq_len = length $sequence; }
        still does nothing to write to the output file.


        Give a man a fish:  <%-{-{-{-<

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11107495]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others exploiting the Monastery: (8)
As of 2024-04-23 07:49 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found