Re^8: How to write to a file?

Following is the script that I tried with open:

#!/usr/bin/perl
use strict;
use warnings;

print 'Please enter protein sequence filename: ';
chomp( my $prot_filename = <STDIN> );

open my $PROTFILE, '<', $prot_filename 
  or die "Cannot open '$prot_filename' because: $!";

my $report_name = $prot_filename.'_alanine_report';
open my $out_file, '>', $report_name
     or die "Cannot open '$report_name' because: $!";

my $rx_fasta_header = qr{ \A > }xms;  # header pattern - very naive
my $rx_sequence     = qr{ \A [ACDEFGHIKLMNPQRSTVWY_]+ \z }xms;  # plea
+se check

my $sequence;  # sequence accumulator - initially undefined

RECORD:
while (my $record = <$PROTFILE>) {

    chomp $record;  # get rid of newline, if any

    # fasta header - process any sequence accumulated so far
    if ($record =~ $rx_fasta_header) {
        
        # do not process if no sequence accumulated so far (i.e., firs
+t header)
        next RECORD if not defined $sequence;
        
        my $seq_len = length $sequence;
        my $seq_n_A = $sequence =~ tr/A//;;
        do_something_with($sequence, $seq_len, $seq_n_A);
        
        undef $sequence;  # prepare to process next sequence
        next RECORD;
        
        }

    # part of sequence - accumulate
    if ($record =~ $rx_sequence) {
        $sequence .= $record;  # accumulate sub-sequences (records)
        next RECORD;
        }

    die "bad record: '$record'";  # don't know what this is

    }

# show all results
print $out_file;
close $out_file;
[download]

But, it gave me the error: bad record: '' at ../alanine_vs_sequence_length_2.pl line 46, <$PROTFILE> line 42. And the outfile is empty. What am I doing wrong here?

Comment on Re^8: How to write to a file? Download Code

Replies are listed 'Best First'.
Re^9: How to write to a file? by soonix (Canon) on Oct 15, 2019 at 19:27 UTC
Your input file seems to contain an empty line, but the script does not provide for it. If empty lines are ok, you could define something like `my $rx_empty = qr{ \A \s* \z }xms; # empty line (or whitespace on +ly)` [download] and after the `chomp` `if ($record =~ $rx_empty) { # ignore next RECORD }` [download]	[reply] [d/l] [select]
Re^9: How to write to a file? by AnomalousMonk (Archbishop) on Oct 15, 2019 at 20:35 UTC
All my coding suggestions are untested. ... the error: bad record: '' at ../alanine_vs_sequence_length_2.pl line 46, <$PROTFILE> line 42. (soonix has already addressed this.) ... the outfile is empty. Almost all processing of a sequence is done in the `do_something_with()` subroutine, which you do not show. I assume this subroutine exists in your code because the code would not compile if it did not (because you're enabling strictures with strict). Because I cannot see the code, I cannot say why it produces no output. Here's another version of your posted code, including `do_something_with()`. It's not complete because I still don't know exactly what you want to do with each sequence. I've also done some re-organization, and handled the case of a sequence resulting from a final FASTA record (which I had not considered before). I'm using autodie to simplify file I/O error reporting. Again, this is untested. Read more... (2 kB) Give a man a fish: `<%-{-{-{-<`	[reply] [d/l] [select]
Re^10: How to write to a file? by davi54 (Sexton) on Oct 15, 2019 at 21:38 UTC
I'm sorry I did not include my subroutines in the previous message. But, I am counting the length of the sequence in the subroutine but it's not working yet. I tried your suggestion as follows: #!/usr/bin/perl use strict; use warnings; use autodie; print 'Please enter protein sequence filename: '; chomp( my $prot_filename = <STDIN> ); open my $PROTFILE, '<', $prot_filename; my $report_name = "${prot_filename}_alanine_report"; open my $out_file, '>', $report_name; my $rx_fasta_header = qr{ \A > }xms; # header pattern - very naive my $rx_sequence = qr{ \A [ACDEFGHIKLMNPQRSTVWY_]+ \z }xms; # plea +se check my $rx_blank_line = qr{ \A \s* \z }xms; my $sequence; # sequence accumulator - initially undefined RECORD: while (my $record = <$PROTFILE>) { # handle any records to be ignored here next RECORD if $record =~ $rx_blank_line; chomp $record; # get rid of newline, if any # fasta header - process any sequence accumulated so far if ($record =~ $rx_fasta_header) { # do not process if no sequence accumulated so far (i.e., firs +t header) next RECORD if not defined $sequence; do_something_with($sequence, $out_file); undef $sequence; # prepare to process next sequence next RECORD; } # part of sequence - accumulate if ($record =~ $rx_sequence) { $sequence .= $record; # accumulate sub-sequences (records) next RECORD; } die "bad record: '$record'"; # don't know what this is } # end while RECORD loop # handle any sequence accumulated from final FASTA record do_something_with($sequence, $out_file) if defined $sequence; # clean up close $PROTFILE; close $out_file; exit; # subroutines # sub do_something_with { my ($sequence, # accumulated sequence $fh, # output filehandle ) = @_; my $seq_len = length $sequence; } [download] But I'm getting the error - bad record: 'SRALGMLAVDNQARVUHGPTVASLAPTFGRGAMTNHWVDIKNANLVVVMGGNAAEAHPVG' at ../alanine_vs_sequence_length_3.pl line 47, <$_...> line 1370. And I get this error for both your and my script. Does it have something to do with the "die" command on line 47 as it is pointing out?	[reply] [d/l]
Re^11: How to write to a file? by AnomalousMonk (Archbishop) on Oct 15, 2019 at 22:40 UTC
The `my $rx_sequence = qr{ \A [ACDEFGHIKLMNPQRSTVWY_]+ \z }xms; # please check` statement defines a regex to match a correct sequence. The attached comment reflects the fact, noted ~~elsewhere in the post~~ \| in a previous post, that I'm not a bio-monk and am not at all certain of the correct amino or protein (if that's what they are) code letters to use. Is there a code in the `SRALGMLAVDNQARVUHGPTVASLAPTFGRGAMTNHWVDIKNANLVVVMGGNAAEAHPVG` sequence that is not included in the `[ACDEFGHIKLMNPQRSTVWY_]` character class that is used to recognize a correct sequence? If there is, is the sequence still correct? If the sequence is correct, should the regex be changed to include the unrecognized code? As I said, I'm not expert on biological topics and I cannot say. Please take a look at the sequence and the regex and, based on your training and experience, decide on the proper course of action. AFAICT, the regex, as defined, does not match the sequence in question and the sequence is properly (according to the code as it now stands!) rejected as being a bad record. BTW: The subroutine `sub do_something_with { my ($sequence, # accumulated sequence $fh, # output filehandle ) = @_; my $seq_len = length $sequence; }` [download] still does nothing to write to the output file. Give a man a fish: `<%-{-{-{-<`	[reply] [d/l] [select]
Re^12: How to write to a file? by davi54 (Sexton) on Oct 16, 2019 at 14:56 UTC


Do you know where your variables are?
	PerlMonks