Peter Keystrokes has asked for the wisdom of the Perl Monks concerning the following question:
Hello,
I'm trying to create subroutines for the functions in my script. My attempts so far don't produce files containing sequences of a specific length, although the second subroutine, named 'HashSequences', seems to work at least as far as requesting the minimum and maximum length from the user, which of course is far from the objective.
I know it will be important to learn how to use subroutines because I have other ideas of functions I would like to implement into this script to eventually have a script that manipulates large files containing fasta sequences in many ways.
Here is my code:
##Invoking subroutines
HashSequences();
SpecifySeqLengths(my $id, my %seq);
## I got errors when I didn't declare these, yet somehow I think this is wrong
## Open file. Hash sequences. Make the sequence IDs the keys to their
## respective (hashed) sequences.
sub HashSequences{
open F, "human_hg19_circRNAs_putative_spliced_sequence.fa", or die $!;
my %seq = ();
my $id = '';
while (<F>){
chomp;
if ($_ =~ /^>(.+)/){
$id = $1;
}else{
$seq{$id} .= $_;
}
}
close F;
return (%seq, $id);
}
## Request sequence length desired. Sieve sequences of given length
## into arrays. Create file containing desired sequences.
sub SpecifySeqLengths{
print "Enter Max sequence length: \n";
my $maxlength = <STDIN>;
chomp $maxlength;
print "Enter Min sequence length: \n";
my $minlength = <STDIN>;
chomp $minlength;
my @seqarray;
foreach $id (keys %seq){
if ((length$seq{$id} <= $maxlength) && (length$seq{$id} >= $minlength)){
push @seqarray, $id;
}
}
for $id (@seqarray){
if (-f $id){print $id, " already exists. It is about to be overwritten"};
open new_F, '>>', "SeqLength_$minlength-$maxlength", or die$!;
print new_F ($id."\n".$seq{$id}."\n");
close new_F;
}
}
Any help is well appreciated
Thank you.
Re: I'm trying to consolidate my functions into subroutines
by hippo (Bishop) on May 15, 2017 at 08:24 UTC
TL;DR - see Coping with Scoping
In your code you have called the first subroutine like this:
HashSequences();
But that subroutine returns values like this:
return (%seq, $id);
Since you call the subroutine in a void context those values are lost and the net result of this particular call to the subroutine is zero (bar a waste of time). If you want to return values from a subroutine and use them in the calling routine you must assign them in the calling routine. eg:
$bestday = max($mon,$tue,$wed,$thu,$fri);
That code is taken from the very first example in perlsub which I hope you have read. See how the result on the right is assigned to a variable on the left? That's what you need to do. If returning multiple items then the thing on the left should be a list or an array (forcing list context - you will need to start thinking about context a lot more).
The next problem with your return statement is that you are returning a hash and a scalar. But the return flattens all lists (including hashes) so this is not the correct way to proceed. You could use a reference but that's probably a few chapters down the line. For now, respect the basic approach which is to return all the scalars first so the calling routine knows where the list starts. eg. Change the return line to:
return ($id, %seq);
and call it like
my ($thisid, %thatseq) = HashSequences();
See the Writing Subroutines section of perlintro for more about the basics.
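The flattening described above can be seen in a small, self-contained sketch. The sub name `demo_sequences` and its sample data are made up for illustration; they only stand in for `HashSequences` and are not the original poster's code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for HashSequences: returns the scalar first,
# then the hash, so the caller can unpack the flattened list.
sub demo_sequences {
    my %seq = ( id1 => 'ACGT', id2 => 'GGCCAA' );
    my $id  = 'id2';          # the last ID seen
    return ( $id, %seq );     # flattened into one list: scalar, then key/value pairs
}

# The leading scalar takes the first element; the hash absorbs the rest.
my ( $last_id, %seq ) = demo_sequences();

print "last id: $last_id\n";
print "$_ => $seq{$_}\n" for sort keys %seq;
```

Had the hash come first on the left-hand side, it would have swallowed every returned element and left the scalar undefined, which is why the scalar goes first.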
Re: I'm trying to consolidate my functions into subroutines
by LanX (Saint) on May 13, 2017 at 16:37 UTC
Functions and subroutines are the same thing.
> Any help is well appreciated
Do yourself a favor and start indenting your code properly.
A decent editor does it automatically.
sub name {
    print $a;
    ....
}
not
sub name {
print $a;
....
}
> Except subroutines are supposed to be reusable.
in which language?
> Except subroutines are supposed to be reusable.
What LanX means is that a function is a subroutine. Specifically, a subroutine that returns a value (or a list of values).
In Perl, all subroutines return a value, either explicitly or implicitly, so all Perl subroutines are also functions.
So, the following 3 definitions are equivalent:
sub myfunction_long
{
    my $z = $_[0] + $_[1]; # add the first 2 parameters and assign result to $z
    return $z;
}
sub myfunction_medium
{
    my $z = $_[0] + $_[1]; # add the first 2 parameters and assign result to $z
} # value of $z implicitly returned
sub myfunction_short
{
    $_[0] + $_[1]; # add the first 2 parameters
} # result of expression implicitly returned
print myfunction_long(1, 2); # prints 3
print myfunction_medium(1, 2); # prints 3
print myfunction_short(1, 2); # prints 3
Many people are more comfortable using return even when not needed. When not used, the result of the last executed statement is the value returned.
In some programming languages, such as C, you can define either a function or a "pure" subroutine:
int z = 0;
void myroutine(int x, int y) // "void" tells the compiler that no value is returned
{
    z = x + y; // add the values of the 2 defined parameters and assign the result to the global z
}
int myfunction(int x, int y) // "int" tells the compiler that an integer value is returned
{
    return(x + y); // add the values of the 2 defined parameters then return the result
}
Perl, however, has no such distinction.
Update: Changed order of terms in a sentence to limit scope of the adjective "pure".
Re: I'm trying to consolidate my functions into subroutines
by Anonymous Monk on May 13, 2017 at 17:15 UTC
Parameter passing is kind of weird in Perl. When you say
SpecifySeqLengths(my $id, my %seq);
it's actually creating global variables $id and %seq. But your subs don't use those global variables, they create their own variables by saying "my". Those "my" variables have the same name, but they are actually completely separate, so they don't actually share data with each other. Putting your data in global variables is arguably a bad habit, as it tends to inhibit code reuse. The "best practice" is to write something like this:
{
    my $seq = HashSequences();
    SpecifySeqLengths($seq);
}
sub HashSequences {
    my $seq = {};
    ...
    $seq->{$id} .= $_;
    ...
    return $seq;
}
sub SpecifySeqLengths {
    my ($seq) = @_;
    ...
    for my $id (keys %$seq) {
        ... length($seq->{$id}) ...
    }
    ...
}
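For anyone who wants to run the hash-reference approach outlined above, here is a minimal sketch. The names `hash_sequences` and `ids_in_length_range` and the sample FASTA text are assumptions made for illustration, and an in-memory filehandle stands in for the real input file.

```perl
use strict;
use warnings;

# Parse FASTA text from any readable filehandle and return a hash
# reference mapping each sequence ID to its concatenated sequence.
sub hash_sequences {
    my ($fh) = @_;
    my ( %seq, $id );
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^>(.+)/ ) {
            $id = $1;                 # header line: remember the current ID
        }
        elsif ( defined $id ) {
            $seq{$id} .= $line;       # sequence line: append to the current ID
        }
    }
    return \%seq;
}

# Return the sorted IDs whose sequence length lies between $min and $max.
sub ids_in_length_range {
    my ( $seq, $min, $max ) = @_;
    my @ids = grep {
        my $len = length $seq->{$_};
        $len >= $min && $len <= $max;
    } keys %$seq;
    @ids = sort @ids;
    return @ids;
}

# Sample data (made up): 'a' has length 6, 'b' has 2, 'c' has 10.
my $fasta = ">a\nACGT\nAC\n>b\nGG\n>c\nACGTACGTAC\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
my $seq = hash_sequences($fh);
close $fh;

print join( ',', ids_in_length_range( $seq, 2, 6 ) ), "\n";   # prints a,b
```

Because each sub receives everything it needs through its arguments, neither depends on variables declared elsewhere in the file, and the prompting and file writing can be layered on top separately.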
> Parameter passing is kind of weird in Perl. When you say SpecifySeqLengths(my $id, my %seq); it's actually creating global variables $id and %seq.
While I agree with the rest of the post and the code example is good, this part is not quite accurate. Those two variables are still lexically scoped to whatever block they're declared in. If that happens to be the scope of the file, one might tend to call them "global", but typically in Perl that term is used for package variables, which are "global" in the sense that they cross the file boundary. There are actually only a limited number of "truly global" variables (i.e. they cross even package boundaries), such as $_ and other special variables (although strangely, that list only seems to be in the Camel's reference section, not in the Perl docs).
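A short sketch of the distinction, with made-up variable names (`$lexical`, `$package`) and a hypothetical `show` sub, assuming everything lives in one file in package `main`:

```perl
use strict;
use warnings;

my  $lexical = 'file-scoped lexical';   # visible from here to the end of the file
our $package = 'package variable';      # stored in the symbol table as $main::package

sub show {
    # Both names are visible because the sub is compiled after the declarations.
    return "$lexical / $package";
}

{
    my $lexical = 'inner lexical';      # a new variable; show() still sees the outer one
    local $package = 'localized';       # temporarily changes the package variable itself
    print show(), "\n";                 # prints: file-scoped lexical / localized
}

print show(), "\n";                     # prints: file-scoped lexical / package variable
print $main::package, "\n";             # package variables are reachable by qualified name
```

The file-scoped `my` variable has no qualified name and cannot be reached from another file, which is the sense in which it is not "global" even though every sub defined after it in the same file can see it.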
Good point. What would be a better term for top-level lexical variables?
c:\@Work\Perl>perl -wMstrict -le
"confuse_the_issue(my $x, my %hash);
;;
sub confuse_the_issue {
print qq{x: $x};
for $x (keys %hash) {
print qq{$x $hash{$x}};
}
}
"
Use of uninitialized value $x in concatenation (.) or string at -e line 1.
x:
c:\@Work\Perl>perl -wMstrict -le
"sub confuse_the_issue {
print qq{x: $x};
for $x (keys %hash) {
print qq{$x $hash{$x}};
}
}
;;
confuse_the_issue(my $x, my %hash);
"
Global symbol "$x" requires explicit package name at -e line 1.
Global symbol "$x" requires explicit package name at -e line 1.
Global symbol "%hash" requires explicit package name at -e line 1.
Global symbol "$x" requires explicit package name at -e line 1.
Global symbol "%hash" requires explicit package name at -e line 1.
Global symbol "$x" requires explicit package name at -e line 1.
Execution of -e aborted due to compilation errors.
In the first case, Perl just gives you a light slap on the wrist, but that doesn't mean you are not already far down the road to Hell.
Give a man a fish: <%-{-{-{-<
Re: I'm trying to consolidate my functions into subroutines
by Anonymous Monk on May 14, 2017 at 07:52 UTC
#!/usr/bin/perl --
# consolidated.pl
# 2017-05-13-19:55:06
#
#
## perltidy -olq -csc -csci=10 -cscl="sub : BEGIN END if while for " -otr -opr -ce -nibc -i=4 -pt=0 "-nsak=*"
#!/usr/bin/perl --
use strict;
use warnings;
Main( @ARGV );
exit( 0 );
sub Main {
my $infile = "human_hg19_circRNAs_putative_spliced_sequence.fa";
my( $seq, $id ) = ReadHashSequences( $infile );
my( $min, $max ) = PromptMinMax();
SpecifySeqLengths( $seq, $min, $max );
}
sub ReadHashSequences {
my( $infile ) = @_;
use autodie qw/ open /;
open my( $fh ), '<', $infile;
my %seq = ();
my $id = '';
while( <$fh> ) {
chomp;
if( $_ =~ /^>(.+)/ ) {
$id = $1;
} else {
$seq{$id} .= $_;
}
}
close $fh;
return \%seq, $id;
} ## end sub ReadHashSequences
sub Prompt {
my( $msg, $default ) = @_;
...;
return $default;
}
sub PromptMinMax {
my( $def_min, $def_max ) = grep defined, @_, 0, 10;
my $min = Prompt( "Enter Min sequence length:", $def_min );
my $max = Prompt( "Enter Max sequence length:", $def_max );
return $min, $max;
}
sub SpecifySeqLengths {
my( $seq, $minlength, $maxlength ) = @_;
for my $id ( keys %$seq ) {
if( ( length $seq->{$id} <= $maxlength )
&& ( length $seq->{$id} >= $minlength ) )
{
PunchFile( "SeqLength_$minlength-$maxlength", $id, $seq->{$id} );
}
}
} ## end sub SpecifySeqLengths
sub PunchFile {
my( $newfile, $id, $val ) = @_;
if( -f $id ) {
print $id, " already exists. It is about to be overwritten\n";
}
use autodie qw/ open /;
open my( $newfh ), '>>', $newfile;
print $newfh $id, "\n", $val, "\n";
close $newfh;
} ## end sub PunchFile
__END__
$ perl consolidated.pl
Can't open 'human_hg19_circRNAs_putative_spliced_sequence.fa' for reading: 'No such file or directory' at consolidated.pl line 23