PerlMonks  

I'm trying to consolidate my functions into subroutines

by Peter Keystrokes (Beadle)
on May 13, 2017 at 16:27 UTC

Peter Keystrokes has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I'm trying to move the functionality in my script into subroutines. My attempts so far don't produce the output file of sequences within a given length range, although the second subroutine, 'SpecifySeqLengths', seems to work at least as far as requesting the minimum and maximum length from the user, which of course is far from the objective.

I know it will be important to learn how to use subroutines, because I have ideas for other functions I would like to add to this script, so that it can eventually manipulate large files of FASTA sequences in many ways.

Here is my code:

    ##Invoking subroutines
    HashSequences();
    SpecifySeqLengths(my $id, my %seq);
    ## I got errors when I didn't declare these, yet somehow I think this is wrong

    ## Open file. Hash sequences. Make the sequence IDs the keys to their
    ## respective (hashed) sequences.
    sub HashSequences{
        open F, "human_hg19_circRNAs_putative_spliced_sequence.fa", or die $!;
        my %seq = ();
        my $id = '';
        while (<F>){
            chomp;
            if ($_ =~ /^>(.+)/){
                $id = $1;
            }else{
                $seq{$id} .= $_;
            }
        }
        close F;
        return (%seq, $id);
    }

    ## Request sequence length desired. Sieve sequences of given length
    ## into arrays. Create file containing desired sequences.
    sub SpecifySeqLengths{
        print "Enter Max sequence length: \n";
        my $maxlength = <STDIN>;
        chomp $maxlength;
        print "Enter Min sequence length: \n";
        my $minlength = <STDIN>;
        chomp $minlength;
        my @seqarray;
        foreach $id (keys %seq){
            if ((length$seq{$id} <= $maxlength) && (length$seq{$id} >= $minlength)){
                push @seqarray, $id;
            }
        }
        for $id (@seqarray){
            if (-f $id){print $id, " already exists. It is about to be overwritten"};
            open new_F, '>>', "SeqLength_$minlength-$maxlength", or die$!;
            print new_F ($id."\n".$seq{$id}."\n");
            close new_F;
        }
    }

Any help is well appreciated

Thank you.

Replies are listed 'Best First'.
Re: I'm trying to consolidate my functions into subroutines
by hippo (Bishop) on May 15, 2017 at 08:24 UTC

    TL;DR - see Coping with Scoping


    In your code you have called the first subroutine like this:

    HashSequences();

    But that subroutine returns values like this:

    return (%seq, $id);

    Since you call the subroutine in void context, those values are lost and the net result of this particular call is nothing (bar a waste of time). If you want to return values from a subroutine and use them in the calling routine, you must assign them in the calling routine, e.g.:

    $bestday = max($mon,$tue,$wed,$thu,$fri);

    That code is taken from the very first example in perlsub which I hope you have read. See how the result on the right is assigned to a variable on the left? That's what you need to do. If returning multiple items then the thing on the left should be a list or an array (forcing list context - you will need to start thinking about context a lot more).
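To make the point about context concrete, here is a minimal sketch. The sub `seq_lengths` and its values are made up for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a list of three lengths.
sub seq_lengths {
    return (12, 7, 30);
}

seq_lengths();                      # void context: the returned list is thrown away
my @lengths = seq_lengths();        # list context: @lengths receives all three values
my ($first) = seq_lengths();        # list context: the parentheses keep only the first value
my $count   = () = seq_lengths();   # idiom that counts how many values were returned

print "lengths: @lengths\n";
print "first: $first, count: $count\n";
```

The parentheses around `$first` are what force list context on the left-hand side; without them the call would be made in scalar context instead.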

    The next problem with your return statement is that you are returning a hash followed by a scalar. But return flattens everything into a single list (hashes included), so the caller cannot tell where the hash ends and the scalar begins. You could use a reference, but that's probably a few chapters down the line. For now, follow the basic approach, which is to return all the scalars first so the calling routine knows where the list starts, e.g. change the return line to:

    return ($id, %seq);

    and call it like

    my ($thisid, %thatseq) = HashSequences();

    See the Writing Subroutines section of perlintro for more about the basics.
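A small sketch of the scalar-first return, assuming a stand-in sub `hash_sequences` that builds its hash in memory rather than reading the FASTA file (the IDs and sequences are invented for the example):

```perl
use strict;
use warnings;

# Hypothetical stand-in for HashSequences(): no file is read here.
sub hash_sequences {
    my %seq     = (seq1 => 'ACGT', seq2 => 'GGCCTTAA');
    my $last_id = 'seq2';
    return ($last_id, %seq);    # scalar first, then the flattened key/value pairs
}

# The scalar takes the first value; the hash absorbs everything that is left.
my ($last_id, %seq) = hash_sequences();

print "last id: $last_id\n";
print "$_ => $seq{$_}\n" for sort keys %seq;

# With the hash first on the left-hand side, it would swallow every value
# and the scalar after it would always be undef.
```

A hash or array on the left of a list assignment is greedy, which is why it has to come last.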

Re: I'm trying to consolidate my functions into subroutines
by LanX (Saint) on May 13, 2017 at 16:37 UTC
    Functions and subroutines are the same thing.

    > Any help is well appreciated

    Do yourself a favor and start indenting your code properly.

    A decent editor does it automatically.

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

      Except subroutines are supposed to be reusable.

      "do yourself a favor and start indenting your code properly."

      Explain?
        > Explain

        sub name {
            print $a;
            ....
        }

        not

        sub name {
        print $a;
        ....
        }

        > Except subroutines are supposed to be reusable.

        in which language?

        Cheers Rolf
        (addicted to the Perl Programming Language and ☆☆☆☆ :)
        Je suis Charlie!

        Except subroutines are supposed to be reusable.

        What LanX means is that a function is a subroutine. Specifically, a subroutine that returns a value (or a list of values).

        In Perl, all subroutines return a value, either explicitly or implicitly, so all Perl subroutines are also functions.

        So, the following 3 definitions are equivalent:

        sub myfunction_long {
            my $z = $_[0] + $_[1];   # add the first 2 parameters and assign result to $z
            return $z;
        }

        sub myfunction_medium {
            my $z = $_[0] + $_[1];   # add the first 2 parameters and assign result to $z
        }                            # value of $z implicitly returned

        sub myfunction_short {
            $_[0] + $_[1];           # add the first 2 parameters
        }                            # result of expression implicitly returned

        print myfunction_long(1, 2);     # prints 3
        print myfunction_medium(1, 2);   # prints 3
        print myfunction_short(1, 2);    # prints 3

        Many people are more comfortable using return even when it is not needed. When return is omitted, the value of the last evaluated expression is returned.

        In some programming languages, such as C, you can define either a function or a "pure" subroutine:

        int z = 0;

        void myroutine(int x, int y)   // "void" tells the compiler that no value is returned
        {
            z = x + y;   // add the values of the 2 defined parameters and assign the result to the global z
        }

        int myfunction(int x, int y)   // "int" tells the compiler that an integer value is returned
        {
            return(x + y);   // add the values of the 2 defined parameters, then return the result
        }

        Perl, however, has no such distinction.

        Update: Changed order of terms in a sentence to limit scope of the adjective "pure".

      One wonders here - does a decent editor fix typoes and grammatical errors, such as capitalizing the first word of a sentence or adding a missing dot at the end of sentence?

        OK Jeff! :)
Re: I'm trying to consolidate my functions into subroutines
by Anonymous Monk on May 13, 2017 at 17:15 UTC
    Parameter passing is kind of weird in Perl. When you say
    SpecifySeqLengths(my $id, my %seq);
    it's actually creating global variables $id and %seq. But your subs don't use those global variables, they create their own variables by saying "my". Those "my" variables have the same name, but they are actually completely separate, so they don't actually share data with each other. Putting your data in global variables is arguably a bad habit, as it tends to inhibit code reuse. The "best practice" is to write something like this:
    {
        my $seq = HashSequences();
        SpecifySeqLengths($seq);
    }

    sub HashSequences {
        my $seq = {};
        ...
        $seq->{$id} .= $1;
        ...
        return $seq;
    }

    sub SpecifySeqLengths {
        my ($seq) = @_;
        ...
        for my $id (keys %$seq) {
            ... length($seq->{$id}) ...
        }
        ...
    }
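For anyone who wants to run the reference-passing pattern end to end, here is one possible sketch. It is an illustration under assumptions, not the original poster's script: the sample data, the in-memory filehandle, and the names `hash_sequences` and `ids_in_length_range` are all invented for the example.

```perl
use strict;
use warnings;

# Assumed sample data standing in for the real .fa file.
my $fasta = <<'END_FASTA';
>seq1
ACGT
ACGT
>seq2
GG
>seq3
ACGTACGTACGT
END_FASTA

# Read FASTA records from a filehandle and return a reference to a hash
# that maps each sequence ID to its concatenated sequence.
sub hash_sequences {
    my ($fh) = @_;
    my (%seq, $id);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(.+)/) {
            $id = $1;                 # header line: remember the current ID
        }
        elsif (defined $id) {
            $seq{$id} .= $line;       # sequence line: append to the current record
        }
    }
    return \%seq;
}

# Return the sorted IDs whose sequence length lies between $min and $max inclusive.
sub ids_in_length_range {
    my ($seq, $min, $max) = @_;
    my @ids = sort { $a cmp $b } grep {
        my $len = length $seq->{$_};
        $len >= $min && $len <= $max;
    } keys %$seq;
    return @ids;
}

# Open the string as a read-only in-memory file.
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
my $seq = hash_sequences($fh);
close $fh;

my @ids = ids_in_length_range($seq, 4, 10);
print "$_\t$seq->{$_}\n" for @ids;
```

Passing the filehandle and the limits in as arguments keeps both subs free of prompts and hard-coded file names, so they can be reused from other scripts.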
      Parameter passing is kind of weird in Perl. When you say SpecifySeqLengths(my $id, my %seq); it's actually creating global variables $id and %seq.

      While I agree with the rest of the post and the code example is good, this part is not quite accurate. Those two variables are still lexically scoped to whatever block they're declared in. If that happens to be the scope of the file, one might tend to call them "global", but typically in Perl that term is used for package variables, which are "global" in the sense that they cross the file boundary. There are actually only a limited number of "truly global" variables (i.e. they cross even package boundaries), such as $_ and other special variables (although strangely, that list only seems to be in the Camel's reference section, not in the Perl docs).
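A short sketch of the distinction, assuming a made-up package `Demo` with invented variable names:

```perl
use strict;
use warnings;

package Demo;

our $package_var  = 'package variable';      # lives in the symbol table as $Demo::package_var
my  $file_lexical = 'file-scoped lexical';   # visible only from here to the end of this file

sub show {
    my $sub_lexical = 'sub-scoped lexical';  # visible only inside this sub
    return join ', ', $package_var, $file_lexical, $sub_lexical;
}

package main;

print Demo::show(), "\n";
print "$Demo::package_var\n";   # a package variable can be reached by its full name
# A lexical has no such full name: $Demo::file_lexical would be a
# different (empty) package variable, not the "my" variable above.
```

The sub can still see `$file_lexical` because it was compiled inside that variable's scope, which is the same mechanism that makes file-level `my` variables feel global.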

        Good point. What would be a better term for top-level lexical variables?

      Further to the AnonyMonk's post: Peter Keystrokes: Note that invoking the SpecifySeqLengths() function in the originally posted code before the subroutine's definition hides problems in that definition that warnings and strict (the latter in particular) would otherwise bring to your attention. Consider the case of invoking the following function before its definition versus after the definition:

      c:\@Work\Perl>perl -wMstrict -le
      "confuse_the_issue(my $x, my %hash);
       ;;
       sub confuse_the_issue {
         print qq{x: $x};
         for $x (keys %hash) {
           print qq{$x $hash{$x}};
           }
         }
      "
      Use of uninitialized value $x in concatenation (.) or string at -e line 1.
      x:

      c:\@Work\Perl>perl -wMstrict -le
      "sub confuse_the_issue {
         print qq{x: $x};
         for $x (keys %hash) {
           print qq{$x $hash{$x}};
           }
         }
       ;;
       confuse_the_issue(my $x, my %hash);
      "
      Global symbol "$x" requires explicit package name at -e line 1.
      Global symbol "$x" requires explicit package name at -e line 1.
      Global symbol "%hash" requires explicit package name at -e line 1.
      Global symbol "$x" requires explicit package name at -e line 1.
      Global symbol "%hash" requires explicit package name at -e line 1.
      Global symbol "$x" requires explicit package name at -e line 1.
      Execution of -e aborted due to compilation errors.
      In the first case, Perl just gives you a light slap on the wrist, but that doesn't mean you are not already far down the road to Hell.
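The same effect can be reproduced inside one script. This is only a sketch with invented names (`sees_outer_x`, `cannot_see_y`); the string `eval` is used so that the compile error can be captured instead of aborting the run.

```perl
use strict;
use warnings;

my $x = 'file-level value';   # declared before the sub below is compiled

sub sees_outer_x {
    return $x;                # compiles: refers to the lexical declared above
}

# A sub compiled where no matching variable is in scope is rejected by strict.
my $ok = eval q{
    use strict;
    sub cannot_see_y { return $y }
    1;
};
my $error = $ok ? '' : $@;

print sees_outer_x(), "\n";
print "strict says: $error";
```

The first sub quietly depends on a variable from the surrounding file, which is exactly the kind of hidden coupling that passing arguments explicitly avoids.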


      Give a man a fish:  <%-{-{-{-<

Re: I'm trying to consolidate my functions into subroutines
by Anonymous Monk on May 14, 2017 at 07:52 UTC

    Fix this

    #!/usr/bin/perl --
    # consolidated.pl
    # 2017-05-13-19:55:06
    #
    #
    ## perltidy -olq -csc -csci=10 -cscl="sub : BEGIN END if while for " -otr -opr -ce -nibc -i=4 -pt=0 "-nsak=*"
    #!/usr/bin/perl --
    use strict;
    use warnings;
    Main( @ARGV );
    exit( 0 );

    sub Main {
        my $infile = "human_hg19_circRNAs_putative_spliced_sequence.fa";
        my( $seq, $id ) = ReadHashSequences( $infile );
        my( $min, $max ) = PromptMinMax();
        SpecifySeqLengths( $seq, $min, $max );
    }

    sub ReadHashSequences {
        my( $infile ) = @_;
        use autodie qw/ open /;
        open my( $fh ), '<', $infile;
        my %seq = ();
        my $id  = '';
        while( <$fh> ) {
            chomp;
            if( $_ =~ /^>(.+)/ ) {
                $id = $1;
            } else {
                $seq{$id} .= $_;
            }
        }
        close $fh;
        return \%seq, $id;
    } ## end sub ReadHashSequences

    sub Prompt {
        my( $msg, $default ) = @_;
        ...;
        return $default;
    }

    sub PromptMinMax {
        my( $def_min, $def_max ) = grep defined, @_, 0, 10;
        my $min = Prompt( "Enter Min sequence length:", $def_min );
        my $max = Prompt( "Enter Max sequence length:", $def_max );
        return $min, $max;
    }

    sub SpecifySeqLengths {
        my( $seq, $minlength, $maxlength ) = @_;
        for my $id ( keys %$seq ) {
            if(    ( length $seq->{$id} <= $maxlength )
                && ( length $seq->{$id} >= $minlength ) )
            {
                PunchFile( "SeqLength_$minlength-$maxlength", $id, $seq->{$id} );
            }
        }
    } ## end sub SpecifySeqLengths

    sub PunchFile {
        my( $newfile, $id, $val ) = @_;
        if( -f $id ) {
            print $id, " already exists. It is about to be overwritten\n";
        }
        use autodie qw/ open /;
        open my( $newfh ), '>>', $newfile;
        print $newfh $id, "\n", $val, "\n";
        close $newfh;
    } ## end sub PunchFile
    __END__

    $ perl consolidated.pl
    Can't open 'human_hg19_circRNAs_putative_spliced_sequence.fa' for reading: 'No such file or directory' at consolidated.pl line 23
