Hi Perl Monks,

I am a beginner in Perl programming. I have written a Perl script that reads a small text file and gives correct results for the inter-substring distance when run from cmd on Windows XP. However, when I try to analyze a large text file of 219475005 letters, cmd reports "Out of memory!". The program counts the number of each letter in the file correctly within 2 minutes, but it fails to find the inter-substring distance. I think this could be due to reading the file incorrectly.

So I have given the initial part of the script and the cmd output below. I am seeking your suggestions to rectify the mistake in the script so that it can analyze a large file.

Furthermore, I need the syntax for the initial part that assigns the large input file to an array variable like my @lines, so that I can assign this array to a scalar variable like my $string = "@lines"; for use in the later part of the script.
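As a possible starting point for the file-to-scalar step, here is a minimal sketch that reads the whole file straight into one scalar instead of going through an array first. Keeping both @lines and $string alive holds two copies of the data, and "@lines" also inserts a space between the lines (join('', @lines) does not). The helper name read_sequence and the temporary file are only illustrative assumptions, not part of the original script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file into one scalar and strip whitespace.
sub read_sequence {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open file \"$path\": $!\n";
    local $/;                        # no record separator: read everything at once
    my $dna = <$fh>;
    close $fh;
    $dna = '' unless defined $dna;   # an empty file yields undef
    $dna =~ s/\s+//g;                # remove newlines and other whitespace
    return $dna;
}

# Small demonstration with a temporary file standing in for the real input
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "GAAT\nTCCT\nNNAC\n";
close $tmp_fh;

my $dna = read_sequence($tmp_name);
print length($dna), " bases read\n";   # prints "12 bases read"
```

In the real script the argument would be the filename typed at the prompt, for example read_sequence($DNAfilename).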

#!/usr/bin/perl -w
print "\n\nPlease type the filename: ";
$DNAfilename = <STDIN>;
chomp $DNAfilename;
# open the large file for reading
unless ( open(DNAFILE, '<', $DNAfilename) ) {
print "Cannot open file \"$DNAfilename\": $!\n\n";
exit;
}
# read every line of the file into memory at once
my @lines = <DNAFILE>;
close DNAFILE;
$DNA = join( '', @lines);
# Remove whitespace
$DNA=~ s/\s//g;
# Count number of bases
$b=length($DNA);
print "\nNumber of bases: $b.";
# Count number of each base and nonbase
$A=0;$T=0;$G=0;$C=0;$e=0; 
while($DNA=~ /A/ig){$A++}
while($DNA=~ /T/ig){$T++}
while($DNA=~ /G/ig){$G++}
while($DNA=~ /C/ig){$C++}
while($DNA=~ /[^ATGC]/ig){$e++}
. . . .
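The counting step could also be written with tr///, which counts characters without a regex match loop and leaves the string unchanged when the replacement list is empty. This is only a sketch; the helper name count_bases and the sample string are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: count each base (upper or lower case) with tr///.
# The string is passed by reference so a large sequence is not copied.
sub count_bases {
    my ($dna_ref) = @_;
    my %count = (
        A => ($$dna_ref =~ tr/Aa//),
        T => ($$dna_ref =~ tr/Tt//),
        G => ($$dna_ref =~ tr/Gg//),
        C => ($$dna_ref =~ tr/Cc//),
    );
    # Everything that is not A, T, G or C counts as a nonbase
    $count{other} = length($$dna_ref)
                  - $count{A} - $count{T} - $count{G} - $count{C};
    return %count;
}

my $dna   = 'GAATTCCTNNacgt';
my %count = count_bases(\$dna);
print "A=$count{A}; T=$count{T}; G=$count{G}; C=$count{C}; Errors(N)=$count{other}.\n";
# prints "A=3; T=4; G=2; C=3; Errors(N)=2."
```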

Command Prompt Results:

C:\Documents and Settings\user\Desktop>m3.pl

Please type the filename of the DNA sequence data: chr1.txt

Number of bases: 219475005.

A=63473407; T=63582431; G=45425056; C=45435903; Errors(N)=1558208.

Enter a motif to count nt between two such motifs: GAATTCCT

I found the motif!

Out of memory!

C:\Documents and Settings\user\Desktop>
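The part of the script that fails is not shown, so the following is only a sketch of one memory-light way to measure the distance between motifs. It assumes that "inter-substring distance" means the number of letters between the end of one occurrence and the start of the next non-overlapping occurrence, and that the sequence and the motif use the same letter case. It walks the string with index() rather than splitting it or collecting matches with a regex. The helper name motif_gaps is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the number of letters between consecutive,
# non-overlapping occurrences of $motif in the string referenced by $dna_ref.
sub motif_gaps {
    my ($dna_ref, $motif) = @_;
    my $len = length $motif;
    return () if $len == 0;
    my @gaps;
    my $prev_end;                          # offset just past the previous match
    my $pos = index($$dna_ref, $motif);
    while ($pos != -1) {
        push @gaps, $pos - $prev_end if defined $prev_end;
        $prev_end = $pos + $len;
        $pos = index($$dna_ref, $motif, $prev_end);
    }
    return @gaps;
}

my $dna  = 'GAATTCCTAAAGAATTCCTCGAATTCCT';
my @gaps = motif_gaps(\$dna, 'GAATTCCT');
print "Gaps: @gaps\n";   # prints "Gaps: 3 1"
```

For a very large sequence, the gaps could be printed inside the loop instead of being pushed onto @gaps, so that only the sequence itself stays in memory.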

Thanks to the Perl Monks for their quick replies in solving Perl problems.

In reply to Request to detect the mistake in a perl script for finding inter-substring distance from a large text file by supriyoch_2008
