PerlMonks  

Re^2: Searching and Coutning using 2 files with multiple columns

by shart3 (Novice)
on Feb 19, 2010 at 15:59 UTC ( [id://824193]=note )


in reply to Re: Searching and Coutning using 2 files with multiple columns
in thread Searching and Coutning using 2 files with multiple columns

Ok. I changed my code to do a MySQL query. I now have a file, DB, and a MySQL table. The MySQL table has 3 columns: chrom, start, end. DB has 7 columns. For each line in DB, I want to count the number of rows in the table where chrom matches col2 and start is BETWEEN col3 and col4. Then I want to print that count as col8 of DB. Here is the code I have:

#!/usr/bin/perl
# PERL MODULE
use DBI;

my $DB_NAME = "H3K36me3";
my $DB_USER = "root";
my $DB_PASS = "password";
my $dbh = DBI->connect("DBI:mysql:$DB_NAME","$DB_USER","$DB_PASS");

open (FILE,"@ARGV[0]")||die "usage: perl MySQL.query.pl <DB> ";
@array=<FILE>;
close(FILE);

foreach $line (@array){
    my ($Gene,$chr_id,$left,$right,$Strand,$ExonCount,$SizeKB)=split(/\t/,$line);
    $sql = "select count(*) from H3K36me3 where chrom =\"$chr_id\" AND start between $left AND $right";
    $sth= $dbh->prepare($sql);
    @Count=$sth->execute()||die "problem Here!\n";
    @Count=$sth->fetchrow();
    while (@Count = $sth->fetchrow()) {
        print "@Count[0]\n";
    }
}

The problem is that when I print to the screen it works, slowly (the MySQL table has 20 million rows), but when I redirect the output to a file with 'perl MySQL.query.pl DB > Out.txt' it stops after about 6000 rows. It actually stops in a weird way: if the actual count is 30, it will only print 3. How do I deal with this?

Re^3: Searching and Coutning using 2 files with multiple columns
by moritz (Cardinal) on Feb 19, 2010 at 16:56 UTC
    There are lots of ways in which you can improve your code.

    The first is to tell MySQL to build indexes on the chrom and the start columns. The second is to use prepared statements and execute() as shown in the DBI documentation and in this tutorial.

    If you use the RaiseError option in DBI->connect, you can leave out the ||die and get much better error messages.
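A minimal sketch of the two DBI suggestions above, assuming the table name, column names, and connection details from the original post. The helper names parse_db_line and append_counts are made up for illustration, and the sketch assumes the goal is to write each DB line back out with the count appended as an eighth column.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one tab-separated line of the DB file into its columns.
# (Hypothetical helper, not part of the original script.)
sub parse_db_line {
    my ($line) = @_;
    chomp $line;
    return split /\t/, $line, -1;
}

# Write every DB line to $out_fh with the match count as an extra column.
# (Hypothetical helper; $dbh is an already connected DBI handle.)
sub append_counts {
    my ($dbh, $in_fh, $out_fh) = @_;

    # Prepare once; the ? placeholders are filled in on each execute().
    my $sth = $dbh->prepare(
        'SELECT COUNT(*) FROM H3K36me3 WHERE chrom = ? AND start BETWEEN ? AND ?'
    );

    while (my $line = <$in_fh>) {
        next unless $line =~ /\S/;              # skip blank lines
        my @cols = parse_db_line($line);
        my (undef, $chr_id, $left, $right) = @cols;

        $sth->execute($chr_id, $left, $right);
        my ($count) = $sth->fetchrow_array;     # COUNT(*) yields a single row
        print {$out_fh} join("\t", @cols, $count), "\n";
    }
    return;
}

if (@ARGV) {
    require DBI;    # loaded here so the helpers above work without a driver
    # RaiseError makes DBI die with a descriptive message on any failure,
    # so no "|| die" is needed after prepare/execute.
    my $dbh = DBI->connect('DBI:mysql:H3K36me3', 'root', 'password',
                           { RaiseError => 1, PrintError => 0 });
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    append_counts($dbh, $in, \*STDOUT);
    close $in;
    $dbh->disconnect;
}
else {
    warn "usage: perl MySQL.query.pl <DB>\n";
}
```

The placeholders take care of quoting the chromosome name, and the file is read line by line instead of being slurped into an array. The sketch does not make the query itself any faster; that still depends on the index.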

    It actually stops in a weird way: if the actual count is 30, it will only print 3. How do I deal with this?

    How does it "stop"? Does it hang, or does it terminate? What is the exit code? Does it run out of memory or disk space?

    Perl 6 - links to (nearly) everything that is Perl 6.

      It would just hang. It ended up running all night. The processor was still busy, but I wasn't getting any data out. I will try to index my MySQL table to make it run faster. I still have to learn how to do it properly, but for now I am doing:

      mysql> create index Chr_start_end using btree on H3K36me3 (chrom,start,end)

      Thanks for the suggestion!

      Building the index worked! Thanks again!
