PerlMonks  

Re: Creating 2D array / Matrix

by bioinformatics (Friar)
on Mar 18, 2013 at 20:29 UTC


in reply to Creating 2D array / Matrix

If you are looking to put them in order, then you are better off using a hash of hashes (with an array :) ). For instance:

#!/usr/bin/perl
use warnings;
use strict;

my $usage = "merge_bed.pl <input1> <input2> <output>";
my $input_1 = shift or die $usage;
my $input_2 = shift or die $usage;
my $output  = shift or die $usage;

open my $in1, "<", "$input_1" or die "Cannot open $input_1: $!\n";
open my $in2, "<", "$input_2" or die "Cannot open $input_2: $!\n";
open my $out, ">", "$output"  or die "Cannot open $output: $!\n";

my %bed_files = ();

while ( <$in1> ) {
    chomp;
    my ($chrom, $start, $end, undef, undef, $strand) = split "\t";
    # you can save multiple values as an array, so both the end and
    # strand and anything else you want
    $bed_files{$chrom}{$start}[0] = $end;
    $bed_files{$chrom}{$start}[1] = $strand;
}

while ( <$in2> ) {
    chomp;
    my ($chrom, $start, $end, undef, undef, $strand) = split "\t";
    $bed_files{$chrom}{$start}[0] = $end;
    $bed_files{$chrom}{$start}[1] = $strand;
}

for my $chrom (sort keys %bed_files) {
    for my $start (sort {$a <=> $b} keys %{$bed_files{$chrom}}) {
        # print out the results sorted by chromosome (or scaffold) and start site
        print $out "$chrom\t$start\t$bed_files{$chrom}{$start}[0]\t$bed_files{$chrom}{$start}[1]\n";
    }
}

close $in1;
close $in2;
close $out;
exit;


This assumes that your two lists a) share no start sites and b) contain no overlapping regions. (If two records have the same chromosome and start, the later one simply overwrites the earlier one in the hash.) If you want to find overlaps, you can do the same thing but with two separate data structures: find the overlaps, then output the unique regions plus one copy of each overlapping region. There are ways to do this using an index, so it's faster and you only make one pass instead of looping through the entire bed file multiple times.
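To make the two-structure idea a bit more concrete, here is a minimal sketch, not a tuned implementation. The names overlaps and unique_plus_one_copy and the sample data are made up for illustration, and it assumes regions are stored as [start, end] pairs using BED's half-open coordinates. It is the slow compare-everything version, not the indexed one-pass approach mentioned above.

```perl
use strict;
use warnings;

# True if half-open intervals [s1, e1) and [s2, e2) share at least one base.
sub overlaps {
    my ($s1, $e1, $s2, $e2) = @_;
    return ($s1 < $e2 && $s2 < $e1) ? 1 : 0;
}

# Keep every region from the first set, plus any region from the second
# set that overlaps nothing in the first set on the same chromosome.
# An overlapping pair is therefore reported once (the first set's copy).
sub unique_plus_one_copy {
    my ($first, $second) = @_;
    my @kept;
    for my $chrom (sort keys %$first) {
        push @kept, map { [ $chrom, @$_ ] } @{ $first->{$chrom} };
    }
    for my $chrom (sort keys %$second) {
        my $others = $first->{$chrom} || [];
        for my $region (@{ $second->{$chrom} }) {
            my $hits = grep { overlaps(@$region, @$_) } @$others;
            push @kept, [ $chrom, @$region ] unless $hits;
        }
    }
    return @kept;
}

# Made-up sample data: chromosome => list of [start, end]
my %set_a = ( chr1 => [ [100, 200], [500, 600] ] );
my %set_b = ( chr1 => [ [150, 250], [700, 800] ], chr2 => [ [10, 20] ] );

print join("\t", @$_), "\n" for unique_plus_one_copy(\%set_a, \%set_b);
```

Here chr1 150-250 from the second set is dropped because it overlaps chr1 100-200 from the first; everything else is kept. Whether you would rather merge the two overlapping regions into one is a separate decision this sketch does not make.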

You could create a function for the code that builds the main data structure, but I left it as is so it's easier to read and see what I'm doing. Have fun!
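For what it's worth, the function version might look something like this. It is only a sketch: load_bed is a name I made up, and it assumes the same six-column, tab-separated layout as the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: read one BED file into the hash passed by reference.
# Stores [end, strand] under {chrom}{start}, like the two while loops above.
sub load_bed {
    my ($path, $bed) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!\n";
    while (my $line = <$in>) {
        chomp $line;
        my ($chrom, $start, $end, undef, undef, $strand) = split /\t/, $line;
        $bed->{$chrom}{$start} = [ $end, $strand ];
    }
    close $in;
    return $bed;
}

my %bed_files;
# Load every file named on the command line into the same structure.
load_bed($_, \%bed_files) for @ARGV;
```

The same caveat applies as before: a record with the same chromosome and start as an earlier one replaces it.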

EDIT: Fixed a couple of typos!

Bioinformatics

Replies are listed 'Best First'.
Re^2: Creating 2D array / Matrix
by kash650 (Novice) on Mar 18, 2013 at 21:31 UTC
    Can you explain the $usage and "shift or die" part?
      shift here shifts from the @ARGV array. If it fails (because fewer than the 3 required file names were supplied on the command line), the program dies (quits) and prints the string contained in $usage, "merge_bed.pl <input1> <input2> <output>". This message tells the user that the Perl program, merge_bed.pl or whatever name you choose for it, requires 2 input filenames and 1 output filename.
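      A small sketch you can run to see this happen without touching the command line. take_arg is a made-up wrapper that shifts from an array you pass in rather than from @ARGV, and the file names are placeholders.

```perl
use strict;
use warnings;

my $usage = "merge_bed.pl <input1> <input2> <output>\n";

# Hypothetical wrapper: same "shift or die" idiom, but on a given array.
sub take_arg {
    my ($args) = @_;
    # "or" binds more loosely than "=", so the assignment happens first
    # and die only fires when the shifted value is false.
    my $value = shift @$args or die $usage;
    return $value;
}

my @args = ('a.bed', 'b.bed');        # only two of the three names
my $in1 = take_arg(\@args);
my $in2 = take_arg(\@args);
my $out = eval { take_arg(\@args) };  # nothing left, so shift returns undef
print "died with: $@" unless defined $out;
```

      One thing to be aware of: the test is for truth, not for definedness, so an argument of "0" or an empty string would also trigger the die.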

      Can you explain the $usage and "shift or die" part?

      If you employ Basic debugging checklist (deparse,print) you can figure it out pretty quick :)
