Marsel:

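Another way to do the job is with a file merge: sort both files on the key(s) of interest, then read the records in order and merge them as you go. Because both inputs arrive in key order, the script only needs to hold one record from each file in memory at a time, so the size of the files doesn't matter much.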

Example:

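#!/usr/bin/perl
use strict;
use warnings;

# Sort each input on its key column only. LC_ALL=C forces byte order so
# the sort agrees with the lt/eq comparisons used in the merge loop.
open F1, 'LC_ALL=C sort -k3,3 mergefile.1|' or die "opening file 1: $!";
open F2, 'LC_ALL=C sort -k2,2 mergefile.2|' or die "opening file 2: $!";

open OUF, '>', 'mergefile.out' or die "opening output file: $!";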

my @in1;
my @in2;

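sub getrec1 {
        @in1 = ();
        if (!eof(F1)) {
                # chomp the whole line so a short record can't leave a
                # stray newline on its last field
                chomp(my $line = <F1>);
                @in1 = split /\t/, $line, -1;
        }
}

sub getrec2 {
        @in2 = ();
        if (!eof(F2)) {
                chomp(my $line = <F2>);
                @in2 = split /\t/, $line, -1;
        }
}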

sub write1 {
        print OUF "$in1[2]\t$in1[0]\t$in1[1]\tnull\tnull\n";
        getrec1;
}

sub write2 {
        print OUF "$in2[1]\tnull\tnull\t$in2[0]\t$in2[2]\n";
        getrec2;
}

sub writeboth {
        print OUF "$in1[2]\t$in1[0]\t$in1[1]\t$in2[0]\t$in2[2]\n";
        getrec1;
        getrec2;
}

# Prime the pump
getrec1;
getrec2;

while (1) {
        last if $#in1<0 and $#in2<0;

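        if ($#in1<0 or $#in2<0) {
                # Only one file is left; write its next record. This must
                # be if/else: write2 can empty @in2 on the last record,
                # and a second test would then call write1 with no data.
                if ($#in1<0) { write2(); }
                else         { write1(); }
        }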
        elsif ($in1[2] eq $in2[1]) {
                # Matching records, merge & write 'em
                writeboth;
        }
        elsif ($in1[2] lt $in2[1]) {
                # unmatched item in file 1, write it & get next rec
                write1;
        }
        else {
                # unmatched item in file 2, write it & get next rec
                write2;
        }
}

Example input and output:

root@swill ~/PerlMonks
$ cat mergefile.1
15      20      foo
22      30      bar
30      33      baz
14      22      fubar

root@swill ~/PerlMonks
$ cat mergefile.2
alpha   baz     17.30
gamma   foobar  22.35
gamma   bar     19.01
delta   fromish 33.03
sigma   bear    14.56

root@swill ~/PerlMonks
$ ./file_merge.pl

root@swill ~/PerlMonks
$ cat mergefile.out
bar     22      30      gamma   19.01
baz     30      33      alpha   17.30
bear    null    null    sigma   14.56
foo     15      20      null    null
foobar  null    null    gamma   22.35
fromish null    null    delta   33.03
fubar   14      22      null    null

root@swill ~/PerlMonks
$
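
If you want to check the merge logic without touching any files, the same three-way comparison can be sketched over in-memory lists. This is only an illustration under a few assumptions: merge_sorted is a hypothetical helper (it is not part of the script above), each record is reduced to a [key, value] pair, both lists are already sorted by key in plain string order, and keys are unique within each list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# merge_sorted: hypothetical helper for illustration only.
# Takes two array refs of [key, value] pairs, each sorted by key in the
# same order that lt/eq use, and returns [key, left, right] rows with
# 'null' standing in for the missing side.
sub merge_sorted {
    my ($left, $right) = @_;
    my ($i, $j) = (0, 0);
    my @out;

    while ($i < @$left or $j < @$right) {
        if ($j >= @$right) {
            # right side exhausted: drain the left side
            push @out, [ $left->[$i][0], $left->[$i][1], 'null' ];
            $i++;
        }
        elsif ($i >= @$left) {
            # left side exhausted: drain the right side
            push @out, [ $right->[$j][0], 'null', $right->[$j][1] ];
            $j++;
        }
        elsif ($left->[$i][0] eq $right->[$j][0]) {
            # matching keys: emit one merged row, advance both sides
            push @out, [ $left->[$i][0], $left->[$i][1], $right->[$j][1] ];
            $i++;
            $j++;
        }
        elsif ($left->[$i][0] lt $right->[$j][0]) {
            # key only on the left
            push @out, [ $left->[$i][0], $left->[$i][1], 'null' ];
            $i++;
        }
        else {
            # key only on the right
            push @out, [ $right->[$j][0], 'null', $right->[$j][1] ];
            $j++;
        }
    }
    return \@out;
}

my $merged = merge_sorted(
    [ [ bar => '22' ],    [ baz  => '30' ],    [ foo => '15' ] ],
    [ [ bar => '19.01' ], [ bear => '14.56' ] ],
);
print join("\t", @$_), "\n" for @$merged;
```

Like the script above, this advances both sides on a match, so a key that repeats within one list is paired only once and the extra records come out as unmatched. If your data can have repeated keys, you would need to decide how those should be combined.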

--Roboticus

In reply to Re: How to deal with Huge data by roboticus
in thread How to deal with Huge data by Marsel
