http://qs321.pair.com?node_id=548184

azaria has asked for the wisdom of the Perl Monks concerning the following question:

Hello,
I would like to read several files in parallel and generate an output file in which each line is the concatenation of the corresponding lines from each input file. For example:
file A:
111
222
333
...

file B:
AAA
BBB
CCC
...

file C:
aaa
bbb
ccc
...
Then the output file will contain:
111AAAaaa
222BBBbbb
333CCCccc
Please advise how I can do this in a short way.

Thanks azaria

Re: parallel reading
by Zaxo (Archbishop) on May 09, 2006 at 13:38 UTC

    Here's another,

    open my $out, '>', 'ABC' or die $!;
    {
        local $_;
        open my $A, '<', 'A' or die $!;
        open my $B, '<', 'B' or die $!;
        open my $C, '<', 'C' or die $!;
        no warnings 'uninitialized';
        while ($_ = <$A> . <$B> . <$C>) {
            s/\n//g;
            print $out $_, "\n";
        }
    }
    close $out or warn $!;

    That will let the files have different numbers of lines. Memory use is small, and independent of file size.

    Update: Repaired the thinko blazar++ spotted. Empty lines are not a problem - we don't chomp, so they retain newlines until we s/// them gone. I like blazar's extension to different numbers of files.

    After Compline,
    Zaxo

      Nice approach. And it may be merged with mine, e.g.:

      #!/usr/bin/perl -l

      use strict;
      use warnings;

      my @fh = map {
          open my $fh, '<', $_ or die "Can't open `$_': $!\n";
          $fh
      } @ARGV;

      no warnings 'uninitialized';
      print while $_ = join '', map { chomp(my $line = <$_>); $line } @fh;

      __END__

      However:

      • you should s/undefined/uninitialized/;
      • it may not be fully reliable if empty lines are to be expected in the files.

      Update: the second point was a thinko as Zaxo pointed out.
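
      The empty-line question above comes down to how end of input is detected. Here is a minimal sketch, assuming a hypothetical helper called merge_lines (it is not from any post in this thread), that checks each read with defined instead of relying on the truth value of the joined string. Under that assumption empty lines, and a final line consisting of a bare 0, are kept, and files of different lengths are simply read until all are exhausted:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# merge_lines(\@handles, $out): hypothetical helper, not from the thread.
# Reads one line from each handle per round and prints the concatenation,
# stopping once every handle is exhausted.
sub merge_lines {
    my ($handles, $out) = @_;
    while (1) {
        my $any = 0;     # did any file still yield a line this round?
        my $row = '';
        for my $fh (@$handles) {
            my $line = <$fh>;
            next unless defined $line;    # this file has no more lines
            $any = 1;
            chomp $line;
            $row .= $line;
        }
        last unless $any;
        print {$out} $row, "\n";
    }
    return;
}

# Open every file named on the command line and merge them to STDOUT.
my @fh = map {
    open my $fh, '<', $_ or die "Can't open '$_': $!\n";
    $fh;
} @ARGV;
merge_lines(\@fh, \*STDOUT);
```

      Saved as, say, merge.pl (an assumed name), it would be run as perl merge.pl A B C > ABC. Only one line per file is held in memory at a time.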

Re: parallel reading
by blazar (Canon) on May 09, 2006 at 13:49 UTC

    Since others already gave you good general purpose suggestions...

    #!/usr/bin/perl -l

    use strict;
    use warnings;

    my @fh = map {
        open my $fh, '<', $_ or die "Can't open `$_': $!\n";
        $fh
    } @ARGV;

    while (@fh) {
        @fh = grep !eof $_, @fh;
        print map { chomp(my $line = <$_>); $line } @fh;
    }

    __END__

      blazar:

      Very nice (++)! I've never used map before, but that's an eye opener. It's so much better than my (admittedly terrible) hack, and clear to boot. That example is going on my "cheatsheet" of tips I keep pinned to my cube wall.

      --roboticus

        Now that you know... beware! It's easy to get addicted to map & grep. They're good for... the jobs they're good for! Do not abuse them!

Re: parallel reading
by roboticus (Chancellor) on May 09, 2006 at 12:40 UTC
    azaria

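    If you're on a *nix box, you could use the paste command. By default it separates the columns with tabs, so give it an empty delimiter to get plain concatenation, e.g.:

    paste -d '\0' A B C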
    But since you asked on perlmonks, you could try something like this (terrible) program:

    #!/usr/bin/perl -w
    use strict;
    use warnings;

    open(A, "<A") or die "Can't open A!";
    open(B, "<B") or die "Can't open B!";
    open(C, "<C") or die "Can't open C!";

    # Slurp each file into memory (fine for small inputs only).
    my @a = <A>;
    my @b = <B>;
    my @c = <C>;

    # Keep going while any file still has lines left.
    while (@a || @b || @c) {
        my $aa = shift(@a) || "";
        my $bb = shift(@b) || "";
        my $cc = shift(@c) || "";
        chomp $aa; chomp $bb; chomp $cc;
        print $aa, $bb, $cc, "\n";
    }
    --roboticus
      First, thanks for your reply. The example I gave is very short. The input files may vary in size and might be big, so I guess that could affect memory use? azaria

        In that case I wouldn't slurp in the files all at once. My solution is one that doesn't, copes with a different number of lines per file, and with an arbitrary number of files passed on the cmd line. Shameless self ad terminated! ;-)

Re: parallel reading
by graff (Chancellor) on May 09, 2006 at 12:34 UTC
    Try putting <code> and </code> around your data samples, so that we can see what the data really look like.

    What you want is what the unix "paste" command does. Someone has written a perl version of "paste" already (google for "perl power tools").

    (update: in case you have trouble finding it, here's the source for a perl implementation of paste: http://ppt.perl.org/commands/paste/paste.randy)

Re: parallel reading
by ashokpj (Hermit) on May 09, 2006 at 13:07 UTC

    Try this

    #!/usr/local/bin/perl

    open (INFILE1, "/home/ashokpj/merge1.txt") || die ("Cannot open input file merge1\n");
    open (INFILE2, "/home/ashokpj/merge2.txt") || die ("Cannot open input file merge2\n");
    open (INFILE3, "/home/ashokpj/merge3.txt") || die ("Cannot open input file merge3\n");

    chomp($line1 = <INFILE1>);
    chomp($line2 = <INFILE2>);
    chomp($line3 = <INFILE3>);

    # Note: an empty line is treated the same as end of file here.
    while ($line1 ne "" || $line2 ne "" || $line3 ne "") {
        print $line1.$line2.$line3."\n";
        if ($line1 ne "") { chomp($line1 = <INFILE1>); }
        if ($line2 ne "") { chomp($line2 = <INFILE2>); }
        if ($line3 ne "") { chomp($line3 = <INFILE3>); }
    }

    close(INFILE1);
    close(INFILE2);
    close(INFILE3);

Re: parallel reading
by McDarren (Abbot) on May 09, 2006 at 13:32 UTC
    If we can make the assumption that each file has the same number of lines, then the following should work:

    #!/usr/bin/perl -w
    use strict;

    my %files;
    my @infiles = qw(fileA fileB fileC);

    for (@infiles) {
        open IN, "<", $_ or die "Cannot open $_:$!\n";
        chomp(@{$files{$_}} = <IN>);
        close IN;
    }

    open OUT, ">", "fileD" or die "Cannot open fileD:$!\n";
    for my $line (0 .. $#{$files{fileA}}) {
        for my $file (@infiles) {
            print OUT $files{$file}[$line];
        }
        print OUT "\n";
    }
    close OUT;

    $ cat fileD
    111AAAaaa
    222BBBbbb
    333CCCccc

    Cheers,
    Darren :)
Re: parallel reading
by wfsp (Abbot) on May 09, 2006 at 12:38 UTC
    Hi azaria!

    Please advise how I can do this in a short way.
    The very short answer is: write some code. :-)

    I would guess you need to open 3 files for input and 1 for output. Assuming fairly small input files, read the input into arrays, loop over them and build your output. Save your output to a file.

    Try it and let us know how you get on.
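
    As a starting point for comparison, the steps described above (read each input into an array, loop over the rows, build the output, save it) might be sketched as follows. The helper name slurp_merge and the output file name "out" are assumptions made for illustration, and the approach only suits inputs small enough to hold in memory:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max);

# slurp_merge(@files): hypothetical helper. Reads each named file fully
# into an array, then joins row N of every file into one output line.
# Files that run out early contribute an empty string.
sub slurp_merge {
    my @files = @_;
    my @cols;
    for my $file (@files) {
        open my $in, '<', $file or die "Cannot open $file: $!\n";
        chomp(my @lines = <$in>);
        close $in;
        push @cols, \@lines;
    }
    my $rows = max(0, map { scalar @$_ } @cols);
    my @out;
    for my $i (0 .. $rows - 1) {
        push @out, join '', map { defined $_->[$i] ? $_->[$i] : '' } @cols;
    }
    return @out;
}

# Merge the files named on the command line into "out" (an assumed name).
if (@ARGV) {
    open my $out, '>', 'out' or die "Cannot open out: $!\n";
    print {$out} "$_\n" for slurp_merge(@ARGV);
    close $out or warn $!;
}
```

    Because every file is held in memory at once, memory use grows with the total input size.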

Re: parallel reading
by smokemachine (Hermit) on May 10, 2006 at 02:51 UTC
    Could it be this?
    perl -e 'for(@ARGV){open FILE,$_;chomp($a[$.-1].=$_)while<FILE>;close FILE}$,=$/;open FILE,">out";print FILE@a' A B C
Re: parallel reading
by whyxys (Initiate) on May 11, 2006 at 01:52 UTC
    JAPH(just another perl approach),hehe:
    perl -e 'map{chomp;$a[$i<3?$i:($i=0)].=$_;$i++}<>;print"@a\n";' filea fileb filec
    This assumes each file has the same number of lines (here 3, for simplicity).