Perl command line for table join functionality

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hey monks!

I work in a Unix environment all day and have grown very efficient through the use of Perl and Unix at the command prompt.

I still have not developed a good method of joining data in two separate files based on similarity or dissimilarity criteria. For example, does anyone have a good solution for implementing the following process on the command line?

```perl
use strict;
use warnings;
use Data::Dumper;
######################
###################### Read in the files
open(my $fh1, '<', $ARGV[0]) || die("Could not open input file $ARGV[0]: $!");
    my @File1 = <$fh1>;
close($fh1);
open(my $fh2, '<', $ARGV[1]) || die("Could not open input file $ARGV[1]: $!");
    my @File2 = <$fh2>;
close($fh2);

#####
my %hash1;

######## Read the first file into a hash, with the first element as the key
######## and the entire line as the value
foreach (@File1) {
    chomp;
    my @file1_elements = split(/\t/, $_);
    push(@{$hash1{$file1_elements[0]}}, $_);
}
foreach (@File2) {
    chomp;
    my @file2_elements = split(/\t/, $_);
    if (exists $hash1{$file2_elements[5]}) {
        ##### Prints the current line in file 2 and adds on the line in file 1
        ##### where file1[0] == file2[5]. I understand that if there is more than
        ##### one value I will need a loop to print out all the values, but let's
        ##### leave that out just to simplify.
        print $_ . "\t" . "@{$hash1{$file2_elements[5]}}" . "\n";
    }
}
```

Again, I'm just looking to do this kind of thing on the command line. If you can throw in the print loop in some smart way, that would be even better.

Thank you so much for your time!
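Here is a minimal sketch of the same hash join with the print loop included. It assumes tab-separated input and the column pair used in the script above: column 0 of file 1 matched against column 5 of file 2, both 0-based. `join_on_columns` and `join.pl` are hypothetical names chosen for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: for every line of the second table whose column $key2
# equals column $key1 of some line in the first table, emit
# "second-table line <TAB> first-table line", once per match.
sub join_on_columns {
    my ($lines1, $lines2, $key1, $key2) = @_;

    # Index the first table: key => list of whole lines (keys may repeat).
    my %index;
    for my $line (@$lines1) {
        my $key = (split /\t/, $line, -1)[$key1];
        next unless defined $key;
        push @{ $index{$key} }, $line;
    }

    # Probe the index with each line of the second table.
    my @joined;
    for my $line (@$lines2) {
        my $key = (split /\t/, $line, -1)[$key2];
        next unless defined $key && exists $index{$key};
        # The "print loop": one output line per matching first-table line.
        push @joined, "$line\t$_" for @{ $index{$key} };
    }
    return @joined;
}

# Command-line use (hypothetical file name): perl join.pl file1 file2
if (@ARGV == 2) {
    my @tables;
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Could not open input file $file: $!";
        chomp(my @lines = <$fh>);
        close $fh;
        push @tables, \@lines;
    }
    # Same columns as the original script: file1[0] matched against file2[5].
    print "$_\n" for join_on_columns(@tables, 0, 5);
}
```

For a true one-liner, the `-n`, `-a`, `-l` and `-F` switches can take over the reading, splitting and newline handling. While `-n` is still reading the first file, `@ARGV` holds the second file name, which is enough to tell the two inputs apart: `perl -F'\t' -lane 'if (@ARGV) { push @{$h{$F[0]}}, $_; next } my $l = $_; print "$l\t$_" for @{ $h{$F[5]} || [] }' file1 file2`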


Re: Perl command line for table join functionality by Bloodnok (Vicar) on Apr 30, 2014 at 15:14 UTC
It sounds rather like `join` is the tool that you seek, e.g.

```sh
sort -k6,6 file2 > file2.sorted
sort -k1,1 file1 | join -2 6 - file2.sorted
```

or similar ...
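`join` reads both inputs in a single pass, which is why each file has to be sorted on its own join field. The Perl sketch below illustrates that sort-merge idea on in-memory data. It is an illustration of the technique, not how coreutils implements `join`. `merge_join` and `keyed_sorted` are hypothetical names, and keys are compared with `cmp`, so the sketch assumes plain string order.

```perl
use strict;
use warnings;

# Sort-merge join over two lists of [key, line] pairs, each already sorted
# by key with cmp. Every left row is paired with every right row that has
# the same key; output is "left line <TAB> right line".
sub merge_join {
    my ($left, $right) = @_;
    my @out;
    my ($i, $j) = (0, 0);
    while ($i < @$left && $j < @$right) {
        my $cmp = $left->[$i][0] cmp $right->[$j][0];
        if    ($cmp < 0) { $i++ }    # left key is smaller: advance left
        elsif ($cmp > 0) { $j++ }    # right key is smaller: advance right
        else {
            my $key = $left->[$i][0];
            # Find the end of the run of equal keys on the right.
            my $j_end = $j;
            $j_end++ while $j_end < @$right && $right->[$j_end][0] eq $key;
            # Pair each left row with this key with the whole right run.
            while ($i < @$left && $left->[$i][0] eq $key) {
                push @out, "$left->[$i][1]\t$right->[$_][1]" for $j .. $j_end - 1;
                $i++;
            }
            $j = $j_end;
        }
    }
    return @out;
}

# Hypothetical helper: turn tab-separated lines into [key, line] pairs keyed
# on 0-based column $col, sorted by that key (the "sort" step).
sub keyed_sorted {
    my ($col, @lines) = @_;
    return sort { $a->[0] cmp $b->[0] }
           map  { [ (split /\t/, $_, -1)[$col] // '', $_ ] } @lines;
}
```

Unlike `join`, which prints the join field once followed by the remaining fields of each file, this sketch simply concatenates the two whole lines.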
Re: Perl command line for table join functionality by Laurent_R (Canon) on Apr 30, 2014 at 19:22 UTC
Your program does a lot of useless things. There is no point in storing your files in arrays and then processing the arrays; process the files directly. There are also a lot of useless parentheses. This is a possible rewrite, about half the length (untested: I don't have data, and I'm not sure it does exactly what you want. Some of your code seemed a bit strange to me, so I might have changed it the wrong way, and not having data does not help):

```perl
use strict;
use warnings;

my %hash1;
open my $FH, "<", $ARGV[0] or die "Could not open input file $ARGV[0]: $!";
while (my $line = <$FH>) {
    chomp $line;
    my $file1_element = (split /\t/, $line)[0];
    $hash1{$file1_element} = $line;
}
close $FH;

open my $FH2, "<", $ARGV[1] or die "Could not open input file $ARGV[1]: $!";
while (my $line = <$FH2>) {
    chomp $line;
    my $file2_element = (split /\t/, $line)[5];
    print "$line\t$hash1{$file2_element}\n" if exists $hash1{$file2_element};
}
```

Now, if you want to do this at the command line, you can just do this:

```
$ perl -e '
use strict;
use warnings;
my %hash1;
open my $FH, "<", $ARGV[0] or die "Could not open input file $ARGV[0]: $!";
while (my $line = <$FH>) {
    chomp $line;
    my $file1_element = (split /\t/, $line)[0];
    $hash1{$file1_element} = $line;
}
close $FH;
open my $FH2, "<", $ARGV[1] or die "Could not open input file $ARGV[1]: $!";
while (my $line = <$FH2>) {
    chomp $line;
    my $file2_element = (split /\t/, $line)[5];
    print "$line\t$hash1{$file2_element}\n" if exists $hash1{$file2_element};
}
' file1.txt file2.txt
```

But why would you want to do this? Why not have a real program in a file? Command-line instructions are good for very short code, not for this. I could probably cut the code by another half on the command line, perhaps even a bit more, but it would still be too long for a one-liner typed at the prompt.
Edit: Slightly modified the code, as some things carried over from the OP's code did not seem right to me.
Re^2: Perl command line for table join functionality by Anonymous Monk on May 07, 2014 at 11:28 UTC
How about this? `perl -ne 'print if ($seen{$_} .= @ARGV) =~ /10$/' fileA fileB >output`
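This one-liner matches whole lines rather than a chosen column. While `-n` is reading fileA, `@ARGV` still contains fileB, so `@ARGV` in scalar context appends "1" to the line's history string. While fileB is being read, `@ARGV` is empty and "0" is appended. A line is printed when its history ends in "10", which happens the first time it appears in fileB after having appeared in fileA. The expanded sketch below spells this out on in-memory lists. `common_lines` is a hypothetical name.

```perl
use strict;
use warnings;

# Expanded form of: perl -ne 'print if ($seen{$_} .= @ARGV) =~ /10$/' fileA fileB
# Occurrences in fileA append "1" (@ARGV still holds fileB); occurrences in
# fileB append "0" (@ARGV is empty). A line is kept when its history ends
# in "10", i.e. on its first occurrence in fileB after being seen in fileA.
sub common_lines {
    my ($lines_a, $lines_b) = @_;
    my %seen;
    $seen{$_} .= '1' for @$lines_a;
    my @out;
    for my $line (@$lines_b) {
        push @out, $line if ($seen{$line} .= '0') =~ /10$/;
    }
    return @out;
}
```

Each common line is emitted once, in fileB order. Joining on a column instead would require extracting the key from each line, as the hash-based scripts in this thread do.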
Re: Perl command line for table join functionality by graff (Chancellor) on Apr 30, 2014 at 21:02 UTC
What the OP's code is doing seems very similar to what I was doing when I wrote this: cmpcol. HTH.

