PerlMonks  

Perl command line for table join functionality

by Anonymous Monk
on Apr 30, 2014 at 14:58 UTC (id://1084515)

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hey monks!

I work in a Unix environment all day and have become very efficient using Perl and Unix tools at the command prompt.

I still have not found a good way to join data from two separate files based on similarity or dissimilarity criteria, i.e. whether fields in the two files match or not. For example, does anyone have a good solution for implementing the following process on the command line?
use strict;
use Data::Dumper;

###################### Read in the files
open (FILEHANDLE, "$ARGV[0]") || die("Could not open input file");
my @File1 = <FILEHANDLE>;
close (FILEHANDLE);
open (FILEHANDLE, "$ARGV[1]") || die("Could not open input file");
my @File2 = <FILEHANDLE>;
close (FILEHANDLE);

my %hash1;
######## Read the first file into a hash: first element is the key, the entire line is the value
foreach (@File1) {
    chomp;
    my @file1_elements = split (/\t/, $_);
    push(@{$hash1{$file1_elements[0]}}, $_);
}
foreach (@File2) {
    chomp;
    my @file2_elements = split (/\t/, $_);
    if (exists $hash1{$file2_elements[5]}) {
        ##### Prints the current line in file 2 and adds on the line in file1 where file1[0] == file2[5].
        ##### I understand that if there is more than one value I will need to put a loop in to print
        ##### out all the values, but let's leave that out just to simplify.
        print $_ . "\t" . "@{$hash1{$file2_elements[5]}}" . "\n";
    }
}

Again, I'm just looking to do this kind of thing on the command line; if you can work the print loop in some smart way, that would be even better.

Thank you so much for your time!
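The loop that the comment in the question's code leaves out can be sketched as follows: each file1 key maps to an array of lines (as in the question), and every matching file1 line yields its own output row. This is a minimal sketch, assuming tab-separated input and the column positions from the question (file1 column 0, file2 column 5); the sub name `join_on_key` is hypothetical, and in-memory strings stand in for the two files.

```perl
use strict;
use warnings;

# Hypothetical helper: join two tab-separated inputs where file1
# column $k1 equals file2 column $k2. Returns one output line per
# matching pair, so duplicate keys in file1 are not lost.
sub join_on_key {
    my ($fh1, $fh2, $k1, $k2) = @_;
    my %by_key;
    while (my $line = <$fh1>) {
        chomp $line;
        my $key = (split /\t/, $line)[$k1];
        push @{ $by_key{$key} }, $line if defined $key;
    }
    my @out;
    while (my $line = <$fh2>) {
        chomp $line;
        my $key = (split /\t/, $line)[$k2];
        next unless defined $key && exists $by_key{$key};
        # The "print loop": one row per matching file1 line
        push @out, "$line\t$_" for @{ $by_key{$key} };
    }
    return @out;
}

# Demo with in-memory "files" (open on a scalar reference)
my $file1 = "a\tx1\na\tx2\nb\ty1\n";
my $file2 = "r1\t1\t2\t3\t4\ta\nr2\t1\t2\t3\t4\tc\n";
open my $fh1, '<', \$file1 or die $!;
open my $fh2, '<', \$file2 or die $!;
print "$_\n" for join_on_key($fh1, $fh2, 0, 5);
```

With real files, the two handles would simply come from `open my $fh1, '<', $ARGV[0]` and `open my $fh2, '<', $ARGV[1]`.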

Re: Perl command line for table join functionality
by Bloodnok (Vicar) on Apr 30, 2014 at 15:14 UTC
    It sounds rather like join is the tool that you seek ... e.g.
    sort -t $'\t' -k6,6 file2 > file2.sorted
    sort -t $'\t' -k1,1 file1 | join -t $'\t' -2 6 - file2.sorted
    or similar ...
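    join only works on input that is already sorted on the join field, because it walks both files once, in step, the way a merge does. A rough Perl sketch of that sort-merge idea, assuming both inputs are arrays of tab-separated lines already sorted in plain string (C-locale style) order on their key column; `merge_join` is a hypothetical name, and unlike join(1) it appends the whole file1 line instead of moving the key to the front.

```perl
use strict;
use warnings;

# Hypothetical sort-merge join: @$left is sorted on column $lk and
# @$right on column $rk (string order). Returns "right_line\tleft_line"
# for every pair of lines whose keys are equal.
sub merge_join {
    my ($left, $lk, $right, $rk) = @_;
    my $key = sub { my ($line, $col) = @_; (split /\t/, $line, -1)[$col] // '' };
    my ($i, $j, @out) = (0, 0);
    while ($i < @$left && $j < @$right) {
        my $lkey = $key->($left->[$i], $lk);
        my $rkey = $key->($right->[$j], $rk);
        if    ($lkey lt $rkey) { $i++ }    # left is behind: advance it
        elsif ($lkey gt $rkey) { $j++ }    # right is behind: advance it
        else {
            # Find the run of equal keys on each side, then emit every
            # combination (the cross product of the two runs).
            my $i_end = $i;
            $i_end++ while $i_end < @$left && $key->($left->[$i_end], $lk) eq $lkey;
            my $j_end = $j;
            $j_end++ while $j_end < @$right && $key->($right->[$j_end], $rk) eq $rkey;
            for my $r (@$right[$j .. $j_end - 1]) {
                push @out, "$r\t$_" for @$left[$i .. $i_end - 1];
            }
            ($i, $j) = ($i_end, $j_end);
        }
    }
    return @out;
}

my @file1 = ("a\tx1", "a\tx2", "c\tz1");                  # sorted on column 0
my @file2 = ("r1\t-\t-\t-\t-\ta", "r2\t-\t-\t-\t-\tb");  # sorted on column 5
print "$_\n" for merge_join(\@file1, 0, \@file2, 5);
```

    If either input is not sorted the same way the comparisons expect, matches are silently skipped, which is why both sort commands above name the key field explicitly.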

Re: Perl command line for table join functionality
by Laurent_R (Canon) on Apr 30, 2014 at 19:22 UTC
    Your program does a lot of unnecessary things. There is no point in storing your files in arrays and then processing the arrays; process the files directly. There are also a lot of unnecessary parentheses. This is a possible rewrite, about half as long (untested: I don't have your data, and I'm not sure it does exactly what you want; some of your code seemed a bit strange to me, so I may have changed it the wrong way):
    use strict;
    use warnings;

    my %hash1;
    open my $FH, "<", $ARGV[0] or die "Could not open input file $ARGV[0] $!";
    while (my $line = <$FH>) {
        chomp $line;
        my $file1_element = (split /\t/, $line)[0];
        $hash1{$file1_element} = $line;
    }
    close $FH;

    open my $FH2, "<", $ARGV[1] or die "Could not open input file $ARGV[1] $!";
    while (my $line = <$FH2>) {
        chomp $line;
        my $file2_element = (split /\t/, $line)[5];
        print "$line\t$hash1{$file2_element}\n" if exists $hash1{$file2_element};
    }
    Now, if you want to do this at the command line, you can just do this:
    $ perl -e '
    > use strict;
    > use warnings;
    > my %hash1;
    > open my $FH, "<", $ARGV[0] or die "Could not open input file $ARGV[0] $!";
    > while (my $line = <$FH>) {
    >     chomp $line;
    >     my $file1_element = (split /\t/, $line)[0];
    >     $hash1{$file1_element} = $line;
    > }
    > close $FH;
    > open my $FH2, "<", $ARGV[1] or die "Could not open input file $ARGV[1] $!";
    > while (my $line = <$FH2>) {
    >     chomp $line;
    >     my $file2_element = (split /\t/, $line)[5];
    >     print "$line\t$hash1{$file2_element}\n" if exists $hash1{$file2_element};
    > }' file1.txt file2.txt
    But why would you want to do this? Why not have a real program in a file? The command line is good for very short code, not for this. I could probably cut the code by another half on the command line, perhaps a bit more, but it would still be too long for a true one-liner at the prompt.
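    Roughly how the code can shrink: run both files through a single `<>` loop and tell them apart by whether `@ARGV` is still non-empty. `<>` shifts each file name off `@ARGV` as it opens that file, so the second name is still waiting there while the first file is read. This is a sketch under the question's assumptions (tab-separated input, file1 column 0 matched against file2 column 5), using a hash of arrays so that every matching file1 line is printed; `compact_join` is a hypothetical name, and `local @ARGV` only exists so the logic can run as a sub.

```perl
use strict;
use warnings;

# Hypothetical compact variant: one <> loop over both files. While the
# first file is read, the second file name is still waiting in @ARGV;
# once @ARGV is empty, the lines come from the second file.
sub compact_join {
    my ($file1, $file2) = @_;
    local @ARGV = ($file1, $file2);
    my (%h, @out);
    while (my $line = <>) {
        chomp $line;
        my @f = split /\t/, $line;
        if (@ARGV) {    # file 1: remember every line under its key
            push @{ $h{ $f[0] } }, $line if defined $f[0];
            next;
        }
        # file 2: one output row per matching file 1 line
        next unless defined $f[5];
        push @out, "$line\t$_" for @{ $h{ $f[5] } || [] };
    }
    return @out;
}

# Run as: perl compact_join.pl file1.txt file2.txt
if (@ARGV == 2) {
    print "$_\n" for compact_join(@ARGV);
}
```

    Squeezed onto the prompt, the same idea would look something like `perl -F'\t' -lane 'if (@ARGV) { push @{$h{$F[0]}}, $_; next } my $l = $_; print "$l\t$_" for @{ $h{$F[5]} || [] }' file1.txt file2.txt`, where `-F'\t'` splits each line on tabs into `@F` and `-l` chomps input lines and adds the newline back on print.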

    Edit: Slightly modified the code, as some things carried over from the OP's code did not seem right to me.

      How about this?

      perl -ne 'print if ($seen{$_} .= @ARGV) =~ /10$/' fileA fileB >output
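      Reading that one-liner: `<>` shifts each file name off `@ARGV` as it opens it, so `@ARGV` in scalar context is 1 while fileA is read and 0 while fileB is read. Each line's entry in `%seen` therefore accumulates a history of digits, and `/10$/` fires the first time a line already seen in fileA turns up in fileB. It compares whole lines, not a key column. The sketch below replays that bookkeeping on two in-memory arrays and also shows the complement (fileB lines absent from fileA), which covers the non-matching case the question mentions; the sub name `seen_filter` and the `/^0$/` pattern for the complement are additions for illustration, not part of the reply.

```perl
use strict;
use warnings;

# Replays the %seen bookkeeping of the one-liner on two arrays of lines.
# $remaining stands in for scalar(@ARGV): 1 while the first input is
# read, 0 while the second is read. Whole lines are compared, not keys.
sub seen_filter {
    my ($a_lines, $b_lines, $pattern) = @_;
    my (%seen, @out);
    for my $pass ([1, $a_lines], [0, $b_lines]) {
        my ($remaining, $lines) = @$pass;
        for my $line (@$lines) {
            # Append this file's digit to the line's history, then test it
            push @out, $line if ($seen{$line} .= $remaining) =~ $pattern;
        }
    }
    return @out;
}

my @file_a = ("x", "y", "y");
my @file_b = ("y", "z", "y", "z");
my @common = seen_filter(\@file_a, \@file_b, qr/10$/);  # in both files
my @only_b = seen_filter(\@file_a, \@file_b, qr/^0$/);  # fileB lines missing from fileA
print "common: @common\n";
print "only in fileB: @only_b\n";
```

      With the sample arrays it prints `common: y` and `only in fileB: z`; each line is reported at most once. At the prompt, the fileB-only variant would be `perl -ne 'print if ($seen{$_} .= @ARGV) =~ /^0$/' fileA fileB`.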
Re: Perl command line for table join functionality
by graff (Chancellor) on Apr 30, 2014 at 21:02 UTC
    What the OP's code does seems very similar to what I was doing when I wrote cmpcol. HTH.

Node Type: perlquestion [id://1084515]
Approved by Bloodnok