PerlMonks  

Merge the difference between two files

by hopper (Novice)
on Jun 16, 2017 at 06:00 UTC ( [id://1192908]=perlquestion )

hopper has asked for the wisdom of the Perl Monks concerning the following question:

Hi! I need help merging the differences between two files, based on the content they share, then sorting the result and saving it to a third file. Here are my example files:

File 1:
    NAME, ID1, ID2
    apple banana
    NAME, ID1, ID3
    strawberry grape

File 2:
    NAME, ID1, ID2
    apple jackfruit
    NAME, ID1, ID4
    banana grapes

Desired output result:
    NAME, ID1, ID2
    apple banana
    NAME, ID1, ID3
    strawberry grape
    NAME, ID1, ID4
    banana grapes

However, this is what I get:
    NAME, ID1, ID2
    NAME, ID1, ID3
    NAME, ID1, ID4

    #!/bin/perl -w
    use strict;
    use warnings;
    use File::Copy;
    use Cwd;

    my $dir = cwd;

    main();

    sub main {
        printf "\nStarting script\n";
        printf "\nEnter file 1: ";
        my $a = <STDIN>;
        chomp $a;
        printf "\n";
        printf "Enter file 2: ";
        my $b = <STDIN>;
        chomp $b;
        my $output = "output.txt";

        if (-e $a and -e $b) {
            my $counter  = 0;
            my $counter2 = 0;
            my %results  = ();

            open (FILEA, "<$a") or die "Input file $a not found.\n";
            while (my $line = <FILEA>) {
                $counter++;
                if ($line =~ /^NAME/) {
                    my ($name, $variable1, $variable2) = split(',', $line, 3);
                    $results{$line} = 1;
                }
            }
            close(FILEA);

            open (FILEB, "<$b") or die "Input file $b not found.\n";
            while (my $line = <FILEB>) {
                if ($line =~ /^NAME/) {
                    my ($name, $variable1, $variable2) = split(',', $line, 3);
                    $results{$line}++;
                }
            }
            close(FILEB);

            open (OUTPUT, ">$output") or die "Cannot open $output for writing \n";
            foreach my $line (keys %results) {
                print OUTPUT "$line";
                $counter = $counter if $counter != $counter2;
            }
            close OUTPUT;
        }
    }

I am new to Perl, so could someone please help me and point me in the right direction?

Thanks so much in advance.

Replies are listed 'Best First'.
Re: Merge the difference between two files
by afoken (Chancellor) on Jun 16, 2017 at 06:15 UTC

    How does that problem differ from the other thread you created?

    Please don't start new threads for the same problem, stay in the old thread. And please don't remove or radically change the content of your postings. If you think you need to update a posting, mark the update as such, e.g. with <b>Update:</b>. If you need to strike out content, use <strike></strike>.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
      Hi Alex,

      Please accept my apologies. I am a new user and didn't understand the rules, but I am picking them up now. In the first post, I wanted to print the differences between two files to a third file, which I then planned to merge with the first file. However, after some thought, I think I want the code to merge the files once it finds the differences, and skip the diff step. Please give me suggestions or ideas that would help me improve the code and write it better.

      Thanks so much in advance.

Re: Merge the difference between two files
by NetWallah (Canon) on Jun 16, 2017 at 06:23 UTC
    I did not notice afoken's note until after I wrote this code, so here it is.
    To do this right, you need a slightly complicated structure - a hash-of-arrays.
    #!/bin/perl -w
    use strict;
    use warnings;

    my @filenames = ("file1.txt", "file2.txt");
    my %result;    # Hash of arrays

    for my $fname (@filenames) {
        my $currentheader = "";
        open my $f, "<", $fname or die "Cannot open '$fname' : $!";
        while (my $line = <$f>) {
            chomp $line;
            next unless length($line);    # Skip blank lines
            if ($line =~ /NAME/) {
                $currentheader = $line;
                next;
            }
            push @{ $result{$currentheader} }, $line;
        }
    }

    # Print results
    for my $currentheader (sort keys %result) {
        print "\n$currentheader\n";
        print "$_\n" for @{ $result{$currentheader} };
    }

                    Once it hits the fan, the only rational choice is to sweep it up, package it, and sell it as fertilizer.

      The code is merging the headers of both files if they are different. However, I don't want it to compare the headers; I just want it to compare and merge when the line starts with "NAME". Can anyone please help me with this task? Thanks so much in advance.

      Hi NetWallah, I really like the code that you wrote. However, it does not print the output the way I want.

      Currently, it finds the "NAME" line and, if it matches, merges the content. However, I want to keep the two files separate and merge only the unmatched "NAME" lines and their content from file 2 into file 1. If the "NAME" line matches, ignore the content of the lines. Can I print the result to a file and not only to the display?

      For example (ignore the content if the line "NAME, VAR1, VAR2" matches):

      File 1:
          NAME, VAR1, VAR2
          apple, mange

      File 2:
          NAME, VAR1, VAR2
          jack fruit, banana

        Try this, still using hash-of-arrays

        #!/bin/perl -w
        use strict;
        use warnings;

        # input
        my %hash = ();
        my @filename = ('File1.txt','File2.txt');
        for my $n (0..1){
            parse($n,$filename[$n]);
        };
        output('output.txt');

        # parse
        sub parse {
            my ($n,$filename) = @_;
            open my $fh,'<',$filename or die "$!";
            my $key;
            while (<$fh>){
                if (/^NAME/){
                    $key = $_;
                    $hash{$key}[$n] = '';
                } else {
                    $hash{$key}[$n] .= $_ if $key;
                }
            }
            close $fh;
        }

        # merge
        sub output {
            my ($filename) = @_;
            open my $fh,'>',$filename or die "$!";
            for my $key (sort keys %hash){
                if (defined $hash{$key}[0]){
                    print $fh $key.$hash{$key}[0];
                } else {
                    print $fh $key.$hash{$key}[1];
                }
            }
            close $fh;
        }
        poj
        To achieve this, after the line
        $currentheader = $line;
        add this:
        if (exists $result{$currentheader}) {
            $currentheader = "RECYCLE_BIN";
        }
        I will leave it as an exercise for you to figure out how to avoid printing the "RECYCLE_BIN" item.

                        Once it hits the fan, the only rational choice is to sweep it up, package it, and sell it as fertilizer.

        can I print the result to a file and not on the display only?

        Did you even try to figure this out by yourself or are you just looking for someone to do all your work for you?

        https://perldoc.perl.org/functions/select.html
        select FILEHANDLE

        If FILEHANDLE is supplied, sets the new current default filehandle for output. ... , a write or a print without a filehandle default to this FILEHANDLE.
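
        As a rough illustration of the select behaviour quoted above, here is a minimal sketch. The helper name with_default_output is made up for this example, and the output goes to an in-memory buffer only so that the sketch has no side effects; for a real file you would open something like 'output.txt' for writing instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl): runs $code with $fh selected as the
# default output handle, then restores the handle that was selected before.
sub with_default_output {
    my ( $fh, $code ) = @_;
    my $previous = select $fh;    # select returns the previously selected handle
    $code->();                    # every plain "print" in here goes to $fh
    select $previous;
    return;
}

# In-memory "file" for the demo. For a real file, assume something like:
#   open my $out, '>', 'output.txt' or die "Cannot open output.txt: $!";
my $buffer = "";
open my $out, '>', \$buffer or die "Cannot open in-memory file: $!";

with_default_output(
    $out,
    sub {
        print "NAME, ID1, ID2\n";    # no filehandle given, so this goes to $out
        print "apple banana\n";
    }
);
close $out;

print "Captured: $buffer";           # STDOUT is the default handle again
```

        The other common route is to leave the default handle alone and name the handle in each print (print $fh ...), as the output sub posted earlier in this thread does. That is usually easier to follow when only a few print statements are involved.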

Re: Merge the difference between two files
by hippo (Bishop) on Jun 16, 2017 at 07:49 UTC
    #!/bin/perl -w
    use strict;
    use warnings;
    use File::Copy;
    use Cwd;

    my $dir = cwd;

    main();

    sub main {
        # ...
    }

    Aside from the first 3 lines here, I am intrigued by how you have structured this code. Did you learn this from a course or a book or are you translating from some other language?

    Can you explain why you have used File::Copy? Or Cwd? Why do you set $dir and then never subsequently refer to it (and BTW the same for $name, $variable1, $variable2)? What is the purpose of encapsulating code in sub main() when there's nothing relevant outside that scope?

    As you say you are new to Perl it would probably be beneficial for you to understand what the code you have already written is actually doing. If you cannot explain what you have written, start from code you do understand instead.

    Good luck.

      Thanks for taking the time to look over my code. As I said, I am new to Perl and would like to learn how to do multiple tasks with it, so I taught myself; I didn't learn the language from school or a book. You are correct, I should remove the "use File::Copy" and "use Cwd" lines since I don't know what they mean. I copied the template from other blogs and forgot to remove them. I want to use the "my $dir.." line because I want to verify that the files I am working on are in the same directory as the Perl script. I used sub main because I need the script to do multiple tasks after the merge. Thanks again.
        You are correct, I should remove the "use File::Copy" and "use Cwd" lines since I don't know what they mean.

        Perl is one of those long-established languages which has very extensive documentation. Your installation should have the perldoc command which will help but if it is not there you can use the online version instead (see how the word perldoc looks like a link? It is one!). The customary place to start is with perlintro but you can follow the "use" link in my previous post which explains in a rather dry way what that command does. You have lots of reading ahead of you!

        I copied the template from other blogs and forgot to remove them.

        It's tempting when learning a new language to just copy code from somewhere else and hack it about to make it do something slightly different. However, it does pay to understand the code you are copying before you start to hack it about.

        I want to use the "my $dir.." line because I want to verify that the files I am working on are in the same directory as the Perl script.

        Fine, but you are not using it in this example. The principle of an SSCCE is to reduce the example code to the minimum required (the first S stands for "Short"). It's a distraction to you and us so you may as well remove it for now. $dir isn't doing what you think anyway.
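
        To illustrate the last remark: cwd returns the directory the script was started from, which is not necessarily the directory the script file lives in. A minimal sketch of the difference, assuming the core FindBin module is the tool you want for the script's own directory; the helper beside_script and the file name file1.txt are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(cwd);
use FindBin qw($Bin);
use File::Spec;

# cwd() is the directory the script was started from;
# $FindBin::Bin is the directory that contains the script file itself.
my $working_dir = cwd();
my $script_dir  = $Bin;

print "Working directory: $working_dir\n";
print "Script directory:  $script_dir\n";

# Hypothetical helper: build the path of a data file that sits next to the script.
sub beside_script {
    my ($name) = @_;
    return File::Spec->catfile( $script_dir, $name );
}

# file1.txt is an assumed name, used only for illustration.
print "Would look for: ", beside_script('file1.txt'), "\n";
```

        The two directories only coincide when you happen to run the script from its own directory, so a -e test on a bare file name is checking the working directory, not the script's.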

        I used sub main because I need the script to do multiple tasks after the merge.

        OK, but be careful about scoping. And try to avoid the particular word "main" since that's also a built-in package name in Perl and as a beginner you may get confused between warnings/errors which relate to either your "main" sub or the main package.

        Keep at it - you are at the beginning of a long, winding, fascinating and ultimately fulfilling path. We all started where you are now.

Re: Merge the difference between two files
by tybalt89 (Monsignor) on Jun 16, 2017 at 13:18 UTC
    #!/usr/bin/perl

    # http://perlmonks.org/?node_id=1192908

    use strict;
    use warnings;
    use Algorithm::Diff qw(traverse_sequences);

    open my $fh1, '<', \<<END;
    NAME, ID1, ID2
    apple banana
    NAME, ID1, ID3
    strawberry grape
    END

    open my $fh2, '<', \<<END;
    NAME, ID1, ID2
    apple jackfruit
    NAME, ID1, ID4
    banana grapes
    END

    $/ = undef;
    my @file1 = <$fh1> =~ /^NAME.*\n(?:(?!NAME).*\n)*/gm;
    close $fh1;
    my @file2 = <$fh2> =~ /^NAME.*\n(?:(?!NAME).*\n)*/gm;
    close $fh2;

    #use Data::Dump 'pp'; pp \@file1; pp \@file2;

    traverse_sequences(
      [ map /(.*)/, @file1 ], # compare only first lines
      [ map /(.*)/, @file2 ],
      {
        MATCH     => sub { print $file1[shift()] },
        DISCARD_A => sub { print $file1[shift()] },
        DISCARD_B => sub { print $file2[pop()] },
      } );
      I am trying to test the code and it gives me errors. The errors are "Can't locate Algorithm/Diff.pm in @INC (@INC contains: ...)" and "BEGIN failed--compilation aborted". Can I open the files and read the lines instead of adding them to the code? Thanks in advance for looking over my code and giving me guidance.

        Install Algorithm::Diff from CPAN.
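
        On the second question, reading the blocks from real files instead of the in-code sample data should only need a small change. The following is a sketch under assumptions, not a drop-in patch: read_blocks is a made-up helper name, and the demo writes a temporary file only so that it can run anywhere. It reuses the same regex as the script above to split a file into blocks that each start with a NAME line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: slurp one file and return its blocks as a list.
# Each block is a line starting with NAME plus the lines that follow it.
sub read_blocks {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open '$path': $!";
    local $/ = undef;          # slurp mode, limited to this sub
    my $text = <$fh> // '';    # an empty file yields undef, so default to ''
    close $fh;
    return $text =~ /^NAME.*\n(?:(?!NAME).*\n)*/gm;
}

# Demo only: write the sample data to a temporary file, then read it back.
my ( $tmp_fh, $tmp_path ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "NAME, ID1, ID2\napple banana\nNAME, ID1, ID3\nstrawberry grape\n";
close $tmp_fh;

my @file1 = read_blocks($tmp_path);    # call in list context
print scalar(@file1), " block(s) read from $tmp_path\n";
print "--- block ---\n$_" for @file1;
```

        In the script above, the two open ... END sections and the two lines that fill @file1 and @file2 could then be replaced by something like my @file1 = read_blocks('file1.txt'); with your own file names (file1.txt is an assumption). The traverse_sequences call still needs Algorithm::Diff to be installed.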

Re: Merge the difference between two files
by marinersk (Priest) on Jun 17, 2017 at 04:17 UTC

    Hello, lonnie.

    I've made some changes to your code, similar to what dbander asked you to do in your previous and strikingly similar request for help.

    I get the same output you do, so we're on the right track.

    I will only comment on the things which prevent your code from working. There's a lot more we should discuss later.

    Your main problem seems to be in this logic:

    if ( $line =~ /^NAME/ ) {
        my ( $name, $variable1, $variable2 ) = split( ',', $line, 3 );
        $results{$line} = 1;
        print " SET \$results[$line] = $results{$line}\n";
    }

    In regular English, this code:

    1. Reads each file
    2. Skips any line that doesn't start with "NAME"
    3. Saves all the lines which do start with "NAME" to a hash.

    Then, at the end, you print out all the lines you saved -- which are the ones which start with "NAME".
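
    One way to keep the lines that currently get skipped is to remember which "NAME" header you are under and append every following line to that header's entry. This is a minimal sketch, not a rewrite of your script: merge_blocks and %block_for are made-up names, the sample data is read from in-memory filehandles so it runs without any files, and "the first block seen wins" is an assumption about what you want when both files have the same header.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: merge NAME blocks from several open filehandles.
# A block is a NAME header line plus the lines that follow it.
# When a header occurs more than once, the first block seen is kept.
sub merge_blocks {
    my @handles = @_;
    my %block_for;    # header line => content lines stored under it
    for my $fh (@handles) {
        my $header;      # header of the block currently being read
        my $keep = 0;    # true while reading a block seen for the first time
        while ( my $line = <$fh> ) {
            if ( $line =~ /^NAME/ ) {
                $header = $line;
                $keep   = !exists $block_for{$header};
                $block_for{$header} = '' if $keep;
            }
            elsif ($keep) {
                $block_for{$header} .= $line;    # content line: store it
            }
        }
    }
    return join '', map { $_ . $block_for{$_} } sort keys %block_for;
}

# Sample data from the question, held in strings instead of files.
my $file1 = "NAME, ID1, ID2\napple banana\nNAME, ID1, ID3\nstrawberry grape\n";
my $file2 = "NAME, ID1, ID2\napple jackfruit\nNAME, ID1, ID4\nbanana grapes\n";
open my $fh1, '<', \$file1 or die "Cannot open in-memory file 1: $!";
open my $fh2, '<', \$file2 or die "Cannot open in-memory file 2: $!";

print merge_blocks( $fh1, $fh2 );
```

    To save the result, open a handle for writing on your output file and print the returned string to that handle instead of to the screen.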

Node Type: perlquestion [id://1192908]
Approved by Marshall