Merge the difference between two files

hopper has asked for the wisdom of the Perl Monks concerning the following question:

Re: Merge the difference between two files by afoken (Chancellor) on Jun 16, 2017 at 06:15 UTC
How does that problem differ from the other thread you created? Please don't start new threads for the same problem; stay in the old thread. And please don't remove or radically change the content of your postings. If you think you need to update a posting, mark the update as such, e.g. with `<b>Update:</b>`. If you need to strike out content, use `<strike></strike>`.

Alexander
-- Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re^2: Merge the difference between two files by hopper (Novice) on Jun 17, 2017 at 01:13 UTC
Hi Alex, Please accept my apologies. I am a new user and didn't understand the rules, but I am picking them up now. In the first post, I wanted to print out the differences between two files to a third file, which I then planned to merge with the first file. However, after some thought, I think I want the code to merge the files once it finds the differences and skip the diff function. Please give me suggestions or ideas that would help me improve and write better code. Thanks so much in advance.
Re: Merge the difference between two files by NetWallah (Canon) on Jun 16, 2017 at 06:23 UTC
I did not notice afoken's note until after I wrote this code, so here it is. To do this right, you need a slightly complicated structure - a hash-of-arrays.

    #!/bin/perl -w
    use strict;
    use warnings;

    my @filenames = ("file1.txt", "file2.txt");
    my %result;  # Hash of arrays

    for my $fname(@filenames){
        my $currentheader = "";
        open my $f, "<", $fname or die "Cannot open '$fname' : $!";
        while (my $line = <$f>){
            chomp $line;
            next unless length($line);  # Skip blank lines
            if ($line=~/NAME/){
                $currentheader = $line;
                next;
            }
            push @{ $result{$currentheader} }, $line;
        }
    }

    # Print results
    for my $currentheader(sort keys %result){
        print "\n$currentheader\n";
        print "$_\n" for @{ $result{$currentheader} };
    }

Once it hits the fan, the only rational choice is to sweep it up, package it, and sell it as fertilizer.
Re^2: Merge the difference between two files by hopper (Novice) on Jun 20, 2017 at 04:08 UTC
The code is merging the headers of both files if they are different. However, I don't want it to compare the headers; I just want it to compare and merge when the line starts with "NAME". Can anyone please help me with this task? Thanks so much in advance.
Re^2: Merge the difference between two files by hopper (Novice) on Jun 16, 2017 at 18:07 UTC
Hi NetWallah, I really like the code that you wrote. However, it is not printing the output the way I want. Currently, it finds the "NAME" line and, if it matches, merges the content. I want to keep the two files separate and merge into file 1 only the "NAME" lines (with their content) from file 2 that have no match in file 1. If the "NAME" line matches, the content under it should be ignored. Can I print the result to a file and not on the display only? For example (ignore the content if the line "NAME, VAR1, VAR2" matches): `File 1: NAME, VAR1, VAR2 apple, mange File 2: NAME, VAR1, VAR2 jack fruit, banana`
Re^3: Merge the difference between two files by poj (Abbot) on Jun 16, 2017 at 19:36 UTC
Try this, still using hash-of-arrays

    #!/bin/perl -w
    use strict;
    use warnings;

    # input
    my %hash=();
    my @filename = ('File1.txt','File2.txt');
    for my $n (0..1){
      parse($n,$filename[$n]);
    };
    output('output.txt');

    # parse
    sub parse {
      my ($n,$filename) = @_;
      open my $fh,'<',$filename or die "$!";
      my $key;
      while (<$fh>){
        if (/^NAME/){
          $key = $_;
          $hash{$key}[$n] = '';
        } else {
          $hash{$key}[$n] .= $_ if $key;
        }
      }
      close $fh;
    }

    # merge
    sub output {
      my ($filename) = @_;
      open my $fh,'>',$filename or die "$!";
      for my $key (sort keys %hash){
        if (defined $hash{$key}[0]){
          print $fh $key.$hash{$key}[0];
        } else {
          print $fh $key.$hash{$key}[1];
        }
      }
      close $fh;
    }

poj
Re^3: Merge the difference between two files by NetWallah (Canon) on Jun 17, 2017 at 04:53 UTC
To achieve this, after the line `$currentheader = $line;` add this: `if (exists $result{$currentheader}){ $currentheader="RECYCLE_BIN"; }` I will leave it as an exercise for you to figure out how to avoid printing the "RECYCLE_BIN" item.

Once it hits the fan, the only rational choice is to sweep it up, package it, and sell it as fertilizer.
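One possible way to finish the exercise above, offered as a minimal sketch rather than NetWallah's own solution: a block whose NAME header was already collected is diverted to the "RECYCLE_BIN" key, and that key is skipped when printing. The `merge_blocks` helper and the in-memory sample data are assumptions made for illustration; a real script would open file1.txt and file2.txt instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for file1.txt and file2.txt.
my $file1 = "NAME, ID1, ID2\napple\nbanana\n";
my $file2 = "NAME, ID1, ID2\njackfruit\nNAME, ID1, ID4\ngrapes\n";

# Collect lines under their NAME header. A header that already has
# content from an earlier block sends its lines to RECYCLE_BIN instead.
sub merge_blocks {
    my @sources = @_;
    my %result;    # hash of arrays: header => [ content lines ]
    for my $text (@sources) {
        my $currentheader = "";
        open my $f, "<", \$text or die "Cannot open in-memory file: $!";
        while ( my $line = <$f> ) {
            chomp $line;
            next unless length $line;    # skip blank lines
            if ( $line =~ /^NAME/ ) {
                $currentheader = $line;
                if ( exists $result{$currentheader} ) {
                    $currentheader = "RECYCLE_BIN";
                }
                next;
            }
            push @{ $result{$currentheader} }, $line;
        }
        close $f;
    }
    return \%result;
}

my $result = merge_blocks( $file1, $file2 );
for my $header ( sort keys %$result ) {
    next if $header eq "RECYCLE_BIN";    # do not print discarded blocks
    print "$header\n";
    print "$_\n" for @{ $result->{$header} };
}
```

With the sample data, the second file's "NAME, ID1, ID2" block lands in RECYCLE_BIN and only its "NAME, ID1, ID4" block is added to the output.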
Re^4: Merge the difference between two files by hopper (Novice) on Jun 22, 2017 at 21:12 UTC
Re^5: Merge the difference between two files by NetWallah (Canon) on Jun 23, 2017 at 19:21 UTC
Re^3: Merge the difference between two files by huck (Prior) on Jun 16, 2017 at 18:40 UTC
"can I print the result to a file and not on the display only?"

Did you even try to figure this out by yourself or are you just looking for someone to do all your work for you? https://perldoc.perl.org/functions/select.html

"select FILEHANDLE: If FILEHANDLE is supplied, sets the new current default filehandle for output. ... , a write or a print without a filehandle default to this FILEHANDLE."
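A minimal sketch of what the quoted documentation describes: `select` makes a filehandle the default target for `print` and returns the previously selected handle so it can be restored. The file name merged_output.txt is an assumption chosen for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical output file name, chosen for illustration only.
my $outfile = "merged_output.txt";

open my $out, ">", $outfile or die "Cannot open '$outfile': $!";

# select() returns the previously selected handle, so keep it for later.
my $previous = select($out);
print "NAME, VAR1, VAR2\n";    # goes to merged_output.txt, not the screen
print "apple\n";
select($previous);             # restore the old default (normally STDOUT)

close $out or die "Cannot close '$outfile': $!";
print "Wrote results to $outfile\n";    # printed to the screen again
```

Passing the handle explicitly, as in `print $out "apple\n";`, achieves the same result without changing the default and is often easier to follow in longer scripts.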
Re: Merge the difference between two files by hippo (Bishop) on Jun 16, 2017 at 07:49 UTC
    #!/bin/perl -w
    use strict;
    use warnings;
    use File::Copy;
    use Cwd;
    my $dir = cwd;
    main();
    sub main {
        # ...
    }

Aside from the first 3 lines here, I am intrigued by how you have structured this code. Did you learn this from a course or a book or are you translating from some other language? Can you explain why you have used File::Copy? Or Cwd? Why do you set `$dir` and then never subsequently refer to it (and BTW the same for `$name, $variable1, $variable2`)? What is the purpose of encapsulating code in sub `main()` when there's nothing relevant outside that scope?

As you say you are new to Perl it would probably be beneficial for you to understand what the code you have already written is actually doing. If you cannot explain what you have written, start from code you do understand instead. Good luck.
Re^2: Merge the difference between two files by hopper (Novice) on Jun 16, 2017 at 16:35 UTC
Thanks for taking the time to look over my code. Like I said, I am new to Perl and would like to learn how to do multiple tasks using Perl, so I taught myself and didn't learn this language from school or a book. You are correct, I should remove the lines "use File::Copy" and "use Cwd" since I don't know what they mean. I copied the template from other blogs and forgot to remove them. I want to use the "my $dir.." because I want to verify and make sure the files that I am working on are in the same directory as the Perl script. I used sub main because I need the script to do multiple tasks after it is merging. Thanks again.
Re^3: Merge the difference between two files by hippo (Bishop) on Jun 17, 2017 at 09:02 UTC
"You are correct, I should remove the lines 'use File::Copy' and 'use Cwd' since I don't know what they mean."

Perl is one of those long-established languages which has very extensive documentation. Your installation should have the perldoc command which will help but if it is not there you can use the online version instead (see how the word perldoc looks like a link? It is one!). The customary place to start is with perlintro but you can follow the "use" link in my previous post which explains in a rather dry way what that command does. You have lots of reading ahead of you!

"I copied the template from other blogs and forgot to remove them."

It's tempting when learning a new language to just copy code from somewhere else and hack it about to make it do something slightly different. However, it does pay to understand the code you are copying before you start to hack it about.

"I want to use the 'my $dir..' because I want to verify and make sure the files that I am working on are in the same directory as the Perl script."

Fine, but you are not using it in this example. The principle of an SSCCE is to reduce the example code to the minimum required (the first S stands for "Short"). It's a distraction to you and us so you may as well remove it for now. $dir isn't doing what you think anyway.

"I used sub main because I need the script to do multiple tasks after it is merging."

OK, but be careful about scoping. And try to avoid the particular word "main" since that's also a built-in package name in Perl and as a beginner you may get confused between warnings/errors which relate to either your "main" sub or the main package.

Keep at it - you are at the beginning of a long, winding, fascinating and ultimately fulfilling path. We all started where you are now.
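On the remark that `$dir` isn't doing what you think: `cwd` reports the directory the script was started from, which is not necessarily the directory the script lives in. A minimal sketch of the difference, assuming the core FindBin module is an acceptable way to locate the script's own directory (it is one common option, not the only one):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(cwd);
use FindBin qw($Bin);
use File::Spec;

# cwd() is the directory the script was started from.
print "Current working directory: ", cwd(), "\n";

# $FindBin::Bin is the directory containing the running script.
print "Script directory:          $Bin\n";

# Build a path to a data file expected to sit next to the script.
# "file1.txt" is the file name used elsewhere in this thread.
my $path = File::Spec->catfile( $Bin, "file1.txt" );
if ( -e $path ) {
    print "Found $path\n";
}
else {
    print "No file1.txt next to the script\n";
}
```

The two directories only coincide when the script is launched from its own directory, so checking for input files relative to `$Bin` is the safer test.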
Re: Merge the difference between two files by tybalt89 (Monsignor) on Jun 16, 2017 at 13:18 UTC
    #!/usr/bin/perl

    # http://perlmonks.org/?node_id=1192908

    use strict;
    use warnings;
    use Algorithm::Diff qw(traverse_sequences);

    open my $fh1, '<', \<<END;
    NAME, ID1, ID2
    apple
    banana
    NAME, ID1, ID3
    strawberry
    grape
    END

    open my $fh2, '<', \<<END;
    NAME, ID1, ID2
    apple
    jackfruit
    NAME, ID1, ID4
    banana
    grapes
    END

    $/ = undef;
    my @file1 = <$fh1> =~ /^NAME.*\n(?:(?!NAME).*\n)*/gm;
    close $fh1;
    my @file2 = <$fh2> =~ /^NAME.*\n(?:(?!NAME).*\n)*/gm;
    close $fh2;

    #use Data::Dump 'pp'; pp \@file1; pp \@file2;

    traverse_sequences(
      [ map /(.*)/, @file1 ], # compare only first lines
      [ map /(.*)/, @file2 ],
      {
        MATCH     => sub { print $file1[shift()] },
        DISCARD_A => sub { print $file1[shift()] },
        DISCARD_B => sub { print $file2[pop()] },
      } );
Re^2: Merge the difference between two files by hopper (Novice) on Jun 16, 2017 at 19:15 UTC
I am trying to test the code and it gives me errors. The errors are "Can't locate Algorithm/Diff.pm in @INC (@INC contains ....)" and "BEGIN failed--compilation aborted". Can I open the files and read the lines instead of adding them to the code? Thanks in advance for looking over my code and giving me guidance.
Re^3: Merge the difference between two files by tybalt89 (Monsignor) on Jun 16, 2017 at 19:34 UTC
Install Algorithm::Diff from CPAN.
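On the question of reading real files instead of embedding the data: splitting a file into NAME blocks needs only core Perl. The sketch below is not tybalt89's algorithm. It swaps the Algorithm::Diff traversal for a simpler rule (keep every block of the first file, then append blocks from the second file whose NAME line does not occur in the first), so the output order can differ. The helper names `read_blocks` and `merge_files` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Slurp a file and split it into blocks, each starting at a NAME line.
sub read_blocks {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open '$path': $!";
    local $/;    # no record separator: read the whole file at once
    my $text = <$fh>;
    close $fh;
    $text = '' unless defined $text;
    my @blocks = $text =~ /^NAME.*\n(?:(?!NAME).*\n)*/gm;
    return @blocks;
}

# Keep all blocks of the first file, then add blocks from the second
# file whose NAME line was not seen in the first.
sub merge_files {
    my ( $first, $second ) = @_;
    my @merged = read_blocks($first);
    my %seen;
    for my $block (@merged) {
        my ($header) = $block =~ /(.*)/;    # first line of the block
        $seen{$header} = 1;
    }
    for my $block ( read_blocks($second) ) {
        my ($header) = $block =~ /(.*)/;
        push @merged, $block unless $seen{$header}++;
    }
    return @merged;
}

# Usage: perl merge.pl file1.txt file2.txt > merged.txt
if ( @ARGV == 2 ) {
    print merge_files(@ARGV);
}
```

Note that the pattern expects every line, including the last one, to end with a newline; a final line without one would be dropped from its block.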
Re: Merge the difference between two files by marinersk (Priest) on Jun 17, 2017 at 04:17 UTC
Hello, lonnie. I've made some changes to your code, similar to what dbander asked you to do in your previous and strikingly similar request for help. I get the same output you do, so we're on the right track. I will only comment on the things which prevent your code from working. There's a lot more we should discuss later.

Your main problem seems to be in this logic:

    if ( $line =~ /^NAME/ ) {
        my ( $name, $variable1, $variable2 ) = split( ',', $line, 3 );
        $results{$line} = 1;
        print " SET \$results[$line] = $results{$line}\n";
    }

In regular English, this code:

- Reads each file
- Skips any line that doesn't start with "NAME"
- Saves all the lines which do start with "NAME" to a hash.

Then, at the end, you print out all the lines you saved -- which are the ones which start with "NAME".