
comparing contents of two arrays and output differences

by PitifulProgrammer (Acolyte)
on Jan 02, 2015 at 10:53 UTC ( [id://1111985]=perlquestion )

PitifulProgrammer has asked for the wisdom of the Perl Monks concerning the following question:

Dear Perlmonks,

first off, happy new year to all of you.

What better way to start a new year than with a Perl question? I am still a noob, not having had many chances to dig deeper into coding in recent months.

The issue is as follows:

After running a piece of code that finds particular files and replaces some characters if need be, the output folder contains the original files and backup files marked by .bak.

Previously, it was sufficient to compare the files manually using a diff tool. I also used the UNIX diff if there were more than a few files to check.

Recently, the number of folders and files has increased and I would like the comparison to be script-based, too.

The trouble is that I cannot find a starting point. I have been toying with modules such as Array::Diff and Array::Compare, but the output was always a hash reference not a report ( at least I think it was a hash reference ).

I also tried out Text::Diff, which led to the same result.

To put it in a nutshell: Given two arrays in a given folder, e.g.:

my @xml_files = glob( '*xml' );
#say for @xml_files;
my @bak_files = glob( '*bak' );
#say for @bak_files;
__END__
These would be the files:
file_01.xml
file_02.xml
file_03.xml
file_01.xml.bak
file_02.xml.bak
file_03.xml.bak

I would like to compare file_01.xml with file_01.xml.bak and so on. I was considering creating a hash but refrained from doing so since, as far as I have been told, hash items are unordered.

I am not even sure by now if separating the files into 2 arrays is a wise move.

Could somebody please give me a hint about how to approach this problem?

Please find my endeavours below, some of which have already been posted here in the forum and I would also like to thank those people. I am just adding these snippets, just in case I missed a convenient approach already.

For me, the main difficulty is working out how to make sure that file_01.xml is compared with file_01.xml.bak before moving on to the next pair, and selecting the right data structure for doing so.

Thanks in advance for your suggestions

Kind regards

C.
use 5.018;
use strict;
use warnings;
use Data::Dumper;
use File::Glob;
use List::Compare;
use Array::Diff;
use Array::Compare;

#Separating xml and backup files
my @xml_files = glob( '*xml' );
#say for @xml_files;
my @bak_files = glob( '*bak' );
#say for @bak_files;

#Show differences between file_01.xml and file_01.xml.bak, etc...
my $diff_arrays = Array::Diff -> diff( \@xml_files, \@bak_files );
my $count   = $diff_arrays -> count;
my $added   = $diff_arrays -> added;
my $deleted = $diff_arrays -> deleted;
#say $deleted;

#Doing the same thing with Array::Compare
my $compared = Array::Compare->new(DefFull => 1);
my $differences = $compared -> full_compare(\@xml_files, \@bak_files);  # Full comparison
say for $differences;

__END__

my $are_equal = compare_arrays( \@xml_files, \@bak_files );

sub compare_arrays{
    my( $first, $second ) = @_;    # any array used by code or cmd
    return 0 unless @$first == @$second;
    for ( my $i = 0; $i < @$first; $i++ ){
        return 0 if $first -> [$i] ne $second -> [$i];
    }
    return 1;
}

########################################################################
if (compare ( glob( *bak, *xml) ) == 0) {
    print "They're equal\n";
}

#########################################################################
foreach my $file( @files ){
    if (compare ( glob(*.bak, *.xml) ) == 0) {
        print "They're equal\n";
    }
}
my @files = $ARGV[0];

#########################################################################
foreach my $element( @xml_files ){
    if ( $element ~~ $bak_files[$counter] ){
        say "equal!!!";
    }
    else {
        say "not equal!!!";
        say $element;
    }
    $counter++;
}

Replies are listed 'Best First'.
Re: comparing contents of two arrays and output differences
by RichardK (Parson) on Jan 02, 2015 at 11:27 UTC

    Why not just check if the backup exists for each input file using one of the filetest operators -X ?

    for my $file (glob('*.xml') ) {
        if (-r "$file.bak" ) {
            print "backup found for $file\n";
        }
    }
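
    Building on the existence check, here is a minimal sketch that also tells you whether each pair differs in content, using the core File::Compare module. The helper name backup_status and its return format are made up for illustration; it only says that a file changed, not what changed.

```perl
use strict;
use warnings;
use File::Compare qw(compare);    # core module
use File::Glob    qw(bsd_glob);   # core; does not split the pattern on spaces

# Hypothetical helper: classify each *.xml file in $dir by the state of
# its .bak partner. Returns a hash ref mapping file name to one of
# 'no backup', 'unchanged', 'changed' or 'error'.
sub backup_status {
    my ($dir) = @_;
    my %status;
    for my $file ( sort( bsd_glob("$dir/*.xml") ) ) {
        my $bak = "$file.bak";
        if ( !-r $bak ) {
            $status{$file} = 'no backup';
            next;
        }
        # compare() returns 0 if the files are equal, 1 if they differ,
        # and -1 if an error occurred
        my $result = compare( $file, $bak );
        $status{$file} = $result == 0 ? 'unchanged'
                       : $result == 1 ? 'changed'
                       :                'error';
    }
    return \%status;
}

my $status = backup_status('.');
print "$_: $status->{$_}\n" for sort keys %$status;
```

    Note that the backup name is derived from the original name, so the two separate arrays are not needed to keep the pairs aligned.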

      Dear RichardK

      Thanks a mil for your reply. To be honest, I had not yet considered checking whether there was a backup of the files ( since I trust the script :) ). I will surely implement your suggestion. It complements the code nicely, especially if one considers that I won't be the one using the script.

      I am however more interested in what was changed in each of the files ( if changes were made ), hence the reference to diff utilities.

      I am sorry if the description of my problem was misleading or not that well written.

      Thanks a mil again for your input

      Kind regards

      C.

        The code you posted only compares the filenames, not the content of the files, hence my confusion.

        Why not shell out to `diff` if that's giving you the results you need? Or try something from CPAN like Text::Diff?
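
        A rough sketch of the Text::Diff route, assuming the module is installed from CPAN. The helper name diff_against_backup is hypothetical. As far as I know, when diff() is given two file names and no OUTPUT option it returns the whole report as a plain string (not a reference), and that string is empty when the files are identical.

```perl
use strict;
use warnings;
use Text::Diff;    # CPAN module; exports diff()

# Hypothetical helper: unified diff from the backup to the current file.
# String arguments are treated by diff() as file names.
sub diff_against_backup {
    my ($file) = @_;
    return diff( "$file.bak", $file, { STYLE => 'Unified' } );
}

for my $file ( sort glob('*.xml') ) {
    next unless -r "$file.bak";    # skip files without a readable backup
    my $report = diff_against_backup($file);
    if ( defined $report && length($report) ) {
        print $report;
    }
    else {
        print "$file: no changes\n";
    }
}
```

        Printing the returned string directly gives a report much like the output of `diff -u`, one block per changed file.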

Re: comparing contents of two arrays and output differences
by Anonymous Monk on Jan 02, 2015 at 11:07 UTC
    Some thoughts
    my @xml_files = glob( '*xml' );
    #say for @xml_files;
    my @bak_files = glob( '*bak' );
    #say for @bak_files;

    # sanity check: there should be exactly one backup per original
    die "impossible, uneven number of files: (@xml_files) vs (@bak_files)\n"
        if @xml_files != @bak_files;

    # record every original together with the backup name it implies
    my %seen;
    for my $orig ( @xml_files ) {
        my $bak = "$orig.bak";
        $seen{ $orig } ++;
        $seen{ $bak } ++;
    }

    # every backup found on disk must correspond to a recorded name
    for my $bak ( @bak_files ){
        my $orig = $bak;
        $orig =~ s/\.bak$//;
        die "IMPOSSIBLE" if not exists $seen{ $orig } and not exists $seen{ $bak };
    }
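
    On the worry that hashes are unordered: that only matters when iterating, and sorting the keys at that point restores a predictable order. A minimal sketch along those lines, with a made-up helper pair_files that pairs names and collects files lacking a partner instead of dying:

```perl
use strict;
use warnings;

# Hypothetical helper: pair originals with backups purely by name.
# Takes a list of file names and returns ( \%pairs, \@orphans ), where
# %pairs maps original => backup and @orphans lists names without a partner.
sub pair_files {
    my @names = @_;
    my %have  = map { $_ => 1 } @names;
    my ( %pairs, @orphans );
    for my $name ( sort @names ) {
        if ( $name =~ /\.bak\z/ ) {
            # a backup: check that its original is present
            ( my $orig = $name ) =~ s/\.bak\z//;
            push @orphans, $name unless $have{$orig};
        }
        elsif ( $have{"$name.bak"} ) {
            $pairs{$name} = "$name.bak";
        }
        else {
            push @orphans, $name;
        }
    }
    return ( \%pairs, \@orphans );
}

my ( $pairs, $orphans ) = pair_files( glob('*.xml'), glob('*.xml.bak') );

# Hash keys come back in no particular order, so sort them when printing
print "$_ <-> $pairs->{$_}\n" for sort keys %$pairs;
print "no partner: $_\n"      for @$orphans;
```

    Each pair in the hash can then be handed to whatever does the content comparison.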
