PerlMonks  

HTML tag compares between similar files

by jlawrenc (Scribe)
on Sep 15, 2000 at 23:28 UTC ( [id://32721]=CUFP )

A simple program to help you systematically identify what changed between the last version of a document and this one. It is specifically geared at picking up subtle changes such as table cell widths.
I was thinking of using more elaborate means to diff a couple of HTML documents, but this serves my needs when they (our friends the HTML designers) have just been fiddling and the two docs are basically the same.
#!/usr/bin/perl
# -w not used because of a few noisy warnings in write's
# tag_comp.pl
# - jlawrenc@infonium.com - use at your own risk
#
# A quick 'n dirty to help you compare HTML tags across two similar documents.
#
# This happens to me from time to time. We have an HTML template that has been
# adapted for server-side use. Then the graphic designer goes off and reformats
# with different fonts, tag sizes or whatever. It could be easier to scope out the
# changes and then just re-edit our template document rather than reworking the
# supplied HTML back into a template.
#
# Invoke thusly:
#   tag_comp fn1 fn2 [tag [shift]]
#
# e.g.
#   tag_comp index.html new_index.html table
#     generates a report of how the <table> tag is used differently between the two
#     documents
#
#   tag_comp index.html new_index.html img 2
#     a report of how <img> tags have changed, shifting the left col down a couple
#     of rows to help line up the differences
#
# Things to consider
#   a - tag regex is real simple "<" + not > 1 or more times + ">"
#       this may not always work for you
#   b - tag compares are lowercased
#
# It would be nice to try and line up the matches more effectively but a human
# will do the job for now.

# Report header
format STDOUT_TOP =
---------------------------------------------------------------------------------
@|||||||||||||||||||||||||||||||||||||| | @||||||||||||||||||||||||||||||||||||||
$fn1, $fn2
---------------------------------------------------------------------------------
.

# Report body - lines that do not match
format STDOUT =
^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~~ | ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~~
$srch1[$i], $srch2[$i]
---------------------------------------------------------------------------------
.

# Report body - lines that do match
format STDOUT_MATCH =
* match: ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~~
$srch1[$i]
---------------------------------------------------------------------------------
.

# Our input arguments - file name1, file name2, tag to report on, shift value
($fn1, $fn2, $tag, $shift) = @ARGV;
if (!$fn1 or !$fn2) {
    die "Please supply two file names to compare.\n";
}

# Default to "img" tags
if (!$tag) {
    $tag="img";
    print STDERR "Defaulting to search for <$tag>s\n\n";
}

# Check for positive shift (no shift if none was given)
$shift ||= 0;
if ($shift<0) {
    print STDERR "shift only works with positive vals.\n";
    print STDERR "if you want to shift the other way then try reversing your file names. :)\n";
}

# Slurp our files - bail out if either one cannot be opened
undef $/;
open(FIN, $fn1) or die "Cannot open $fn1: $!\n";
$file1=<FIN>;
close FIN;
open(FIN, $fn2) or die "Cannot open $fn2: $!\n";
$file2=<FIN>;
close FIN;

# Grab our tags - real crude regex that may not always do the trick
while ($file1 =~ /(<[^>]+>)/gms) { push @tags1, $1; }
while ($file2 =~ /(<[^>]+>)/gms) { push @tags2, $1; }

# Get our list of matching tags
@srch1=grep /^<$tag(\s|>)/i, @tags1;
@srch2=grep /^<$tag(\s|>)/i, @tags2;

# Shift first search result if needed
for ($i=0; $i<$shift; $i++) { unshift @srch1, ""; }

# Find out who has more rows - set1 or 2
$rows=$#srch1 > $#srch2 ? $#srch1 : $#srch2;

# Write our header
$~="STDOUT_TOP";
write;

# Write report body
foreach ($i=0; $i<=$rows; $i++) {
    # One format for rows that are the same, another for those that are not
    if (lc $srch1[$i] ne lc $srch2[$i]) {
        $~="STDOUT";
        write;
    } else {
        $~="STDOUT_MATCH";
        write;
    }
}

# Done - coffee time
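
To see what the extraction and compare steps do without needing two files on disk, here is a minimal sketch that runs the same two patterns over inline strings. The helper name extract_tags and the sample HTML are made up for illustration and are not part of tag_comp.pl; the tag name is also wrapped in \Q...\E here, which the script itself does not do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in tag_comp.pl): return every <$tag ...> found
# in $html, using the same crude patterns as the script.
sub extract_tags {
    my ($html, $tag) = @_;
    my @tags = $html =~ /(<[^>]+>)/g;            # "<", one or more non-">", ">"
    return grep { /^<\Q$tag\E(\s|>)/i } @tags;   # keep only the requested tag
}

# Made-up sample documents: same markup, different case and table width.
my $old = '<table width="100"><tr><td><img src="a.gif" width="10"></td></tr></table>';
my $new = '<TABLE width="120"><tr><td><IMG SRC="a.gif" WIDTH="10"></td></tr></TABLE>';

my @old_img = extract_tags($old, 'img');
my @new_img = extract_tags($new, 'img');
my @old_tbl = extract_tags($old, 'table');
my @new_tbl = extract_tags($new, 'table');

# Tags are lowercased before comparing, as in the script.
print "img:   ", (lc $old_img[0] eq lc $new_img[0] ? "match" : "differ"), "\n";
print "table: ", (lc $old_tbl[0] eq lc $new_tbl[0] ? "match" : "differ"), "\n";

# Caveat (a) from the script header: a ">" inside an attribute value
# ends the match early, so the captured tag is cut short.
my @cut = extract_tags('<img alt="a > b" src="x.gif">', 'img');
print "truncated: $cut[0]\n";
```

With this sample input the img tags count as a match (they differ only in case) and the table tags differ (width 100 vs 120). The last line shows the regex caveat: only <img alt="a > is captured.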
