PerlMonks  

Re: Website for small perl scripts

by marto (Cardinal)
on Oct 22, 2018 at 06:47 UTC ([id://1224445])


in reply to Website for small perl scripts

"If two JPG files have the exact same size, then we shall open both files for reading and we'll compare the first 70000 bytes. And if they are an exact match, then I assume that the two photos are the same."

That's a bad assumption, and one you don't need to make. If the file sizes match, just compare hashes (Digest::MD5, Digest::MurmurHash).
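A minimal sketch of the "sizes first, then hashes" approach, written in Python with `hashlib.md5` standing in for Perl's Digest::MD5. The function names `md5_of_file` and `find_duplicates` are hypothetical. Only files that share a size with at least one other file are ever read, and each of those is read exactly once. MD5 is adequate for spotting accidental duplicates, though not for inputs crafted by an adversary.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 16):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return groups of paths whose sizes and MD5 digests are equal."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[md5_of_file(path)].append(path)
        groups.extend(group for group in by_hash.values() if len(group) > 1)
    return groups
```

For example, `find_duplicates(glob.glob("photos/*.jpg"))` (the directory is hypothetical) returns a list of groups, each holding two or more paths that are very likely identical. A byte-by-byte comparison can confirm any group before files are deleted.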

Re^2: Website for small perl scripts
by cavac (Parson) on Oct 22, 2018 at 09:45 UTC
Re^2: Website for small perl scripts
by harangzsolt33 (Chaplain) on Oct 22, 2018 at 07:23 UTC
    Thank you for all the links! I am very glad. I do not have Linux, but I will try to install one to find the Perl scripts. :)

    I think that to calculate the MD5 hash, we have to read the entire file. But if we're going to read the entire file anyway, why not just compare every byte? It would be less work.
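A direct comparison, as proposed here, can stop at the first differing chunk instead of always reading both files to the end. A minimal Python sketch, with the function name `files_identical` and the 64 KiB chunk size as assumptions; Python's standard `filecmp.cmp(a, b, shallow=False)` does a comparable content check.

```python
import os


def files_identical(path_a, path_b, chunk_size=1 << 16):
    """Compare two files chunk by chunk, stopping at the first difference."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False  # files of different sizes can never be identical
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # early exit: the rest of the files is never read
            if not chunk_a:
                return True  # both files exhausted without a mismatch
```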

      "I think, in order to calculate the MD5 hash, we have to read the entire file. But if we're going to read the entire file, then why not just compare every byte? It would be less work."

      That's true; see File::Compare. Hashes have the advantage that they can be cached and written to a file for later comparisons. Personally, I usually use hashes (although typically one of the bigger SHA variants).
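The caching advantage can be sketched like this, in Python with `hashlib.sha256` (in Perl, the core Digest::SHA module plays the same role). Everything here is an assumption for illustration: the function names, the JSON cache layout keyed by absolute path, and the use of size plus modification time to decide whether a cached digest is still valid, which is a heuristic rather than a guarantee.

```python
import hashlib
import json
import os


def sha256_of_file(path, chunk_size=1 << 16):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_sha256(path, cache):
    """Reuse a cached digest if size and mtime are unchanged; otherwise rehash."""
    st = os.stat(path)
    key = os.path.abspath(path)
    entry = cache.get(key)
    if entry and entry["size"] == st.st_size and entry["mtime"] == st.st_mtime_ns:
        return entry["sha256"]
    value = sha256_of_file(path)
    cache[key] = {"size": st.st_size, "mtime": st.st_mtime_ns, "sha256": value}
    return value


def load_cache(cache_file):
    """Load a previously saved cache, or start empty if none exists."""
    try:
        with open(cache_file, "r", encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


def save_cache(cache, cache_file):
    """Write the cache to disk for the next run."""
    with open(cache_file, "w", encoding="utf-8") as fh:
        json.dump(cache, fh, indent=2)
```

On a later run, only new or modified files are read again; unchanged files are matched by their stored digests.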

      "Thank you for all the links! I am very glad. I do not have Linux, but I will try to install one to find the Perl scripts. :)"

      If you're on Windows or Mac, you should also be able to install modules. What issue do you have?

      "But if we're going to read the entire file, then why not just compare every byte? It would be less work."

      If there were only two matching files, a direct comparison would be quicker. Since you don't know in advance that this will be the case, a hash makes more sense from a performance perspective.
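A rough count of full-file reads makes the performance argument concrete. Assume $k$ files share the same size, and take the worst case for direct comparison: every pair is compared and each comparison reads both files to the end (the files are identical or differ only near the end). Hashing reads each file once and then compares short digests. $R$ is just a label for the number of file reads.

```latex
\begin{aligned}
R_{\text{pairwise}}(k) &= 2\binom{k}{2} = k(k-1) \\
R_{\text{hash}}(k)     &= k \\
k = 2  &: \quad R_{\text{pairwise}} = 2, \quad R_{\text{hash}} = 2 \\
k = 10 &: \quad R_{\text{pairwise}} = 90, \quad R_{\text{hash}} = 10
\end{aligned}
```

With two files the counts tie, and the direct comparison also skips the hashing work and may stop early, so it wins. As $k$ grows, the quadratic pairwise term dominates.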

        If you're on Windows or Mac you should also be able to install modules. What issue do you have?

        I'm using tinyperl on Windows XP. I am not looking for modules but ready-to-use scripts that do various things such as what I am working on right now, the "duplicate file remover."

        This project is more of a programming exercise for me, so even if I find that someone else has already written this script, I still want to finish my own. But it would be neat to see more working scripts. I will probably get a copy of Linux and install it on my computer, because I want those Perl scripts, especially the old ones, since tinyperl is kind of old.

        I like Strawberry Perl, but I like tinyperl better because it is tiny. Lol. It only takes me 10 seconds to install on a new computer; it's very convenient and doesn't take up much space.

        Btw, JPG photos are not like random binary files. I think it is safe to assume that if the first 70000 bytes match, the photos are the same. Why? JPG photos are so special that even if you take two shots of the clear blue sky, you will end up with two different files. If you zoom in, there is not a single pixel that is the same! Also, JPG files have a header that contains the name of the camera, the precise date and time the photo was taken, and sometimes even the GPS location. If you edit a photo, change just one pixel, and save it, the entire file changes and all the header info changes. So the chance of having two different JPG photos whose sizes match and whose first 70000 bytes match is vanishingly small.
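The prefix check described in this post can be sketched in Python as a cheap pre-filter. The names `prefix_matches` and `same_photo` are hypothetical, and 70000 is simply the prefix length proposed in the post. Pairing the filter with a full comparison, as the earlier replies recommend, costs one complete read only for pairs that already passed the size and prefix checks.

```python
import filecmp
import os

PREFIX_BYTES = 70000  # prefix length proposed in the post


def prefix_matches(path_a, path_b, n=PREFIX_BYTES):
    """Cheap filter: equal sizes and equal first n bytes."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return fa.read(n) == fb.read(n)


def same_photo(path_a, path_b):
    """Run the prefix filter first, then confirm with a full content comparison."""
    return prefix_matches(path_a, path_b) and filecmp.cmp(path_a, path_b, shallow=False)
```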
