PerlMonks  

[OT] Tar file with non-identical duplicate files and no paths?

by BrowserUk (Patriarch)
on Sep 13, 2008 at 08:45 UTC (id://711066)

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

The following shows the listing of a tar file. Note that Pg.html and Pg.pm are duplicated. Whilst there are many 'directories', they are all empty, and when you untar it, everything is dumped into the current directory.

For the duplicates, it extracts the first of each and refuses to overwrite it with the second. It is the second (of each) that I need. How can I get them?

Is this a GNU vs. POSIX tar problem? Is there some magic switch I'm missing?

C:>tar -tf DBD-Pg-2.10.0-Perl5.8.tar
.exists
.exists
Pg.bs
Pg.dll
Pg.dll.manifest
Pg.exp
Pg.lib
Pg.pdb
Pg
DBD
auto
arch
.exists
bin
Pg.html
DBD
Bundle
Pg.html
DBD
lib
site
html
.exists
Pg
DBD
auto
Pg.pm
DBD
Bundle
.exists
Pg.pm
DBD
lib
.exists
script
blib

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re: [OT] Tar file with non-identical duplicate files and no paths?
by johngg (Canon) on Sep 13, 2008 at 10:50 UTC
    I don't know if this is any help at all, but I had to write a script to restore files from archives supposedly written in tar format on a Pr1me mini. The trouble was, the utility had truncated all of the filenames to some arbitrary length, so there were loads of duplicates to resolve.

    Obviously this was written for the peculiarities of the Pr1me utility but you might be able to adapt it. It was written a long time ago before I had heard of strictures etc.

    I hope this is of use.

    Cheers,

    JohnGG

      Now that deserves all the ++'s it can get!

      Mike
Re: [OT] Tar file with non-identical duplicate files and no paths?
by repellent (Priest) on Sep 13, 2008 at 09:09 UTC
    Perhaps the -U option would help during extraction? Since it's not allowing you to overwrite, maybe unlinking the files first would do the trick.
    -U (x mode only) Unlink files before creating them. Without this option, tar overwrites existing files, which preserves existing hardlinks. With this option, existing hardlinks will be broken, as will any symlink that would affect the location of an extracted file.

      -U didn't work because the first one out has the readonly attribute set, so it can't be unlinked. However, whilst looking for an option to discard attributes (which doesn't appear to exist), I noticed -w. By saying no to the first chance to extract each duplicated file, it allows me to get the second ones. Thanks :)


        D'oh! I missed the -w option. Good thing you found it :)

          -U didn't work because the first one out has the readonly attribute set, so it can't be unlinked.

        OK, something is amiss here. I sometimes get confused about this issue, which is why I made the following notes for myself:
        # a file and its filename are different!
        # permissions are on files, not filenames => control of filenames is left up to the directory
        # directories contain filenames (not files!), that themselves refer to files

        file readable        # may examine file contents
        file writable        # may alter file contents
        file xecutable       # may run file contents

        directory readable   # may examine directory contents (list filenames in directory)
        directory writable   # may alter directory contents (remove or rename filenames)
        directory xecutable  # may use directory as component in pathnames or chdir to that directory

        # a read-only file is only protecting data (not its filename) from being changed

        As I understand it, whether a file can be unlinked is not determined by the permissions set on the file itself. It is the permissions set on the directory containing the file that determine whether the "filename" can be unlinked from that directory.
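
The notes above can be checked with a small experiment. This is a minimal sketch in Python (not Perl), assuming a POSIX filesystem; the function name `unlink_readonly_file` is made up for illustration. It creates a read-only file inside a writable temporary directory and then unlinks it. On Windows the read-only attribute does block deletion, so the same call raises a PermissionError there, which would be consistent with -U failing at a `C:>` prompt.

```python
import os
import stat
import tempfile


def unlink_readonly_file():
    """Create a read-only file in a writable directory, then unlink it.

    Returns True if the filename is gone afterwards.
    Assumes POSIX semantics; on Windows os.unlink raises PermissionError
    for a file that has the read-only attribute set.
    """
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "Pg.pm")
        with open(path, "w") as handle:
            handle.write("first copy\n")

        # r--r--r-- : nobody may alter the file's contents
        os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)

        # Removing the name only needs write (and search) permission on
        # the directory, which the temporary directory grants to its owner.
        os.unlink(path)
        return not os.path.exists(path)


if __name__ == "__main__":
    print("removed:", unlink_readonly_file())
```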
Re: [OT] Tar file with non-identical duplicate files and no paths?
by RMGir (Prior) on Sep 13, 2008 at 09:33 UTC
    Ouch! Doesn't sound like much fun :(

    The good news is that the tar format is fairly straightforward, so you could probably roll your own extractor without TOO much pain.
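
To give a feel for how simple the format is, here is a minimal sketch in Python (not Perl) that walks the 512-byte header blocks and reports each entry's name, ustar prefix, type flag and size. The field offsets are those of the classic tar header; the function name `list_tar_headers` is made up for illustration. It assumes plain octal size fields and ignores GNU and pax extensions. Printing the name and prefix fields separately might show whether the missing paths are sitting in a field the tar program is not reading, though that is only a guess.

```python
def list_tar_headers(path):
    """Return a list of (name, prefix, typeflag, size), one per header block."""
    entries = []
    with open(path, "rb") as handle:
        while True:
            header = handle.read(512)
            # A short read or an all-zero block marks the end of the archive.
            if len(header) < 512 or not any(header):
                break

            name = header[0:100].split(b"\0", 1)[0].decode("utf-8", "replace")
            size_text = header[124:136].strip(b"\0 ").decode("ascii", "replace")
            size = int(size_text, 8) if size_text else 0  # octal digits
            typeflag = header[156:157].decode("ascii", "replace")

            # The prefix field only exists in POSIX ustar headers.
            prefix = ""
            if header[257:263] == b"ustar\0":
                prefix = header[345:500].split(b"\0", 1)[0].decode("utf-8", "replace")

            entries.append((name, prefix, typeflag, size))

            # File data follows, padded up to a multiple of 512 bytes.
            handle.seek((size + 511) // 512 * 512, 1)
    return entries


if __name__ == "__main__":
    import sys
    for entry in list_tar_headers(sys.argv[1]):
        print(entry)
```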

    But the better news is that the Archive::Tar perl module looks like it might let you iterate to the file you want and then just extract it...
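
The iterate-and-extract idea can be sketched as follows. This uses Python's standard `tarfile` module as a stand-in rather than Archive::Tar, so it shows the approach, not that module's API. The function name `extract_occurrence` and its parameters are made up for illustration. It walks the members in archive order, counts the regular files whose name matches, and writes the Nth one to a destination of your choosing.

```python
import tarfile


def extract_occurrence(tar_path, member_name, occurrence, dest_path):
    """Write the Nth (1-based) regular file called member_name to dest_path.

    Returns the size recorded in the archive for that member.
    Raises LookupError if there are fewer than `occurrence` matches.
    """
    seen = 0
    with tarfile.open(tar_path, "r") as archive:
        # Iterating yields every member in archive order, duplicates included.
        for member in archive:
            if member.isfile() and member.name == member_name:
                seen += 1
                if seen == occurrence:
                    source = archive.extractfile(member)
                    with open(dest_path, "wb") as out:
                        out.write(source.read())
                    return member.size
    raise LookupError(
        "only %d occurrence(s) of %r in %s" % (seen, member_name, tar_path)
    )
```

For the archive in this thread, something like `extract_occurrence("DBD-Pg-2.10.0-Perl5.8.tar", "Pg.pm", 2, "Pg.pm")` would be the call to try; going by the verbose listing posted elsewhere in the thread, the second Pg.pm should come back as 160452 bytes. Because the data is written with a plain `open`, the read-only mode stored in the archive is not applied to the output file.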


    Mike
Re: [OT] Tar file with non-identical duplicate files and no paths?
by eosbuddy (Scribe) on Sep 13, 2008 at 09:12 UTC
    Looks like both you and I need some more information :-) What is the output of
    tar -tvf DBD-Pg-2.10.0-Perl5.8.tar
    That might indicate why those duplicate files exist (my guess is that those files may have some extra tags that are missing from the non-verbose tar listing). Update: It is definitely a good idea to move the file to a separate folder before unpacking it.

      Does this help any?

      c:\pgsql>tar -tvf DBD-Pg-2.10.0-Perl5.8.tar
      -rwSrwSrw- unknown/unknown      0 2008-08-27 18:45 .exists
      -rwSrwSrw- unknown/unknown      0 2008-08-27 18:45 .exists
      -rwSrwSrw- unknown/unknown      0 2008-08-27 18:45 Pg.bs
      -rwSrwSrw- unknown/unknown 233472 2008-08-27 18:45 Pg.dll
      -rwSrwSrw- unknown/unknown    380 2008-08-27 18:45 Pg.dll.manifest
      -rwSrwSrw- unknown/unknown    743 2008-08-27 18:45 Pg.exp
      -rwSrwSrw- unknown/unknown   1854 2008-08-27 18:45 Pg.lib
      -rwSrwSrw- unknown/unknown 568320 2008-08-27 18:45 Pg.pdb
      drwsrwsrwx unknown/unknown      0 2008-08-27 18:45 Pg
      drwsrwsrwx unknown/unknown      0 2008-08-27 18:45 DBD
      drwsrwsrwx unknown/unknown      0 2008-08-27 18:45 auto
      drwsrwsrwx unknown/unknown      0 2008-08-27 18:45 arch
      -rwSrwSrw- unknown/unknown      0 2008-08-27 18:45 .exists
      drwsrwsrwx unknown/unknown      0 2008-08-27 18:45 bin
      -rwSrwSrw- unknown/unknown   2040 2008-08-27 18:46 Pg.html
      drwsrwsrwx unknown/unknown      0 2008-08-27 18:46 DBD
      drwsrwsrwx unknown/unknown      0 2008-08-27 18:45 Bundle
      -rwSrwSrw- unknown/unknown 225260 2008-08-27 18:46 Pg.html
      drwsrwsrwx unknown/unknown      0 2008-08-27 18:46 DBD
      drwsrwsrwx unknown/unknown      0 2008-08-27 18:46 lib
      drwsrwsrwx unknown/unknown      0 2008-08-27 18:45 site
      drwsrwsrwx unknown/unknown      0 2008-08-27 18:45 html
      -rwSrwSrw- unknown/unknown      0 2008-08-27 18:45 .exists
      drwsrwsrwx unknown/unknown      0 2008-08-27 18:45 Pg
      drwsrwsrwx unknown/unknown      0 2008-08-27 18:45 DBD
      drwsrwsrwx unknown/unknown      0 2008-08-27 18:45 auto
      -r-Sr-Sr-- unknown/unknown    545 2008-08-20 19:41 Pg.pm
      drwsrwsrwx unknown/unknown      0 2008-08-27 18:45 DBD
      drwsrwsrwx unknown/unknown      0 2008-08-27 18:45 Bundle
      -rwSrwSrw- unknown/unknown      0 2008-08-27 18:45 .exists
      -r-Sr-Sr-- unknown/unknown 160452 2008-08-22 22:25 Pg.pm
      drwsrwsrwx unknown/unknown      0 2008-08-27 18:45 DBD
      drwsrwsrwx unknown/unknown      0 2008-08-27 18:45 lib
      -rwSrwSrw- unknown/unknown      0 2008-08-27 18:45 .exists
      drwsrwsrwx unknown/unknown      0 2008-08-27 18:45 script
      drwsrwsrwx unknown/unknown      0 2008-08-27 18:45 blib


        Can you get that tar file to somewhere with a GUI?

        I don't often use a GUI approach to untarring on Linux but I have 7-zip on a 'doze box.

        That whole tar file might display okay in a GUI e.g. KDE's Konqueror if you have KDE or 7-Zip on 'doze. Then you could extract the files manually.

        Note the unusual mode bits on the files:
        rwSrwSrw-
        A capital S in the user or group execute position means the setuid or setgid bit is set without the execute bit (the sticky bit would show up as t or T in the last position). These bits do not make the files immutable, so they probably do not explain the duplicates by themselves. I would still suggest using a GUI to extract the tarball and copying just the files you need; after that you may want to reset the permissions so that those bits are unset (apologies for not finding an easy way out). Alternatively, you could write a perl script to grab the files you want and delete the folder after untarring (making sure the tar file is extracted into a new folder rather than into the current working directory).
