Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Find Created and Last Modified Date of Excel xls file

by gauss76 (Scribe)
on Jun 29, 2017 at 13:01 UTC ( [id://1193822]=perlquestion: print w/replies, xml ) Need Help??

gauss76 has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

I have a question regarding an Excel file. I currently use the Perl modules:

Spreadsheet::ParseExcel

Spreadsheet::ParseXLSX

to read Excel .xls and .xlsx files respectively.

I would now like to be able to access the files created and last modified date information with a Perl program. I can do this for the .xlsx files because my Perl program can open the relevant xml file contained within the zip file and extract the required info.

However, I have no means of doing this for the older, binary format, .xls Excel files. I have checked the documentation for the module

Spreadsheet::ParseExcel

but cannot find any mention of a method for finding the information I am after. I have also consulted Microsoft's documentation on the Excel binary file format but found the underlying structure very difficult to understand.

So, my question: Does anyone know how I might go about using Perl to extract the required date information?

Perl Version: 5.10.1 running on Linux workstation.

Any help much appreciated.

gauss76

Replies are listed 'Best First'.
Re: Find Created and Last Modified Date of Excel xls file
by 1nickt (Canon) on Jun 29, 2017 at 13:43 UTC

    Hi, I think you would enjoy the Perl built-in function stat.


    The way forward always starts with a minimal test.

      I was 8 seconds late... :P

      Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: Find Created and Last Modified Date of Excel xls file
by Marshall (Canon) on Jun 29, 2017 at 19:20 UTC
    Update: Oh, I just noticed: Perl Version: 5.10.1 running on Linux workstation.

    You can't get the creation time from the file system because it doesn't exist on Unix. I don't even know if "creation time" is stored in the older .xls format.
    ======

    There are some complications with achieving your goals. Perl has built in file test operators. BUT, these operators only supply information that would be available on a Unix file system. Unix does not have the idea of a "created on" or "Born On" date. Windows does have that concept and you will need to use a Windows specific api to get it.

    In Unix you can get the mtime(last modified), atime (last accessed) and ctime (last inode change time). ctime is not creation time.

    The easiest way to get mtime: time = -M "filepath";. seefile test operators.

    For the Windows creation time, I think Win32::API:File will get you there. Although there may be a better module for this or a command line command that you could run. In this Win32 api, they labeled ctime as "creation time". I would run some tests on Windows to make sure that is really what you are getting.

    Summary, Last Modified Time is easy. Creation Time is only available on Windows and is more difficult to get because there is no "standard" Perl function for it.

Re: Find Created and Last Modified Date of Excel xls file
by thanos1983 (Parson) on Jun 29, 2017 at 13:51 UTC

    Hello gauss76,

    I am not aware of any module in Perl XLSX that can return what you want. But you can use other build in modules function such as stat.

    Example from the documentation:

    #!usr/bin/perl use strict; use warnings; use File::stat; # use Benchmark qw(:all) ; # WindowsOS # use Benchmark::Forking qw( timethese cmpthese ); # LinuxOS my $filename = 'one.xls'; my $sb = stat($filename); printf "File is %s, size is %s, perm %04o, mtime %s\n", $filename, $sb->size, $sb->mode & 07777, scalar localtime $sb->mtime; __END__ $ perl test.pl File is one.xls, size is 5632, perm 0664, mtime Thu Jun 8 13:23:19 20 +17

    Hope this helps, BR.

    Seeking for Perl wisdom...on the process of learning...not there...yet!

      Thanks to everyone that replied to my question. However, I don't think I have explained my problem in enough detail...Apologies.

      The created and last modified dates I am after are not the dates that are given by the particular filesystem the file is on but are actually embedded in the file itself. So the suggested stat function will not retrieve the required information (I have tested this and shown that the dates reported by the stat function are not the ones I need).

      I can find the required dates by opening the file in Excel and looking at its properties. These dates are NOT the same as the filesystem reports.

      In general I will not have Excel installed and am therefore not able to use any vba code, for example, to retrieve the required information.

      In fact I will be using a Linux based system to retrieve the info where Perl in installed.

      I have already written a Perl module that can retrieve the correct information from the new Excel 2007+ file types as these are simply a set of zipped xml files.

      However the older Excel files that I need to analyse are fundamentally binary files as explained in my initial post.

      I hope my exact needs are clearer than before.

      gauss76

        When I move (sftp) a spreadsheet created in Windows to a linux machine and look at the ctime using (stat)[10], it does not match the windows (stat)[10]. If I sftp the file back from the linux machine to the same location on the windows machine (even if I deleted the file in the mean time), it keeps the same (stat)[10] and the Excel Info CreateDate remains the same. If I sftp the file from the linux machine to a new location on the windows machine, the (stat)[10] changes, but the Excel Info CreateDate remains the same. So it appears that it is embedded in the .xls format somewhere, not just using filesystem information.

        I created two .xls spreadsheets with the same value in A1, a couple minutes apart, then did a hexdump and diff'd those: there were differences, but nothing jumped out and said "Create Date is encoded here", though looking at some of the strings, there are sections with ID="{...}", CMG="...", and DPB="..." that I think are likely candidates for encoded binary information. Probably someone more familiar with the .xls binary format (such as the author of Spreadsheet::ParseExcel) might be able to find it more easily. I tried things the hard way (see spoiler below): I did a key-by-key comparison of the ->parse($fn) objects, but nothing jumped out as being truly different (other than filename and format font numbering, which seem irrelevant).

        I was even able to change the Create Date in excel, using the VBA code ActiveWorkbook.BuiltinDocumentProperties("Creation Date") = Date, and re-saved. When I did that, the Excel Info CreateDate changed, and ID="{...}" section moved and changed, which is one of the reasons I think it might hold the Builtin Document Properties. (And "DPB" could be abbreviation for "Document Properties - BuiltIn". But that's just guessing.)

        Inspired by that VBA snippet, if I create a VBA function,

        function CreateDate() CreateDate = ActiveWorkbook.BuiltinDocumentProperties("Creation Da +te") end_function

        and set A1's formula to =text(CreateDate(), "yyyy-mmm-dd hh:mm:ss"), then save the .xls spreadsheet, I can use ParseExcel to grab the value of A1 and grab that CreateDate. So if you have control over the original spreadsheet creation, you can add the information you want into an accessible location; but I get the impression that you want to be able to do it for an arbitrary .xsl, not one that you control the creation of. :-)

        With no further ado, the ParseExcel compar program:

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1193822]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others meditating upon the Monastery: (4)
As of 2024-04-24 19:50 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found