Joining Excel columns from many files

by dtm81 (Initiate)
on Jul 27, 2011 at 11:17 UTC

dtm81 has asked for the wisdom of the Perl Monks concerning the following question:

Hi all. This is my first thread on PerlMonks, and I'm in a tricky situation with the task I need to solve :) The task is: I have about 20 Excel files with the same columns but different data, and the columns are in a random order in each file. What I need to do is join all the files into a "master" file, based on the column data from each file. As a quick solution I was thinking of mapping each column to a hash key, with the column's data as an array of values for each key, but I need to store these hashes somewhere until all the files are parsed. Any suggestion will be appreciated. Thanks

Replies are listed 'Best First'.
Re: Joining Excel columns from many files
by Anonymous Monk on Jul 27, 2011 at 11:28 UTC
Re: Joining Excel columns from many files
by kcott (Archbishop) on Jul 27, 2011 at 18:57 UTC
Re: Joining Excel columns from many files
by itsscott (Sexton) on Jul 27, 2011 at 17:46 UTC

    We've had to do something similar in the past; we used ANSI C, mind you, but the 'method' may be of use to you.

    We exported the xls files to csv with headers (ensuring all the headers had the same text/name, regardless of order). Then we made a 'hash' table of arrays for each column. We then processed each file in turn, adding each row's data into the correct 'hash->array', and at the end we simply wrote out a new csv file with all the combined data. It worked well and took only an hour or so of coding and testing.

    In C we used a structure and had to manage all our memory etc. Thankfully this is not necessary in Perl!

    I hope the method we've used successfully numerous times in the past gives you a lead on how to do it yourself. Good luck! (By the way, I did see this CSV parser on CPAN, and it looks as if it might work for you as well: http://search.cpan.org/~makamaka/Text-CSV-1.21/lib/Text/CSV.pm.) We didn't access the xls files directly, as they were on our Unix servers, but the client was pleased with the simple two-step solution (export CSV, upload the CSV to the CGI that processed it and returned a downloadable CSV back to the client). I know there are modules that access Excel and its files, but I've never worked with them and am not sure whether they would work for you. A rough sketch of the CSV approach follows below.
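    [Not from the original reply: a minimal Perl sketch of the CSV + hash-of-arrays approach described above, assuming the workbooks have already been exported to CSV files with identical header names. The file names and output path are invented.]

    use strict;
    use warnings;
    use Text::CSV;

    my @files   = glob 'export_*.csv';   # assumed input names
    my $outfile = 'master.csv';

    my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

    my %columns;        # column name => array ref of values
    my @header_order;   # column order taken from the first file

    for my $file (@files) {
        open my $fh, '<', $file or die "Can't open $file: $!";

        my $header = $csv->getline($fh);          # first row holds the column names
        @header_order = @$header unless @header_order;

        while (my $row = $csv->getline($fh)) {
            # Append each cell to the array for its column name, so the
            # column order of the individual file doesn't matter.
            push @{ $columns{ $header->[$_] } }, $row->[$_] for 0 .. $#$header;
        }
        close $fh;
    }

    # Write the combined data back out, one row at a time.
    open my $out, '>', $outfile or die "Can't write $outfile: $!";
    $csv->eol("\n");
    $csv->print($out, \@header_order);
    my $rows = @{ $columns{ $header_order[0] } };
    for my $i (0 .. $rows - 1) {
        $csv->print($out, [ map { $columns{$_}[$i] } @header_order ]);
    }
    close $out;

    This keeps everything in memory, which is fine for 20 smallish files; for very large inputs you would stream rows out instead of building the whole hash first.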
Re: Joining Excel columns from many files
by wallisds (Beadle) on Jul 27, 2011 at 18:15 UTC

    I think you can accomplish your goal by using a combination of two CPAN modules: Spreadsheet::ParseExcel and Spreadsheet::WriteExcel. Use ParseExcel to get the data from the 20 Excel files, and use WriteExcel to add the data (organized as you like) to a new sheet.

    Since the columns are not in the same order on each sheet, you'll need to read the values of A1, B1, ... G1 to get the names of the columns. As long as the naming convention is the same in each sheet, you should be able to keep all the data together with an if statement (see the sketch below).
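    [Not from the original reply: a minimal sketch of the Spreadsheet::ParseExcel / Spreadsheet::WriteExcel combination described above, assuming classic .xls workbooks with the column names in row 1 of the first sheet. File names are invented.]

    use strict;
    use warnings;
    use Spreadsheet::ParseExcel;
    use Spreadsheet::WriteExcel;

    my @files = glob 'input_*.xls';   # assumed input names
    my %columns;                      # column name => array ref of values

    my $parser = Spreadsheet::ParseExcel->new();

    for my $file (@files) {
        my $workbook = $parser->parse($file)
            or die "Can't parse $file: " . $parser->error();
        my ($sheet) = $workbook->worksheets();   # first sheet only

        my ($row_min, $row_max) = $sheet->row_range();
        my ($col_min, $col_max) = $sheet->col_range();

        for my $col ($col_min .. $col_max) {
            # Row 0 (A1, B1, ...) holds the column name in this sheet.
            my $name_cell = $sheet->get_cell($row_min, $col) or next;
            my $name = $name_cell->value();

            for my $row ($row_min + 1 .. $row_max) {
                my $cell = $sheet->get_cell($row, $col);
                push @{ $columns{$name} }, defined $cell ? $cell->value() : '';
            }
        }
    }

    # Write the merged data to a new workbook.
    my $out       = Spreadsheet::WriteExcel->new('master.xls');
    my $out_sheet = $out->add_worksheet('merged');

    my @names = sort keys %columns;
    for my $c (0 .. $#names) {
        $out_sheet->write(0, $c, $names[$c]);
        my $values = $columns{ $names[$c] };
        $out_sheet->write($_ + 1, $c, $values->[$_]) for 0 .. $#$values;
    }
    $out->close();

    Keying the hash on the header value read from row 1 is one way to cope with the randomized column order, rather than writing a separate branch per column.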

Re: Joining Excel columns from many files
by dcmertens (Scribe) on Jul 27, 2011 at 15:02 UTC

    I'm confused. What do you mean you need to store the data 'somewhere until all the files are parsed'? Does the amount of data exceed a RAM limitation or are you just inquiring how you would build the nested data structure?

Re: Joining Excel columns from many files
by Not_a_Number (Prior) on Jul 27, 2011 at 19:28 UTC

    Hi, dtm81, and welcome to the Monastery.

    First of all, I'd like to point out that when you post a question here, and one of the earliest replies requests further elucidation, it is considered bad form not to respond. In your case, dcmertens asked you:

    What do you mean you need to store the data 'somewhere until all the files are parsed'? Does the amount of data exceed a RAM limitation or are you just inquiring how you would build the nested data structure?

    More than four hours later, you still haven't replied.

    Of course, you might have been unavoidably detained elsewhere, e.g. by having your house burn down. In which case, no apologies are necessary. Otherwise, if you return to this thread, please do dcmertens the courtesy of replying to his post.

    Thank you

      Fortunately I am currently working as well, so maybe you should assume that I am busy with something else... And I am very grateful that people are trying to help here; thank you all for the quick solutions you bothered to give me :) To answer our colleague's question: I usually write a script that will be fine for 20 lines as well as for 2 billion lines, so I was thinking I would use threads to split the file up. As a quick solution I will indeed export the files to CSV format just to get the task done quickly, but I'll want to "beautify" the script later, as I mentioned earlier. I think the solutions are more than enough, so I'll stop here... Cheers to all!
        Had time this weekend to "meditate" on it... The implementation was without any external module, just with a "big" hash of arrays that I parsed later on... It was funny in the end, because I forgot to set the input record separator ($/) to \r\n when reading the Windows files :) but I finally thought of it... Anyway, I found a nice module, Spreadsheet::ExcelHashTable, that parses the Excel files straight into a hash... Thanks to all! Proud to be a part of this community.
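        [Not the poster's code: a tiny sketch of the line-ending point above, assuming a CSV exported on Windows is being read line by line on a Unix box; the file name is invented.]

        open my $fh, '<', 'export_from_windows.csv' or die $!;
        {
            local $/ = "\r\n";        # treat CRLF as the line ending
            while (my $line = <$fh>) {
                chomp $line;          # chomp removes $/, i.e. the whole CRLF
                my @fields = split /,/, $line;
                # ... push @fields into the hash of arrays as above ...
            }
        }
        close $fh;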
