PerlMonks  

uploading files in CGI

by Anonymous Monk
on Sep 28, 2000 at 10:37 UTC ( [id://34337]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Could someone please help me? I'm sort of a newbie. I'm writing a program in Perl in which I ask the user to upload a Microsoft Word file. I use this code to try to write their file to a Unix text file:

    $file = param('upload');
    open (UPLOAD, "unixtextfile");
    while ($bytesread=read($file, $buffer, 1024)){
        print UPLOAD $buffer;
    }

My problem is that when I test this, I get nothing but garbage. I'm writing this program for a former professor and he needs it by Sunday. I'd be forever grateful for a reply or an email. Thanks in advance.

Ed
edkuehnel@aol.com

Replies are listed 'Best First'.
RE: uploading files in CGI
by tilly (Archbishop) on Sep 28, 2000 at 13:29 UTC
    It looks to me like you should be getting a copy of the actual Word file uploaded. Test that by FTPing the original document over in binary (NOT TEXT) mode and checking that the two files are the same.
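The byte-for-byte check described above can be scripted instead of eyeballed. A minimal sketch using the core File::Compare module; the helper name same_bytes and the file names in the usage line are made up for illustration:

```perl
use strict;
use warnings;
use File::Compare qw(compare);

# Hypothetical helper: true if the two files hold exactly the same bytes.
# compare() returns 0 when equal, 1 when different, and -1 on error.
sub same_bytes {
    my ($path_a, $path_b) = @_;
    my $result = compare($path_a, $path_b);
    die "Cannot compare $path_a and $path_b: $!" if $result < 0;
    return $result == 0;
}
```

Something like same_bytes('uploaded.doc', 'ftp_copy.doc') would then tell you whether the CGI script saved the file intact, which separates "the upload is broken" from "the upload is fine but a Word file is not text".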

    After that you will need to do a conversion. This is not a simple job, so don't try to figure it out for yourself. Just use a utility. A brief search turned up word2x; give that a shot and report back whether it worked. If it does not, you should be able to find another conversion utility out there.

    UPDATE
    Having read it again, I notice that your open does not have a die on failure. That is probably because you are copying the code example. But still, in a CGI script you should put:

        use CGI::Carp 'fatalsToBrowser';

    at the top so that dies are reported back to the user, and then write your open as:

        open(UPLOAD, ">$tempfile") or die "Cannot write $tempfile: $!";

    You should also have some sort of locking logic for cases where two people both launch the CGI at the same time (or one person double-clicks on submit). Simple Locking shows one approach you might want to use.
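One way to combine the checked open with locking is sketched below. This is not the approach from the Simple Locking node, just a plain flock() on a separate lock file. The helper name save_with_lock and the ".lock" suffix are assumptions made for this example.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: copy everything from the handle $in to $path while
# holding an exclusive lock, so two simultaneous requests cannot
# interleave their writes. Returns the number of bytes written.
sub save_with_lock {
    my ($in, $path) = @_;

    # Lock a separate file so the lock is held before $path is truncated.
    open(my $lock, '>>', "$path.lock") or die "Cannot open $path.lock: $!";
    flock($lock, LOCK_EX)              or die "Cannot lock $path.lock: $!";

    open(my $out, '>', $path) or die "Cannot write $path: $!";
    binmode $out;                      # harmless on Unix, needed elsewhere
    my $total = 0;
    while (my $n = read($in, my $buffer, 1024)) {
        print {$out} $buffer or die "Write to $path failed: $!";
        $total += $n;
    }
    close($out)  or die "Cannot close $path: $!";
    close($lock);                      # closing the handle releases the lock
    return $total;
}
```

In a CGI script the first argument would be the upload handle returned by param('upload'). A second request arriving during the copy blocks at flock() until the first one has finished.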
Re: uploading files in CGI
by toadi (Chaplain) on Sep 28, 2000 at 15:30 UTC
    Hello,

    For binary files, do:

        # $file is the output path, $file1 the upload handle from param()
        open(FILE1, ">$file") or die "upload not working: $!";
        binmode FILE1;
        while (my $bytesread = read($file1, my $buffer, 1024)) {
            print FILE1 $buffer;
        }

    Also, don't forget to set this in the form tag: ENCTYPE="multipart/form-data"
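For reference, a form that carries that attribute could be emitted like this. It is only a sketch: the helper name upload_form and the action URL are invented, and the field name 'upload' is chosen to match param('upload') in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the HTML for a minimal upload form.
# Assumes $action is a trusted, hard-coded URL (it is not escaped here).
sub upload_form {
    my ($action) = @_;
    return <<"HTML";
<form method="POST" action="$action" enctype="multipart/form-data">
  <input type="file" name="upload">
  <input type="submit" value="Upload">
</form>
HTML
}

print upload_form('upload.cgi');
```

Without the enctype attribute the browser sends only the file name, not the file contents, so the script has nothing to read.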

    Hope this helps you on your way.

    Oh, by the way, look in the Code section of this site; there's a script to upload files. Or use Super Search; this question has been answered before :-)


    My opinions may have changed,
    but not the fact that I am right

      He said he was on Linux. On Linux binmode is a no-op.

      But for portable code, absolutely.

RE: uploading files in CGI
by geektron (Curate) on Sep 28, 2000 at 12:23 UTC
    problem one: MS Word files aren't text files. they're binary-encoded, IIRC, so you're not going to be able to do a simple conversion.

    second: you haven't read the docs for CGI.pm. (found here) the explanation for how to handle this (along with code) is right there. i re-read it today to handle this problem.

    update: yes, as tilly noted, there's a word missing. should've been 'IF you haven't read...'

      I think you meant to say, If you haven't read....

      Looking at the code example it looks like he not only has read the docs, but correctly figured out that he wants to use the code from here for binary files.

      Unfortunately I notice that the example has an open without checking for success. Oops. Since this is an example that people are likely to copy without understanding, I think this is important.

Re: uploading files in CGI
by AgentM (Curate) on Sep 28, 2000 at 23:53 UTC
    I would have expected to find something on CPAN for extracting text from an MSWord doc, but several searches revealed nothing except a PDF2text type module. Perhaps you can ask your uploading public to upload text files or some other format. (You'll have a hard time uploading and extracting info from an MSWord doc, since MS probably keeps the format secret and copyrighted in some manner.) Of course, if you're storing this file just for a later download, then you should be all set. But you should still set a $CGI::POST_MAX and scan the file in a way appropriate to your file type to ensure its integrity.
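$CGI::POST_MAX caps the size of the whole POST in bytes and is normally set right after loading CGI.pm, for example $CGI::POST_MAX = 1024 * 1024. The "scan the file" part could look something like the sketch below, which is run on the saved file. It assumes the uploads are legacy .doc files, which as far as I know begin with the 8-byte OLE2 compound-file signature. The helper name upload_problem is made up.

```perl
use strict;
use warnings;

# Assumed signature of an OLE2 compound file (legacy .doc container).
use constant OLE2_MAGIC => "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";

# Hypothetical helper: returns an empty string if the saved upload passes
# both checks, otherwise a short reason for rejecting it.
sub upload_problem {
    my ($path, $max_bytes) = @_;

    my $size = -s $path;               # false for missing or empty files
    return "missing or empty file" unless $size;
    return "file larger than $max_bytes bytes" if $size > $max_bytes;

    open(my $fh, '<', $path) or die "Cannot read $path: $!";
    binmode $fh;
    my $got = read($fh, my $head, length(OLE2_MAGIC));
    close($fh);
    return "not an OLE2 (Word .doc) file"
        unless defined $got && $head eq OLE2_MAGIC;
    return "";
}
```

The signature only identifies the container, which other Office formats share, so this is a cheap sanity check and not proof that the file is a valid Word document.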
    AgentM Systems or Nasca Enterprises is not responsible for the comments made by AgentM- anywhere.
RE: uploading files in CGI
by jepri (Parson) on Sep 29, 2000 at 09:05 UTC
    I wasn't too impressed with my experience with word2x. It tended to munge even simple MSWord docos (you may get better mileage from new versions).

    At work we are writing code to do exactly what you want, and we finally settled on StarOffice as the converter. We just dump the uploaded file to disk, call StarOffice to convert it and then use the output. It's not an ideal solution at >80Mb, but it's free and does the job.
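The "dump to disk, call a converter, use the output" flow can be sketched generically. The calling convention here (command, then input file, then output file) is an assumption for illustration and is not how StarOffice or any particular converter is invoked, so check the real tool's documentation. The helper name convert_upload is also made up.

```perl
use strict;
use warnings;

# Hypothetical helper: run an external converter as
#   <command...> <input file> <output file>
# and return the converted output. The list form of system() bypasses the
# shell, so odd characters in file names are not interpreted.
sub convert_upload {
    my ($command, $in_path, $out_path) = @_;   # $command is an array ref

    system(@$command, $in_path, $out_path) == 0
        or die "Converter failed (status $?)";

    open(my $fh, '<', $out_path) or die "Cannot read $out_path: $!";
    local $/;                                  # slurp the whole file
    my $text = <$fh>;
    close($fh);
    return $text;
}
```

Choose the file names yourself on the server side and never take them from the uploaded file's name; that keeps user input out of the command entirely.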

    HTH,
    Jeremy

Re: uploading files in CGI
by mattr (Curate) on Sep 30, 2000 at 20:25 UTC
    It seems you are getting what looks like junk because you are trying to read a binary file. A Word file is a stream of objects that need to be decoded. It also has things like endianness, hierarchy, and possibly old versions of the text all jumbled together but all is not lost!

    Here are some results from my own research in the recent past. Note: I do not have experience with any of these, or with word2x (which also looks good). StarOffice might work well too, especially if you have a massive amount of memory, or if you can use just the converter part.

    Why not try wvWare (http://www.wvWare.com/) which should support Word 2000 (Word version 9) files as well and is being used in KDE among other places. It comes with scripts that automate various conversions such as Word to HTML 4.0. I have not used it myself yet but it sounds good and is free (GPL) software.

    There are some other pieces of software which you may want to look at in the future, but probably not by Sunday.

    lv, a multilingual file viewer (in case you are dealing with non-English encoding, you could drop your file on this after running it through wvWare).
    http://www.ff.iij4u.or.jp/~nrt/lv/

    If you want to access OLE streams (overkill I expect):

    olemsword.pl, a program which uses Win32::OLE to read MS Word files (at least up to version 8, I believe). Available in association with www.namazu.org, a Japanese search engine. Win32::OLE at CPAN is based on ActiveState Perl's Windows library.
    But they mainly use wvWare, saying olemsword.pl is only for the Windows platform (possibly not so any more).

    Filters Web (http://arturo.directmail.org/filtersweb/) OLE libraries.

    OLE::Storage (the perl4 version was called LAOLA)
    Its "lhalw" utility doesn't fully support Word 8.
    I have used the related excel converter which is not perfect.
    Available on CPAN, or see http://user.cs.tu-berlin.de/~schwartz/perl/

    Good luck and please tell us how it goes.

    Matt
