Read doc/docx in Linux

by Anonymous Monk
on Jul 15, 2010 at 00:10 UTC (id://849659, perlquestion)

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

How can I read doc/docx files using Perl on Linux?

Replies are listed 'Best First'.
Re: Read doc/docx in Linux
by fod (Friar) on Jul 15, 2010 at 01:49 UTC
    Text::Extract::Word will do that (for .doc files, anyway), but you'll have to convert the Windows newlines to Unix ones. Maybe something like:

    perl -M'Text::Extract::Word q(get_all_text)' -e 'print get_all_text(q(document.doc))' | dos2unix | less

    if you just want a quick look at it.
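The one-liner above could also be written as a small script that does the newline conversion in Perl instead of piping through dos2unix. This is only a sketch: the helper names `normalize_newlines` and `doc_to_text` are made up for illustration, and it assumes Text::Extract::Word is installed from CPAN. Treating a bare CR as a line ending as well is an assumption about what the extracted text may contain.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert Windows (CRLF) line endings, and any bare CR, to Unix LF.
sub normalize_newlines {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;
    return $text;
}

# Extract the text of a .doc file using the module's functional interface.
# The module is loaded lazily so the rest of the script compiles without it.
sub doc_to_text {
    my ($path) = @_;
    require Text::Extract::Word;
    return normalize_newlines(Text::Extract::Word::get_all_text($path));
}

# Print the text of every file named on the command line.
for my $file (@ARGV) {
    print doc_to_text($file);
}
```

Saved under a hypothetical name such as `doc2txt.pl`, it would be run as `perl doc2txt.pl document.doc | less`.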

    Update: you might find these nodes helpful regarding docx:

    using Perl to generate docx file

    docx and Perl
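For .docx itself, one possible approach relies on the fact that a .docx file is a ZIP archive whose main text lives in `word/document.xml`, with paragraphs in `<w:p>` elements and text runs in `<w:t>` elements. The sketch below is not taken from the nodes above. It uses the core module IO::Uncompress::Unzip and a deliberately crude regex pass rather than a real XML parser, so it ignores tables, headers, footnotes and nested text boxes. The function names are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw(unzip $UnzipError);

my %entity = (lt => '<', gt => '>', amp => '&', quot => '"', apos => "'");

# Turn the XML of word/document.xml into plain text, one line per paragraph.
# Crude by design: a regex scan, not a proper XML parse.
sub document_xml_to_text {
    my ($xml) = @_;
    my @paragraphs;
    for my $para ($xml =~ m{<w:p\b[^>]*(?<!/)>(.*?)</w:p>}gs) {
        # Concatenate the text runs inside this paragraph.
        my $text = join '', $para =~ m{<w:t\b[^>]*(?<!/)>(.*?)</w:t>}gs;
        # Decode the five predefined XML entities in a single pass.
        $text =~ s/&(lt|gt|amp|quot|apos);/$entity{$1}/g;
        push @paragraphs, $text;
    }
    return join('', map { "$_\n" } @paragraphs);
}

# Pull word/document.xml out of the .docx archive and convert it.
sub docx_to_text {
    my ($path) = @_;
    my $xml;
    unzip $path => \$xml, Name => 'word/document.xml'
        or die "cannot read $path: $UnzipError\n";
    return document_xml_to_text($xml);
}

# Print the text of every file named on the command line.
for my $file (@ARGV) {
    print docx_to_text($file);
}
```

The output is the raw UTF-8 bytes from the archive, which is usually what a Linux terminal expects. For anything beyond a quick look, a real XML parser would be the safer choice.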

Re: Read doc/docx in Linux
by TedPride (Priest) on Jul 15, 2010 at 02:57 UTC
    There's a command-line utility called catdoc which supposedly does that, though I haven't tried it myself. It might help to tell us WHY you need the files converted. That would let us know whether it's something you need to do on an ongoing basis or something you could just do once using existing software (often the simpler solution).
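If the conversion does need to happen on an ongoing basis, a command-line tool such as catdoc can be driven from Perl. The following is a minimal sketch, assuming catdoc is installed and prints the document text to standard output when given a file name. The helper name `capture_stdout` is made up for illustration. The list form of a piped `open` avoids the shell, so file names with spaces or quotes need no escaping.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run a command without going through a shell and return its standard output.
sub capture_stdout {
    my (@cmd) = @_;
    open(my $fh, '-|', @cmd) or die "cannot run $cmd[0]: $!\n";
    local $/;                      # slurp mode: read everything at once
    my $out = <$fh>;
    close($fh) or die "$cmd[0] failed (exit status " . ($? >> 8) . ")\n";
    return defined $out ? $out : '';
}

# Convert every file named on the command line with catdoc.
for my $file (@ARGV) {
    print capture_stdout('catdoc', $file);
}
```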
Re: Read doc/docx in Linux
by philipbailey (Curate) on Jul 15, 2010 at 14:22 UTC

    I have used antiword successfully in the past for reading the text of Word files at the command line. It doesn't seem to be actively maintained any more, though.

    I also notice that AbiWord has a command line option for converting Word to other formats. You could of course use the full GUI version of AbiWord, or indeed OpenOffice.

    (Update) I realise of course that nothing in my answer directly addresses the question of reading these files in Perl, but in practice the command-line tools mentioned are often a practical way to go.
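Since several command-line converters have been mentioned in this thread, a script could look for whichever one is installed and fall back to the next. This is only a sketch under the assumption that both antiword and catdoc print the text to standard output when given a file name. The names `find_in_path` and `word_to_text` are hypothetical, and the order of preference is arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Return the full path of $program if an executable file with that name
# exists in one of the PATH directories; otherwise return nothing.
sub find_in_path {
    my ($program) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $program);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Try each converter in turn and return the output of the first one found.
sub word_to_text {
    my ($path, @tools) = @_;
    @tools = qw(antiword catdoc) unless @tools;
    for my $tool (@tools) {
        my $exe = find_in_path($tool) or next;
        open(my $fh, '-|', $exe, $path) or die "cannot run $exe: $!\n";
        my $text = do { local $/; <$fh> };
        close($fh) or die "$exe failed on $path\n";
        return $text;
    }
    die "none of (@tools) found in PATH\n";
}

# Print the text of every file named on the command line.
for my $file (@ARGV) {
    print word_to_text($file);
}
```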
