PerlMonks  

Re: Extracting text from MS Word files on a Linux box

by hippo (Bishop)
on Jun 21, 2018 at 11:09 UTC ( [id://1217112] )


in reply to Extracting text from MS Word files on a Linux box

Have you tried strings? Always used to do the trick before the MS format changed.

Replies are listed 'Best First'.
Re^2: Extracting text from MS Word files on a Linux box
by afoken (Chancellor) on Jun 21, 2018 at 20:18 UTC
    Have you tried strings? Always used to do the trick before the MS format changed.

    A .docx file is just a ZIP archive holding a bunch of XML files and some misc files. strings will fail on the archive itself because the contents are compressed, but once unpacked, strings will happily dig through the XML files.
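
    The unpack-then-read idea can be sketched in a few lines. This is only an illustration, not anything posted in the thread: it assumes the main body lives in word/document.xml (the usual location, though not guaranteed for every file), it strips tags with a crude regex rather than a real XML parser, and docx_text and make_sample_docx are hypothetical helper names. The sample archive is a stand-in built in memory, not a complete .docx.

```python
import html
import io
import re
import zipfile


def docx_text(source):
    """Return the text of a .docx given a path or a file-like object."""
    # A .docx is a ZIP archive; the body is assumed to be word/document.xml.
    with zipfile.ZipFile(source) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")
    # End of a paragraph element becomes a newline.
    xml = xml.replace("</w:p>", "\n")
    # Crude tag stripping, then decode entities such as &amp;.
    return html.unescape(re.sub(r"<[^>]+>", "", xml))


def make_sample_docx():
    """Build a tiny stand-in archive in memory (not a complete .docx)."""
    body = (
        "<w:document><w:body>"
        "<w:p><w:r><w:t>Hello &amp; welcome</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("word/document.xml", body)
    buffer.seek(0)
    return buffer


print(docx_text(make_sample_docx()))
```

    For a real file, docx_text("report.docx") would be the call, where report.docx is a placeholder name. Text in headers, footers, and footnotes sits in other XML files inside the archive and is not covered by this sketch.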

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re^2: Extracting text from MS Word files on a Linux box
by Laurent_R (Canon) on Jun 21, 2018 at 11:55 UTC
    I just did not think about it. That's a very good idea, I'll try it. I don't know how it works under the hood, but I know that the Linux grep command is able to find strings in an MS Word file, so, if it works similarly, the Linux strings command might be all I need.

    Thanks hippo.
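
    As for how it works under the hood, the following is a rough sketch of the core idea behind strings, not its actual implementation: scan the raw bytes and keep every run of printable ASCII characters that is at least four long (four being the default minimum of the real tool). The name ascii_strings is made up for this illustration, and text stored as 16-bit characters would be missed by it.

```python
import re


def ascii_strings(data, min_length=4):
    """Return runs of printable ASCII (space through tilde) found in bytes."""
    # Build a byte pattern such as [\x20-\x7e]{4,} for the given minimum.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Binary junk with two readable runs; "ab" is too short to be reported.
blob = b"\x00\x01Hello world\x00ab\x00\xffPerl\x02"
print(ascii_strings(blob))
```

    On a compressed archive the readable runs are mostly gone, which is why this kind of scan finds little in a .docx until the file has been unpacked.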
