PerlMonks  

Re: acessing the data from word(.doc) file in linux environment

by mwah (Hermit)
on Sep 17, 2009 at 08:40 UTC (#795812)


in reply to acessing the data from word(.doc) file in linux environment

word file in linux environment

There seem to be a lot of different options, of very different complexity, depending on which Word format you are dealing with.

If it's a plain old Word 2000-2003 file, you already know what your tables look like, and you only need some data from a few cells, you could simply run:

$> abiword --to=rtf myworddocument.doc
and then:
$> perl extract-table-cells.pl myworddocument.rtf

In the latter (extract-table-cells.pl), you would simply search for:

[pseudo]
    ...
    # table content part already extracted to $tablecontent
    my @cells = $tablecontent =~ / \} ([^}]*) \} \\cell \{ /xgs;
    ...

which might give you the cells in @cells.
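To make the cell search above concrete, here is a minimal sketch run against a made-up RTF fragment. The sample string is an assumption for illustration only; the RTF that abiword actually writes will carry more control words and may nest groups differently, so the pattern would likely need adjusting against your real file.

```perl
use strict;
use warnings;

# Hypothetical RTF fragment, NOT real converter output.
# Single quotes keep the backslashes literal.
my $tablecontent = '{\intbl{\f0}Alice}\cell{\intbl{\f0}42}\cell{\row}';

# Capture the text that sits between a closing brace and the "}\cell{"
# sequence ending a cell. Literal braces are escaped so that newer Perl
# versions do not complain about an unescaped "{" in the pattern.
my @cells = $tablecontent =~ / \} ( [^}]* ) \} \\cell \{ /xgs;

print join('|', @cells), "\n";   # prints "Alice|42" for the sample above
```

In the real script, $tablecontent would come from reading myworddocument.rtf and cutting out the table part first; that step is left out here.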

But it depends on your problem. What is the scale and purpose of your attempt?

Regards

mwa
