http://qs321.pair.com?node_id=795812


in reply to acessing the data from word(.doc) file in linux environment

word file in linux environment

There seem to be a lot of different options of very different complexity, depending on which Word format you are dealing with.

If it's a plain old Word 2000-2003 file, you already know what your tables look like, and you only need some data from a few cells, you could simply run:

$> abiword --to=rtf myworddocument.doc
and then:
$> perl extract-table-cells.pl myworddocument.rtf

In the latter (extract-table-cells.pl), you would simply search for:

[pseudo]
...
# table content part already extracted to $tablecontent
my @cells = $tablecontent =~ / \} ([^}]*) \} \\cell \{ /xgs;
...

which might give you the cells in @cells.
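As a minimal sketch of what extract-table-cells.pl could look like, here is the same pattern wrapped in a small script. The sub name extract_cells and the sample RTF string are made up for illustration; I have not checked the sample against real abiword output, so the pattern may need adjusting once you look at your own .rtf file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the text of every table cell found in an RTF fragment.
# Same pattern as in the post: the text between a closing brace
# and the "}\cell{" that ends the cell (braces escaped for safety).
sub extract_cells {
    my ($tablecontent) = @_;
    return $tablecontent =~ / \} ([^}]*) \} \\cell \{ /xgs;
}

# Hypothetical sample row; real abiword output may be laid out differently.
my $sample = '{\trowd{\pard\intbl{\f0}Name}\cell{\pard\intbl{\f0}Age}\cell'
           . '{\pard\intbl{\f0}42}\cell{\row}}';

# Use the file named on the command line if there is one, else the sample.
my $rtf = $sample;
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    local $/;    # slurp the whole file at once
    $rtf = <$fh>;
    close $fh;
}

my @cells = extract_cells($rtf);
print "$_\n" for @cells;
```

Run without arguments it works on the built-in sample and prints Name, Age and 42 on separate lines; run as in the command above it reads myworddocument.rtf instead. Note that the pattern only catches cells whose text follows a closing brace and whose \cell is followed by an opening brace, so treat it as a starting point.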

But it depends on your problem. What is the scale and purpose of what you are trying to do?

Regards

mwa