PerlMonks  

Re: Indexing of Word documents

by rpnoble419 (Pilgrim)
on Jun 10, 2013 at 04:31 UTC ( [id://1037982]=note )


in reply to Indexing of Word documents

As you are indexing the data from Word, the exact page number does not matter, for the reasons davies gave. A better solution is to index the document by paragraph: keep a paragraph counter and record the paragraph number for each keyword you find. The number of paragraphs stays the same however the document reflows. Only an edit to the document can change the paragraph count.
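As a rough illustration of the paragraph-counter idea, here is a minimal sketch in Python. It assumes the paragraphs have already been extracted from the Word file into a list of strings; the function name `index_by_paragraph` and the sample data are made up for the example and are not part of any Word API.

```python
import re
from collections import defaultdict


def index_by_paragraph(paragraphs, keywords):
    """Map each keyword to the 1-based numbers of the paragraphs containing it."""
    index = defaultdict(list)
    for number, text in enumerate(paragraphs, start=1):
        lowered = text.lower()
        for keyword in keywords:
            # Whole-word, case-insensitive match, so "bob" does not hit "bobbin".
            pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
            if re.search(pattern, lowered):
                index[keyword].append(number)
    return dict(index)


# Hypothetical sample data standing in for paragraphs read from the document.
paragraphs = [
    "Bob Jones arrived in the spring.",
    "The harvest was poor that year.",
    "Later, Bob Jones sold the farm.",
]

print(index_by_paragraph(paragraphs, ["Bob Jones", "harvest"]))
# {'Bob Jones': [1, 3], 'harvest': [2]}
```

The paragraph numbers only change when the text itself is edited, which is the property the index relies on.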

Replies are listed 'Best First'.
Re^2: Indexing of Word documents
by axiomcrs (Initiate) on Jun 10, 2013 at 19:10 UTC
    Thanks for all the suggestions. Here are some more details. This script is to create an index for a book. The Word files will only reside on one computer, so the issues with changing computers and different printers go away. Using paragraphs does not work, since any paragraph could span 2 pages, and then the page number associated with a name would be wrong. I am not forced to do this with Word, so changing to PDF could be an option, since an index for a book can be provided with a PDF file. I was asking about using PDF, but could not determine whether page numbers are associated with the text. For instance, if I search for bob jones in the PDF file, is there metadata that tells which page that name appears on?
      Hey flexvault, does the pdftohtml program give page numbers as metadata for the text?
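One possible way to get page numbers from a PDF, sketched under assumptions: rather than looking for metadata, extract the text with page boundaries preserved and count pages yourself. The sketch below assumes the text was produced by a tool such as pdftotext, which by default ends each page with a form-feed character; that tool is a substitute for the pdftohtml program asked about, whose output is not covered here. The function name `index_by_page` and the sample text are hypothetical.

```python
def index_by_page(text, names):
    """Map each name to the 1-based page numbers on which it appears.

    Assumes pages in `text` are separated by form-feed characters.
    """
    pages = text.split("\f")
    index = {}
    for number, page in enumerate(pages, start=1):
        # Collapse whitespace so a name broken across a line still matches.
        normalized = " ".join(page.split()).lower()
        for name in names:
            if name.lower() in normalized:
                index.setdefault(name, []).append(number)
    return index


# Hypothetical three-page extract; "\f" marks the end of each page.
sample = (
    "Bob Jones was born here.\f"
    "Nothing relevant on this page.\f"
    "The estate of Bob\nJones was sold.\f"
)

print(index_by_page(sample, ["bob jones"]))
# {'bob jones': [1, 3]}
```

A name split across a page boundary would not be found by this sketch, so that case needs separate handling if it matters for the book.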
