
Re^3: Build a PDF book index

by markong (Monk)
on Mar 17, 2018 at 16:09 UTC


in reply to Re^2: Build a PDF book index
in thread Build a PDF book index

Thank you! This tool extracted the text content successfully, with apostrophes and long dashes encoded as Latin-1.

Re^4: Build a PDF book index
by LanX (Cardinal) on Mar 17, 2018 at 16:51 UTC
    You're welcome!

    Please note: the -xml switch also gives you the font number and text position, in case you need to adjust characters as described.

    I had to do this in the past.

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Wikisyntax for the Monastery

      I am curious what you are referring to. Are you referring to the content of the XML file that is output? E.g.:

      <text top="78" left="108" width="540" height="21" font="3">The developer, on the other hand, feels like hes interrupted several times a day for</text>
      Or are you talking about some CLI option to pass to the command? If so, I don't see anything related (pdftohtml version 0.24.3).
        > Are you referring to the content of the XML file output-ed

        Yes, this includes:

        • font number (font="3"): some fonts may need special translations
        • box geometry (top="78" left="108" width="540" height="21"): useful if you want to exclude special areas (footnotes, page numbers, ...)

        Cheers Rolf
        (addicted to the Perl Programming Language and ☆☆☆☆ :)
        Wikisyntax for the Monastery
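
A minimal sketch of how the two attributes mentioned above might be used, written in Python since the thread itself shows no code. It keeps only the text elements whose font number is in an allowed set and whose top coordinate lies above a cutoff, which is one way to drop footnotes and page numbers. The sample document, the `<pdf2xml>`/`<page>` wrapper, the function name `body_text`, and the cutoff value are all assumptions for illustration; only the first `<text>` element is modelled on the line quoted in the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the first <text> element is shaped like the one quoted
# in the thread; the wrapper elements and the other two lines are assumptions.
SAMPLE = """<pdf2xml>
<page number="1" height="1188" width="918">
<text top="78" left="108" width="540" height="21" font="3">The developer, on the other hand,</text>
<text top="1100" left="108" width="300" height="14" font="7">1 A footnote to skip</text>
<text top="1150" left="450" width="20" height="14" font="3">12</text>
</page>
</pdf2xml>"""


def body_text(xml_source, keep_fonts, max_top):
    """Return text from elements in an allowed font and above the cutoff."""
    root = ET.fromstring(xml_source)
    lines = []
    for page in root.iter("page"):
        for node in page.iter("text"):
            # Skip fonts that are not part of the body text.
            if node.get("font") not in keep_fonts:
                continue
            # Skip anything positioned below the cutoff (footer area).
            if int(node.get("top")) > max_top:
                continue
            # itertext() also collects text nested in child tags such as <b>.
            lines.append("".join(node.itertext()).strip())
    return lines


print(body_text(SAMPLE, keep_fonts={"3"}, max_top=1000))
```

In practice the font numbers and the cutoff would have to be read off the actual XML output for the book in question, since they differ from one PDF to the next.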
