PerlMonks  

Re: Extracting text from PDF. No really

by Fletch (Bishop)
on Mar 28, 2008 at 12:05 UTC (#676956)


in reply to Extracting text from PDF. No really

xpdf comes with a pdftotext utility, which I've had fairly good luck with. It's also smart enough to extract and preserve most formatting (or at least most of the formatting in the files I've run through it :). Perhaps install it if you don't already have it, and open a pipe from it.
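A minimal sketch of opening a pipe from pdftotext, assuming it is installed and on your PATH. The helper names `pdftotext_cmd` and `read_pipe` are made up for illustration. The options `-layout` (keep the physical layout) and a trailing `-` (write the text to STDOUT) belong to pdftotext itself.

```perl
use strict;
use warnings;

# Build the pdftotext argument list (hypothetical helper). Assumes the
# pdftotext binary from xpdf (or poppler) is on the PATH. "-layout"
# asks it to keep the physical page layout; the final "-" sends the
# extracted text to STDOUT instead of a .txt file.
sub pdftotext_cmd {
    my ($pdf) = @_;
    return ('pdftotext', '-layout', $pdf, '-');
}

# Run a command and read its STDOUT through a pipe (hypothetical
# helper), returning one chomped line per element. The list form of
# open() bypasses the shell, so file names containing spaces or
# quotes need no escaping.
sub read_pipe {
    my @cmd = @_;
    open my $fh, '-|', @cmd
        or die "Cannot run $cmd[0]: $!\n";
    my @lines = <$fh>;
    chomp @lines;
    # close() on a pipe waits for the child; it returns false if the
    # child exited non-zero, leaving the exit status in $?.
    close $fh
        or die "$cmd[0] exited with status " . ($? >> 8) . "\n";
    return @lines;
}

# Print the text of each PDF named on the command line.
for my $pdf (@ARGV) {
    print "$_\n" for read_pipe(pdftotext_cmd($pdf));
}
```

Saved as, say, `pdf2txt.pl` (a hypothetical name), it would be run as `perl pdf2txt.pl report.pdf`. Since it only wraps pdftotext, it inherits whatever pdftotext itself gets right or wrong on a given file.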

Update: MENTAL NOTE: Wait until morning caffeine has taken effect enough for reading comprehension to function before attempting to solve problems. KTHXBAI.

The cake is a lie.

Re^2: Extracting text from PDF. No really
by clinton (Priest) on Mar 28, 2008 at 12:07 UTC
    Thanks Fletch, but you'll see that my second example in the root node already uses pdftotext, and it is dropping the first character on many lines. Yet xpdf displays the PDF correctly!
