PerlMonks  

Re^2: Converting PDF file to text

by cerian (Novice)
on May 12, 2017 at 16:26 UTC


in reply to Re: Converting PDF file to text
in thread Converting PDF file to text

"Nothing had worked" meant that the resulting text files were filled with non-ASCII gibberish and bore no resemblance to the PDF file.

In fact, pdftohtml works just fine. The trouble is that it's an executable. A condition I did not mention in the original post is that this needs to be done by a script within a website's CGI directory. The server is configured not to allow the running of executables in cgi-bin. I do not have admin rights on the server and cannot change this.

So, more specifically, I am looking for a Perl-based solution to this problem.

Replies are listed 'Best First'.
Re^3: Converting PDF file to text
by runrig (Abbot) on May 12, 2017 at 21:54 UTC
    pdftotext is probably the best pdf to text converter. So don't put the executable in cgi-bin; write a script that makes a system call instead. Please don't tell me that you can't make any system calls from your CGI script?
      > pdftotext is probably the best pdf to text converter.

      I disagree :)

      > ...write a script that makes a system call. Please don't tell me that you can't make any system calls from your cgi script?

      I agree. :)

      Cheers Rolf
      (addicted to the Perl Programming Language and ☆☆☆☆ :)
      Je suis Charlie!
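
For reference, the "script that makes a system call" idea might look roughly like the sketch below. It is only an illustration under assumptions: the path in `$PDFTOTEXT` is hypothetical and has to point at wherever pdftotext actually lives outside cgi-bin, the helper names `pdftotext_command` and `pdf_to_text` are made up for this example, and it only helps if the server lets CGI scripts spawn processes at all.

```perl
use strict;
use warnings;

# Hypothetical location: adjust to wherever pdftotext is installed
# on the server (somewhere outside cgi-bin).
my $PDFTOTEXT = '/usr/bin/pdftotext';

# Build the argument list for pdftotext.
# A text-file argument of '-' makes pdftotext write to stdout.
sub pdftotext_command {
    my ($pdf_path) = @_;
    return ($PDFTOTEXT, '-enc', 'UTF-8', '-layout', $pdf_path, '-');
}

# Run the converter and return the extracted text, or die on failure.
sub pdf_to_text {
    my ($pdf_path) = @_;
    my @cmd = pdftotext_command($pdf_path);

    # List-form pipe open runs the program directly, without a shell,
    # so the file name is never interpreted as shell syntax.
    open(my $fh, '-|', @cmd) or die "Cannot run $cmd[0]: $!";
    binmode($fh, ':encoding(UTF-8)');
    my $text = do { local $/; <$fh> };   # slurp all output
    close($fh) or die "pdftotext failed (exit status " . ($? >> 8) . ")";
    return $text;
}
```

A CGI script would then call something like `pdf_to_text($uploaded_file)`, where `$uploaded_file` is a path the script has already validated. If the server blocks process creation entirely, this approach fails at the `open` call.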

Re^3: Converting PDF file to text
by LanX (Sage) on May 12, 2017 at 17:08 UTC
    I once took a look at the source of pdftohtml, and porting it to Perl shouldn't be too difficult ...

    BUT

    ... it's based on a call to ghostscript which does the hard part.

    And I doubt it can be done any other way; I can't imagine anyone reimplementing PostScript in Perl.

    So if

    > The server is configured not to allow the running of executables in cgi-bin.

    Then you should start looking for a new server.

    I doubt it's possible to find an open solution not based on ghostscript.

    (Unless you find a web service that does the hard part for you.)

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

    update

    Well, can you run executables outside cgi-bin? And is ghostscript installed?
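
One way to start answering those two questions from Perl is to look for the tools on the server's PATH. The sketch below is a rough probe, not a definitive check: `find_executable` is a helper invented for this example, the tool names `gs`, `pdftotext` and `pdftohtml` are the usual command names but may differ on a given host, and finding a file with the execute bit set does not prove the server will let a CGI script run it.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of the first executable file called $name found
# in @dirs (defaults to the directories in PATH), or undef if none.
sub find_executable {
    my ($name, @dirs) = @_;
    @dirs = File::Spec->path unless @dirs;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $name);
        # -x _ reuses the stat information gathered by -f
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Report which of the usual converters are visible to this process.
for my $tool (qw(gs pdftotext pdftohtml)) {
    my $path = find_executable($tool);
    printf "%-10s %s\n", $tool, defined $path ? $path : 'not found';
}
```

Running this once from the command line and once as a CGI script can be informative, since the web server's PATH is often much shorter than a login shell's.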
