http://qs321.pair.com?node_id=1190137


in reply to Re: Converting PDF file to text
in thread Converting PDF file to text

"Nothing had worked" meant that the resulting text files were filled with non-ASCII gibberish and bore no resemblance to the PDF file.

In fact, pdftohtml works just fine. The trouble is, it's an executable. A condition I did not mention in the original post is that this needs to be done by a script within a website's CGI directory. The server is configured not to allow the running of executables in cgi-bin. I do not have admin rights on the server and cannot change this.

So, more specifically, I am looking for a Perl-based solution to this problem.

Re^3: Converting PDF file to text
by runrig (Abbot) on May 12, 2017 at 21:54 UTC
    pdftotext is probably the best pdf to text converter. So don't put the executable in cgi-bin...write a script that makes a system call. Please don't tell me that you can't make any system calls from your cgi script?
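A minimal sketch of the system-call approach suggested above, assuming pdftotext is installed somewhere outside cgi-bin and that the host permits CGI scripts to spawn processes. The path `/usr/bin/pdftotext`, the `PDFTOTEXT_BIN` environment variable, and the helper names `pdftotext_command` and `pdf_to_text` are all hypothetical; `-layout` is a standard pdftotext option that tries to preserve the page layout.

```perl
use strict;
use warnings;

# Assumed location of the binary, outside cgi-bin; adjust for your host.
my $PDFTOTEXT = $ENV{PDFTOTEXT_BIN} // '/usr/bin/pdftotext';

# Build the argument list separately so it can be inspected or logged.
sub pdftotext_command {
    my ($pdf, $txt) = @_;
    return ($PDFTOTEXT, '-layout', $pdf, $txt);
}

# Run the conversion; returns true if pdftotext exited with status 0.
sub pdf_to_text {
    my ($pdf, $txt) = @_;
    my @cmd = pdftotext_command($pdf, $txt);
    # List-form system() bypasses the shell, so file names are passed
    # as-is and cannot be interpreted as shell syntax.
    my $rc = system(@cmd);
    return $rc == 0;
}
```

A CGI script would then call something like `pdf_to_text($uploaded_pdf, $output_txt) or die "conversion failed"` and read the resulting text file. Whether this works depends entirely on the server allowing the call.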
      > pdftotext is probably the best pdf to text converter.

      I disagree :)

      > ...write a script that makes a system call. Please don't tell me that you can't make any system calls from your cgi script?

      I agree. :)

      Cheers Rolf
      (addicted to the Perl Programming Language and ☆☆☆☆ :)
      Je suis Charlie!

Re^3: Converting PDF file to text
by LanX (Saint) on May 12, 2017 at 17:08 UTC
    I once took a look into the source of pdftohtml, and porting it to Perl shouldn't be too difficult ...

    BUT

    ... it's based on a call to ghostscript which does the hard part.

    And I doubt it can be done otherwise; I can't imagine anyone reimplementing PostScript in Perl.

    So if

    > The server is configured not to allow the running of executables in cgi-bin.

    Then you should start looking for a new server.

    I doubt it's possible to find an open solution not based on ghostscript.

    (Unless you find a web service that does the hard part for you.)

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

    update

    Well, can you run executables outside cgi-bin? And is Ghostscript installed?
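One way to answer both questions from a CGI script is to look for the tools on the PATH the web server hands to the script. This is only a sketch: `find_executable` and `probe_tools` are hypothetical helper names, and `gs` is assumed to be the name of the Ghostscript binary, as it usually is on Unix-like systems.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of the first executable file named $name
# found in the directories on PATH, or undef if there is none.
sub find_executable {
    my ($name) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Map each tool name to its path (or undef when it is not found).
sub probe_tools {
    my %found;
    $found{$_} = find_executable($_) for @_;
    return %found;
}
```

Calling `probe_tools('gs', 'pdftotext', 'pdftohtml')` and printing the result in the CGI response shows what the script can see. Finding a binary does not prove the server will let the script run it, so a test call is still needed.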