
converting doc to pdf

by kcella (Beadle)
on Jul 04, 2011 at 03:01 UTC

kcella has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to automate the conversion of a Word document to a PDF file. Ideally I would prefer a pure Perl solution. I did find one on this site, but it uses HTML::Tidy and installing that is a major pain in my neck.

The problem is that the hosting service I use does not have OpenOffice or the tidyp library installed by default. If I need either, it has to be installed locally in my home directory. I'm not very savvy about changing the default installation directory, so nothing has worked so far.

Does anyone have any alternate solutions or can you point me to step-by-step instructions to install things to alternate locations? This is very frustrating and I've had zero luck using Google.

Re: converting doc to pdf
by Khen1950fx (Canon) on Jul 04, 2011 at 05:18 UTC
    Maybe this will get you up and running. It requires Text::FromAny and a2pdf:
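        #!/usr/bin/perl
        use strict;
        use warnings;
        use Text::FromAny;

        my $file  = '/root/Desktop/test.doc';
        my $tfile = '/root/Desktop/text.txt';

        # Extract the plain text from the Word document
        my $tFromAny = Text::FromAny->new(file => $file);
        my $text     = $tFromAny->text;

        # Write it to an intermediate text file (instead of redirecting STDOUT)
        open my $out, '>', $tfile or die "Cannot write $tfile: $!";
        print {$out} $text, "\n";
        close $out or die "Cannot close $tfile: $!";

        # Render the text file as PDF; list-form system() bypasses the shell
        system('a2pdf', '-o', '/root/Desktop/text_is_pdf.pdf', $tfile) == 0
            or die "a2pdf failed: $?";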
Re: converting doc to pdf
by biohisham (Priest) on Jul 04, 2011 at 08:55 UTC
    There is no way around installing modules: if not for this case, then later, a time will come when you simply have to do it. Knowing how to install modules on your machine, how to read their documentation and how to use them consistently brings two major advantages:
    • First, you don't reinvent the wheel (someone has already worked out a solution and was generous enough to contribute it to CPAN), so your wizardry, and witchcraft for that matter, can build on these magic boxes to your heart's content and the limits of your imagination.
    • Second, OSOM (Out of Sight, Out of Mind) quality: the burden of implementing multi-step algorithms is lifted, since many modules provide abstractions for tasks that would otherwise be lengthy to write (in time and effort; think of writing a taxonomy tree algorithm, for instance) or risky to write (inadvertent computational inaccuracies, bugs, error-prone code). A module that has already been tested and used by many is the wiser option.

    Not to deviate from the topic: CPAN has many modules for manipulating PDF files at different levels of complexity. They are robust and a more appropriate choice than writing your own code. If you have trouble installing modules, read the Installing Modules guides in the Tutorials section ...
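For the "install it in my home directory" part of the question, a minimal sketch, assuming the modules were installed with cpanminus into ~/perl5 using its `-l` (`--local-lib`) option. The script then only has to tell Perl to search that directory via the `lib` pragma. The ~/perl5 path is an assumption about your layout; adjust it if you installed elsewhere. Modules with C dependencies, such as HTML::Tidy, still need their library (tidyp) built first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASSUMPTION: modules were installed into ~/perl5 with cpanminus, e.g.
#   cpanm -l ~/perl5 Text::FromAny
# which puts pure-Perl modules under ~/perl5/lib/perl5.
use lib "$ENV{HOME}/perl5/lib/perl5";

# @INC is the module search path; directories added by "use lib"
# are searched before the system-wide directories.
print "$_\n" for @INC;
```

Setting the PERL5LIB environment variable to the same directory has a similar effect without editing each script.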

    On another note, there are websites that do format conversions; however, they cap the maximum file size you can submit, the output quality or the number of submissions you're allowed, I find to be very good and reliable.

    David R. Gergen said "We know that second terms have historically been marred by hubris and by scandal." and I am a two y.o. monk today :D, June,12th, 2011...

      I was able to finally install tidyp and HTML::Tidy. It turns out the conversion produces a flat text file in PDF format: I lose all the formatting from the original Word document. It appears the only solution is to convert it through OpenOffice or Word, neither of which the webhost installs by default.

      Even if I could install OpenOffice locally, I imagine it would take up more of my allotted space than I can afford.

      I was hoping that, like most things, there would be several solutions to this problem. It seems like a very common task that many people would need to accomplish from the command line.

      My next attempt will be to find a way to convert a Word document to PostScript and then use ps2pdf to convert it to a PDF, hoping that way to keep the formatting.
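The PostScript route could be scripted roughly as follows. This is a sketch under assumptions: it uses antiword's `-p` option (PostScript output for a given paper size) for the Word-to-PostScript step, a tool choice not mentioned in this thread, and Ghostscript's `ps2pdf` for the second step. The helper names `pipeline_cmds`, `output_names` and `convert` are hypothetical. How much layout survives depends entirely on the first converter.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: the two commands of a doc -> PostScript -> PDF pipeline.
# ASSUMPTION: antiword -p <papersize> writes PostScript to STDOUT.
sub pipeline_cmds {
    my ($doc, $ps, $pdf) = @_;
    return (
        [ 'antiword', '-p', 'a4', $doc ],
        [ 'ps2pdf', $ps, $pdf ],
    );
}

# Hypothetical helper: derive dir/name.ps and dir/name.pdf from dir/name.doc(x).
sub output_names {
    my ($doc) = @_;
    my ($base, $dir) = fileparse($doc, qr/\.docx?/i);
    return ("$dir$base.ps", "$dir$base.pdf");
}

sub convert {
    my ($doc) = @_;
    my ($ps, $pdf) = output_names($doc);
    my ($to_ps, $to_pdf) = pipeline_cmds($doc, $ps, $pdf);

    # Step 1: copy the converter's STDOUT into the .ps file (list-form pipe, no shell).
    open my $in,  '-|', @$to_ps or die "Cannot run $to_ps->[0]: $!";
    open my $out, '>',  $ps     or die "Cannot write $ps: $!";
    binmode $_ for $in, $out;
    print {$out} $_ while <$in>;
    close $in  or die "$to_ps->[0] failed: $?";
    close $out or die "Cannot close $ps: $!";

    # Step 2: ps2pdf turns the PostScript file into a PDF.
    system(@$to_pdf) == 0 or die "$to_pdf->[0] failed: $?";
    return $pdf;
}

# Only convert when files are given, e.g.: perl doc2pdf.pl report.doc
if (@ARGV) {
    print convert($_), "\n" for @ARGV;
}
```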

Re: converting doc to pdf
by tmaly (Monk) on Jul 06, 2011 at 13:16 UTC

    I ended up using a script with OpenOffice to accomplish this. It does a fairly good job.
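The script itself isn't shown. A minimal sketch of the same idea, assuming LibreOffice (a fork of OpenOffice) is installed and its `soffice` binary is on the PATH; its `--headless --convert-to pdf` options convert a document without opening a window. This is not necessarily how tmaly's script works, and the sub names `soffice_cmd` and `convert_to_pdf` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper. ASSUMPTION: LibreOffice's soffice, which supports
# --headless, --convert-to and --outdir.
sub soffice_cmd {
    my ($doc, $outdir) = @_;
    return ('soffice', '--headless', '--convert-to', 'pdf', '--outdir', $outdir, $doc);
}

sub convert_to_pdf {
    my ($doc, $outdir) = @_;
    my @cmd = soffice_cmd($doc, $outdir);
    # List-form system() avoids the shell, so spaces in file names are safe.
    system(@cmd) == 0 or die "soffice failed on $doc: $?";
}

# e.g.: perl oo2pdf.pl /tmp/out report.doc other.doc
if (@ARGV >= 2) {
    my ($outdir, @docs) = @ARGV;
    convert_to_pdf($_, $outdir) for @docs;
}
```

The PDF lands in the output directory under the input's base name, so report.doc becomes report.pdf.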

Node Type: perlquestion [id://912582]
Approved by philipbailey