Re: Need Help for Convert PDF to HTML



by steve (Deacon)

on Feb 11, 2011 at 16:11 UTC


in reply to Need Help for Convert PDF to HTML

Another difficulty I do not see listed among the replies here is embedded fonts. PDF documents can embed fonts; HTML cannot. If the source PDF embeds non-standard (non-web) fonts, then extracting those fonts becomes a significant challenge. Some tools go part of the way. CAM::PDF can Extract Font Info from PDF, but when brian_d_foy asked about extracting the fonts themselves, Chris Dolan said he intends never to add that feature.

If you happen to have the font files already, the job may be easier. It really depends on your source PDF document.

CSS can be used to specify such fonts through @font-face rules (see the Fontspring "Bulletproof" method and the Smiley variation, among many others).
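
For what it's worth, here is a minimal sketch of what such a CSS @font-face rule looks like, assuming the font has already been converted to the usual web formats. It is written in Python only so there is something runnable to poke at. The family name MyPdfFont and the fonts/mypdffont path are made up for illustration, reusing the family name as the SVG fragment id is a simplification, and the layout follows the Fontspring syntax as I understand it.

```python
def font_face_rule(family, base_url):
    """Build an @font-face rule laid out like the Fontspring "bulletproof" syntax.

    family and base_url are hypothetical inputs: the CSS family name and the
    font file path without its extension.
    """
    sources = [
        (f"{base_url}.eot?#iefix", "embedded-opentype"),  # older Internet Explorer
        (f"{base_url}.woff", "woff"),
        (f"{base_url}.ttf", "truetype"),
        (f"{base_url}.svg#{family}", "svg"),  # assumes the SVG font id equals the family name
    ]
    src = ",\n       ".join(f"url('{url}') format('{fmt}')" for url, fmt in sources)
    return f"@font-face {{\n  font-family: '{family}';\n  src: {src};\n}}"


if __name__ == "__main__":
    print(font_face_rule("MyPdfFont", "fonts/mypdffont"))
```

The generated rule goes in the stylesheet of the converted page, and the HTML then refers to the font by its family name. Whether you may serve the font files this way still depends on the font's license.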

There are also licensing issues in play for many fonts. Depending on your circumstances (and perhaps the font requirements), this may be of concern to you.

Re^2: Need Help for Convert PDF to HTML by inman2787 (Initiate) on Mar 26, 2011 at 04:29 UTC
1. Convert the PDF file to a text file using Acrobat Reader or any similar program. Just save it as a text file; there is no need for the Pro or extended versions of Reader.
2. Open TextEdit.app, open the text file you've created, and copy/paste the whole thing into a new document window.
   - Open Preferences in TextEdit.
   - Go to the "Open/Save" tab.
   - Change Document Type to HTML Strict or XHTML Strict, depending on your needs. In Styling, select No CSS.
   - Go back and save the new document as an HTML file.

There is a step-by-step instruction on how to convert PDF to HTML. Hope that helps!
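
If TextEdit is not available, the second step can be roughly approximated in a few lines. This is only a sketch, not what TextEdit actually does: it assumes the text saved from the PDF separates paragraphs with blank lines, and the function name text_to_html and the default title are made up for illustration.

```python
import html


def text_to_html(text, title="Converted PDF"):
    """Wrap plain text in a bare HTML page, one <p> per blank-line-separated block."""
    # Split on blank lines and drop empty blocks.
    blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
    # Escape &, < and > so the text cannot be mistaken for markup,
    # and keep single line breaks inside a paragraph as <br>.
    paragraphs = "\n".join(
        "<p>" + html.escape(b).replace("\n", "<br>\n") + "</p>" for b in blocks
    )
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        f"<meta charset=\"utf-8\">\n<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n" + paragraphs + "\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    print(text_to_html("First paragraph,\nsecond line.\n\nSecond paragraph."))
```

Like the manual route, this carries over the text only; layout, images and fonts from the PDF are lost.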
Re^2: Need Help for Convert PDF to HTML by Anonymous Monk on Dec 31, 2011 at 15:25 UTC
HTML 5 has embedded fonts via JavaScript
