PerlMonks
Re: Preserving layout in pdf to text or html to text conversion by cbrandtbuffalo (Deacon)
on Apr 10, 2007 at 20:11 UTC (id://609248)
Maybe this is obvious, but if you're going to try to add some of this functionality yourself, consider subclassing or otherwise building on one of the existing parser modules you mentioned. If you can use the existing module to do most of the work, you could focus on processing the DIV tags and the information in them. You need to find a parser module that keeps the CSS info rather than immediately throwing it out.
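Here is a rough sketch of that idea. It makes several assumptions: it relies on HTML::Parser from CPAN and uses its version 3 handler API, and it assumes the layout information lives in inline `style` attributes on the DIVs. The subclass `DivStyleParser` and its `divs` method are hypothetical names made up for this example. The subclass lets HTML::Parser do the tokenizing and keeps each DIV's style string next to the text it contains, rather than throwing the CSS away.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package DivStyleParser;    # hypothetical subclass, for illustration only
use parent 'HTML::Parser';

# Register method-name handlers via the HTML::Parser v3 API.
sub new {
    my ($class) = @_;
    my $self = $class->SUPER::new(
        api_version => 3,
        start_h     => [ '_start', 'self, tagname, attr' ],
        end_h       => [ '_end',   'self, tagname' ],
        text_h      => [ '_text',  'self, dtext' ],
    );
    $self->{divs}  = [];    # finished records: { style => ..., text => ... }
    $self->{stack} = [];    # currently open DIVs, innermost last
    return $self;
}

# On <div>, open a new record holding its inline CSS (or '' if none).
sub _start {
    my ( $self, $tag, $attr ) = @_;
    return unless $tag eq 'div';
    push @{ $self->{stack} }, { style => $attr->{style} // '', text => '' };
}

# Decoded text is appended only to the innermost open DIV;
# text outside any DIV is ignored.
sub _text {
    my ( $self, $text ) = @_;
    return unless @{ $self->{stack} };
    $self->{stack}[-1]{text} .= $text;
}

# On </div>, close the innermost record and normalize its whitespace.
sub _end {
    my ( $self, $tag ) = @_;
    return unless $tag eq 'div' && @{ $self->{stack} };
    my $div = pop @{ $self->{stack} };
    $div->{text} =~ s/\s+/ /g;
    $div->{text} =~ s/^ | $//g;
    push @{ $self->{divs} }, $div;
}

# Return the collected records in the order their DIVs closed.
sub divs { return @{ $_[0]{divs} } }

package main;

my $html = <<'HTML';
<div style="position:absolute; left:72px; top:100px">Left column text</div>
<div style="position:absolute; left:400px; top:100px">Right column text</div>
HTML

my $p = DivStyleParser->new;
$p->parse($html);
$p->eof;

for my $div ( $p->divs ) {
    printf "%-45s | %s\n", $div->{style}, $div->{text};
}
```

Because the style string is kept verbatim, a later pass could split it on `;` to pull out `left` and `top` and order the text blocks by position. This sketch only sees inline `style` attributes. CSS supplied through a stylesheet or `class` names would need a separate lookup.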
In Section: Seekers of Perl Wisdom