http://qs321.pair.com?node_id=930807


in reply to Re: Example Of Using CAM::PDF Like HTML::TokeParser
in thread Example Of Using CAM::PDF Like HTML::TokeParser

Anonymous Monk,
If you are referring to the non-existent PDF parser that this thread is about, then no. The internal structure of a PDF doesn't lend itself to XPath-style diving.

If you are referring to the way I go about creating a parser using HTML::TokeParser, then the answer is "it depends". Node traversal is usually the last tool in the box I reach for. I am not even opposed to using regular expressions (*gasp*) if each page is consistent enough. It all depends on how consistent the pages are from one to the next.

Cheers - L~R
