PerlMonks  

Re: PDF alternative to mudrow to get XML structure

by jcb (Parson)
on Mar 06, 2020 at 00:22 UTC [id://11113891]


in reply to PDF alternative to mudrow to get XML structure

You are confused. PDF does not have an XML structure, aside from some metadata blocks in some PDF files. The PDF structure itself is not XML because (among other reasons) PDF is an older format than XML.

I am unfamiliar with mudraw; perhaps it translates PDF structure into XML? Try searching CPAN for "PDF" and see what you find.
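A minimal sketch of that distinction, using only core Perl. The sample data is a hand-built assumption, not a valid PDF, and the helper names (`looks_like_pdf`, `looks_like_xml`, `find_xmp_packet`) are hypothetical. It shows that the file as a whole is not XML, while an uncompressed XMP metadata packet inside it is.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, hand-built sample: NOT a valid PDF, just enough bytes to
# show the layout (PDF header, one metadata stream holding an XMP packet).
my $sample = join "\n",
    '%PDF-1.4',
    '1 0 obj',
    '<< /Type /Metadata /Subtype /XML >>',
    'stream',
    '<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>',
    '<x:xmpmeta xmlns:x="adobe:ns:meta/"></x:xmpmeta>',
    '<?xpacket end="w"?>',
    'endstream',
    'endobj',
    '%%EOF';

# Returns 1 if the data begins with a PDF header such as "%PDF-1.4".
sub looks_like_pdf {
    my ($data) = @_;
    return $data =~ /\A%PDF-\d+\.\d+/ ? 1 : 0;
}

# Returns 1 if the data begins like an XML document
# (an XML declaration or an opening element), else 0.
sub looks_like_xml {
    my ($data) = @_;
    return $data =~ /\A\s*<(?:\?xml\b|[A-Za-z_])/ ? 1 : 0;
}

# Returns the first uncompressed XMP packet found in the data, or undef.
# Metadata stored in a compressed stream would not be found this way.
sub find_xmp_packet {
    my ($data) = @_;
    return $data =~ /(<\?xpacket begin=.*?<\?xpacket end=.*?\?>)/s ? $1 : undef;
}

printf "PDF header:   %s\n", looks_like_pdf($sample) ? 'yes' : 'no';
printf "XML document: %s\n", looks_like_xml($sample) ? 'yes' : 'no';

my $xmp = find_xmp_packet($sample);
printf "XMP packet:   %s\n", defined $xmp ? 'found' : 'none';
```

To try it on a real file, read the file in binary mode into a scalar and pass that to the same three functions. The regex search is only a rough check; it does not parse the PDF object structure.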

Re^2: PDF alternative to mudrow to get XML structure
by marto (Cardinal) on Mar 06, 2020 at 03:33 UTC

    "You are confused..."

    Looks like you are the one who is confused here. OP specifically shows what they are doing, tells us how they are generating XML from PDF.

    "I am unfamiliar with mudraw; perhaps it translates PDF structure into XML? Try searching CPAN for "PDF" and see what you find."

    It would have taken seconds to confirm what mudraw does.

      A PDF file does not have an XML structure. Our questioner is using a tool that produces XML output describing PDF structure and now needs to replace that tool. There is no standard XML mapping for PDF, so the XML in question is a mudraw-specific format and there is no easy drop-in replacement for mudraw. The best solution is to process the PDF directly.
