PerlMonks  

Re^3: Can Perl generate a page break character that Microsoft Word will recognize?

by jcb (Parson)
on Jan 01, 2020 at 02:43 UTC ( [id://11110813]=note )


in reply to Re^2: Can Perl generate a page break character that Microsoft Word will recognize?
in thread Can Perl generate a page break character that Microsoft Word will recognize?

Interesting. Word seems to use ASCII CR as a paragraph break, so does it use ASCII LF or ASCII FF as a page break? (There is also a forced end-of-line, produced by Shift-Enter, that does not start a new paragraph. Pressing Enter alone starts a new paragraph, which begins a new line as a side effect.)

If we want to consider producing DOCX, it would be fairly easy to input AAA [Control-Enter to insert a page break] BBB and see what turns up in document.xml. Word DOC format uses Microsoft's "OLE Container" format, which turns out to be a miniature FAT filesystem, complete with its own allocation tables, and (if I remember correctly) a second FAT filesystem with smaller blocks stored inside a "file" in the outer container file. At least they only did that to one level of recursion, instead of producing a "filesystems all the way down" crawling horror.
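The DOCX experiment described above could be sketched roughly as follows. This is a minimal sketch in Python rather than Perl, assuming the saved file is an ordinary DOCX (a ZIP archive whose main part is word/document.xml); the function name list_breaks is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, as it appears in document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_breaks(docx):
    """Return the w:type of every <w:br> in word/document.xml.

    `docx` may be a path or a binary file-like object. A <w:br> with no
    w:type attribute (a plain forced line break) is reported as None.
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return [br.get(f"{{{W_NS}}}type") for br in root.iter(f"{{{W_NS}}}br")]
```

Calling list_breaks on the AAA / Control-Enter / BBB test document should then show which kind of break Word actually wrote.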

Re^4: Can Perl generate a page break character that Microsoft Word will recognize?
by marto (Cardinal) on Jan 01, 2020 at 10:30 UTC

    Or just look up the XML to do what you want:

    <?xml version="1.0" encoding="UTF-8"?>
    <w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
                xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
                xmlns:o="urn:schemas-microsoft-com:office:office"
                xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
                xmlns:v="urn:schemas-microsoft-com:vml"
                xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006"
                xmlns:w10="urn:schemas-microsoft-com:office:word"
                xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
                xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing">
      <w:body>
        <w:p w:rsidR="00D479B1" w:rsidRDefault="00D479B1">
          <w:r>
            <w:t>1234</w:t>
          </w:r>
        </w:p>
        <w:p w:rsidR="00D479B1" w:rsidRDefault="00D479B1">
          <w:r>
            <w:t>5678</w:t>
          </w:r>
        </w:p>
        <w:sectPr w:rsidR="00D479B1" w:rsidSect="00D479B1">
          <w:pgSz w:w="11906" w:h="16838" />
          <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="708" w:footer="708" w:gutter="0" />
          <w:cols w:space="708" />
          <w:docGrid w:linePitch="360" />
        </w:sectPr>
      </w:body>
    </w:document>

    becomes:

    <?xml version="1.0" encoding="UTF-8"?>
    <w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
                xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
                xmlns:o="urn:schemas-microsoft-com:office:office"
                xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
                xmlns:v="urn:schemas-microsoft-com:vml"
                xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006"
                xmlns:w10="urn:schemas-microsoft-com:office:word"
                xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
                xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing">
      <w:body>
        <w:p w:rsidR="00D479B1" w:rsidRDefault="00D479B1">
          <w:r>
            <w:t>1234</w:t>
          </w:r>
        </w:p>
        <w:p>
          <w:r>
            <w:br w:type="page" />
          </w:r>
        </w:p>
        <w:p w:rsidR="00D479B1" w:rsidRDefault="00D479B1">
          <w:r>
            <w:t>5678</w:t>
          </w:r>
        </w:p>
        <w:sectPr w:rsidR="00D479B1" w:rsidSect="00D479B1">
          <w:pgSz w:w="11906" w:h="16838" />
          <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="708" w:footer="708" w:gutter="0" />
          <w:cols w:space="708" />
          <w:docGrid w:linePitch="360" />
        </w:sectPr>
      </w:body>
    </w:document>
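    The same transformation could be done programmatically. What follows is only a sketch, in Python rather than Perl, assuming the document.xml text is already in hand as a string; the names insert_page_break and w are hypothetical helpers, not part of any library.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace; registering it keeps the "w:" prefix on output.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag):
    """Qualify a local name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


def insert_page_break(document_xml, after_paragraph=0):
    """Insert <w:p><w:r><w:br w:type="page"/></w:r></w:p> after the
    paragraph with the given zero-based index inside <w:body>."""
    root = ET.fromstring(document_xml)
    body = root.find(w("body"))
    children = list(body)
    paragraphs = [el for el in children if el.tag == w("p")]
    anchor = paragraphs[after_paragraph]

    # Build the page-break paragraph shown in the listing above.
    para = ET.Element(w("p"))
    run = ET.SubElement(para, w("r"))
    ET.SubElement(run, w("br"), {w("type"): "page"})

    body.insert(children.index(anchor) + 1, para)
    return ET.tostring(root, encoding="unicode")
```

    One caveat with this approach: ElementTree only writes out namespace declarations that are actually used, so the unused xmlns:m, xmlns:o and similar declarations from the original root element will not survive the round trip. Whether Word minds that in a given document is something to verify rather than assume.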

    See also the other links already provided in this thread, and their associated links. To be honest, your workflow ('I'm using Perl to scrape text from a JavaScript that printed out one page at a time..') seems somewhat convoluted, but you don't go into much detail.
