PerlMonks  

Re: Escape newlines in POD / (Selectively) don't generate space characters instead

by haukex (Archbishop)
on Nov 08, 2020 at 10:01 UTC ( [id://11123499]=note )


in reply to Escape newlines in POD / (Selectively) don't generate space characters instead

Doing so, I noticed that perlpod seems to generally replace newlines with space characters (sometimes even multiple ones).

At least for Pod::Simple, this appears to happen deep within the parser, specifically, here. The preserve_whitespace probably wouldn't help here because it would just preserve the newlines. Unfortunately, it looks to me like the "best" choice is to not break L<>s across multiple lines.
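A minimal sketch for reproducing this, assuming perl and Pod::Simple::HTML are installed. The file name, the link text, and the position of the line break are made up for illustration; the original question may have broken the link somewhere else.

```shell
#!/bin/sh
# Hypothetical demo file; name and contents are illustrative only.
workdir=$(mktemp -d)
pod_file="$workdir/links.pod"

# One L<> code broken across two source lines, one kept on a single line.
cat > "$pod_file" <<'EOF'
=pod

Broken across lines: L<Some text|Pod::Simple/
DESCRIPTION>.

Kept on one line: L<Some text|Pod::Simple/DESCRIPTION>.

=cut
EOF

# Render to HTML and show only the lines that contain links.
# Skipped silently if perl or Pod::Simple::HTML is not available.
if command -v perl >/dev/null 2>&1 && perl -MPod::Simple::HTML -e 1 2>/dev/null; then
    perl -MPod::Simple::HTML -e 'Pod::Simple::HTML->filter(shift)' "$pod_file" \
        | grep -i 'href'
fi
```

Comparing the two generated link targets should show whether the newline inside the first L<> ended up as a space.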

but two(!) space characters between "continues." and "This". I don't want this to happen.

Looking at the output of pod2man, this appears to be an effect of *roff typesetting, see e.g. Sentences and .ss. An argument could be made that Pod::Man should be merging POD paragraphs into one line to avoid the issue you're seeing.

Re^2: Escape newlines in POD / (Selectively) don't generate space characters instead
by Ionic (Acolyte) on Nov 09, 2020 at 01:42 UTC

    Oh, those are some excellent points. I probably should have looked at more code than just Pod::Simple::HTML.

    At least for Pod::Simple, this appears to happen deep within the parser, specifically, here.
    Right, and to specify, both Pod::Man and Pod::Simple::HTML naturally use it.

    The preserve_whitespace probably wouldn't help here because it would just preserve the newlines.
    Yes, it does quite exactly the opposite of what I'm looking for - selectively squash whitespace completely. But, oh, look at what does use preserve_whitespace! Just like Pod::Text, really. But even quickly scanning the code reveals some inconsistencies between code and comments in Pod::Text and Pod::Man - although for a completely different parsing aspect. In recap, Pod::Man seems to handle such line-broken L<> tags correctly, but that's almost not surprising since man doesn't have any (real) notion of (hyper-) links, as far as I know.

    To elaborate a bit: I don't even really plan on generating HTML documentation (at least not personally), but the HTML output was a prime candidate for testing my internal and external links within the documentation. podchecker does basic sanity checks on links, but nothing's better than actually seeing your generated links in fully generated documentation. I guess I could have also used pod2texi, but I try to avoid GNU Info because I can never remember how to use it correctly …

    Unfortunately, it looks to me like the "best" choice is to not break L<>s across multiple lines.
    Unfortunately it looks like that. However, I think that this could be classified as a bug or at least a somewhat valid intent to add a new feature for escaping capabilities to Pod::Simple.

    Looking at the output of pod2man, this appears to be an effect of *roff typesetting, see e.g. Sentences and .ss.
    Exactly - and I have been misinterpreting this as the same "space instead of newline" behavior, when in reality it's some special property of the roff language family. From the groff manual, I've learned that it only applies that special rule if the punctuation character is located exactly at the end of the line. Interestingly, I also learned that the groff style guide recommends starting each sentence (and thus also ending each sentence) on its own line. Since Pod::Man is not changing whitespace (mostly), this explains why I noticed that behavior at times and didn't at other times.

    An argument could be made that Pod::Man should be merging POD paragraphs into one line to avoid the issue you're seeing.
    Yep, that would help. Either that, or split up sentences on punctuation marks so that each sentence starts/ends on its proper line, to adhere to groff's recommendations (in which case every sentence would get a double space by default - which is fine, since then at least it would be consistent), although that would probably quickly get unwieldy. That would require a parser that is sophisticated enough to detect sentences (and, e.g., ignore mid-sentence punctuation characters or those surrounding the special characters given in the groff manual) and that, frankly, is probably too much to ask for. Instead, I could format my POD directly in a groff-recommended way, which generally shouldn't have any negative impact on other POD renderers.
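A rough sketch of what "one sentence per source line" could look like, assuming pod2man and nroff are installed. The file name and the sentences are invented for the demo, and the script only renders the page when both tools are found.

```shell
#!/bin/sh
# Hypothetical demo file; name and contents are illustrative only.
workdir=$(mktemp -d)
pod_file="$workdir/demo.pod"

# Each sentence starts and ends on its own source line.
cat > "$pod_file" <<'EOF'
=pod

=head1 DESCRIPTION

This sentence ends at the end of a source line.
This one does as well.
A third sentence closes the paragraph.

=cut
EOF

# Render via *roff only if the tools are installed; otherwise just report.
if command -v pod2man >/dev/null 2>&1 && command -v nroff >/dev/null 2>&1; then
    pod2man "$pod_file" | nroff -man
else
    echo "pod2man or nroff not found; wrote $pod_file only" >&2
fi
```

With every sentence ending at a line end, the spacing after sentences in the man output should at least be uniform, while other renderers treat the single newlines as ordinary whitespace.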


    OT: what's the proper way to quote on PerlMonks? Just copying the text loses all formatting, naturally, so I had to re-add the most important ones. Yes, fetching the data out of the browser would be possible, but then it's processed and not the bare markup any longer.

    OT2: [apc://] seems to be broken since the repository was renamed to "perl5". Given that, it's easier to link to GitHub, which is also one of the documented source code locations.

      However, I think that this could be classified as a bug or at least a somewhat valid intent to add a new feature for escaping capabilities to Pod::Simple.

      I think Pod::Simple is pretty complex (see e.g. Pod::Simple::BlackBox), I'm not sure if changes like that are easy to implement and test, especially given the huge amount of POD out in the wild. An argument could also be made that perlpodspec is simply missing a more clear statement of "don't break the link part of L<> across lines"...

      That would require a parser that is sophisticated enough to detect sentences ... and that, frankly, is probably too much to ask for. Instead, I could format my POD directly in a groff-recommended way, which generally shouldn't have any negative impact on other POD renderers.

      Yes, I agree, and if you don't mind formatting your POD in a *roff-friendly way (POD is a pretty old format) then that's probably the best solution. (Update: Personally, I wouldn't put in the effort to do this, because I don't use man to read Perl docs, I always use perldoc, which doesn't do the double space after sentences. But if you don't mind doing this, then the nice thing about it is your docs will look consistent no matter if read via man, perldoc, or HTML.)

      OT: what's the proper way to quote on PerlMonks? Just copying the text loses all formatting

      I often re-add formatting by hand, but you could also access the XML version via the link at the top of the page under the title to get to the original markup.

      [apc://] seems to be broken

      Yeah, that one seems to be outdated. A hyperlink to GitHub is probably best nowadays, hopefully that repository won't change so soon (the renaming of the repo on perl5.git.perl.org unfortunately broke a bunch of my links that I need to go back and fix sometime...).

        I think Pod::Simple is pretty complex […] I'm not sure if changes like that are easy to implement and test, especially given the huge amount of POD out in the wild.
        Well, I don't care about old documents anyway!
        Jokes aside, I took the time and created a new bug report for this. In there, I tried to verbosely explain the problem, the unsatisfactory workarounds that aren't really working universally anyway, and, crucially, ask for a new formatting code called M<> to be introduced that just eats any data/text passed to it. It could also be called H<> for "hide", but these are just specifics. I believe that this is a good option for several reasons:
        • It doesn't change any previous behavior or markup. That's the main reason why I didn't ask for extending the spec of Z<> to be more lenient and take any text while just consuming it - the behavior of the Z<> formatting code would differ between Perl versions and that can be a real pain.
        • Parsing formatting codes is already implemented, introducing a new one should be relatively easy. The only exception to that is that, theoretically, any text wrapped in M<> would need to be taken verbatim and not POD-interpreted, but I don't see this as a huge implementation issue.
        • The documentation for the L<> formatting code can easily be extended to show line-breaking capabilities with the new M<> code.

        Yes, I agree, and if you don't mind formatting your POD in a *roff-friendly way […] then that's probably the best solution.
        But if you don't mind doing this, then the nice thing about it is your docs will look consistent no matter if read via man, perldoc, or HTML.
        I did that for the documentation I wrote yesterday and it really hasn't been a big deal. Now, all generated documentation is consistent. man pages have two spaces after each sentence, all the other formats just show up normally. Pretty nice.
        I should probably change the perlpod/perlpodspec documentation to add this as a recommendation. That won't help "other" people/older documentation to be more consistent, unless they re-read the changed parts, but at least new developers/documentation could benefit.

        Personally, I wouldn't put in the effort to do this, because I don't use man to read Perl docs, I always use perldoc, which doesn't do the double space after sentences.
        To be honest, I avoid the perldoc command like the plague. So far, whenever I tried it, it just generated plain text, broken at 80 chars per line, with control characters replaced by plain ASCII characters like * and _.
        There seem to be multiple weirdnesses:
        • It seems to default to the Pod::Perldoc::ToText formatter, even though I'm using a capable terminal on a Linux system. This seems to have been introduced in 5.27.5.
        • Overriding the default formatter via -o man makes it pick the Pod::Perldoc::ToMan formatter, which generates the man page internally and then renders it out as plain text (with a few control characters). I'd rather get *roff output, though, to pass it to man.
        • Forcing the Pod::Man formatter via -M Pod::Man actually generates *roff, yay! However, it then goes ahead to call manpager (likely because my environment says MANPAGER=manpager, a Gentoo-specific wrapper to "enable colored man output" and which really just calls $PAGER/less). While GNU less can parse *roff nowadays, I don't want that either …
        • Finally, I can enforce the usage of man via PERLDOC_PAGER='man -l -'. So, to get the output I'd like, I'd have to replace perldoc with a shell function calling PERLDOC_PAGER='man -l -' perldoc -M 'Pod::Man' "${@}" but that also disables all its internal magic …
        I guess I'll just stick to man itself. :)
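For reference, the wrapper mentioned in the last bullet could be sketched like this. The function name perldoc_man is made up, and `man -l -` assumes a man implementation that can read a page from standard input (man-db does).

```shell
# Hypothetical wrapper: have perldoc emit *roff via Pod::Man and let man render it.
perldoc_man() {
    # PERLDOC_PAGER is set for this single perldoc invocation only.
    PERLDOC_PAGER='man -l -' perldoc -M 'Pod::Man' "$@"
}
```

It would be called like perldoc itself, e.g. `perldoc_man perlpod`, with the caveat already noted that forcing the formatter and pager this way bypasses perldoc's own defaults.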

        […] but you could also access the XML version via the link at the top of the page under the title to get to the original markup.
        D'oh, thanks. I tried doing so, but used the XML link while writing the reply, which naturally gave me a pretty empty doctext entry. I should have used it on the parent's node page …

      since man doesn't have any (real) notion of (hyper-) links, as far as I know.

      It's been almost two decades, but I remember clicking on links in perlpod manpages ... I think using bin/info ...

      didn't take me long to switch to a web browser for clicking on docs

        It's been almost two decades, but I remember clicking on links in perlpod manpages ... I think using bin/info ...
        Heh, yeah, but those are "hacks" in the viewers, as also explained in the linked answer:
        […] most pages aren't really designed for hypertext, and the default man program doesn't support it […]
        There are however man page viewing programs that reconstruct some hyperlinks […]

        My point was just that the underlying language doesn't support hyperlinks and, as such, a generator needn't check the validity of links. This is probably different for formats like Texinfo (or, more prominently, (X)HTML), which do understand and use the concept of hyperlinks.
