Re: Escape newlines in POD / (Selectively) don't generate space characters instead

Re^2: Escape newlines in POD / (Selectively) don't generate space characters instead by Ionic (Acolyte) on Nov 09, 2020 at 01:42 UTC
Oh, those are some excellent points. I probably should have looked at more code than just Pod::Simple::HTML.

> At least for Pod::Simple, this appears to happen deep within the parser, specifically, here.

Right, and to specify, both Pod::Man and Pod::Simple::HTML naturally use it.

> The `preserve_whitespace` probably wouldn't help here because it would just preserve the newlines.

Yes, it does exactly the opposite of what I'm looking for, which is to selectively squash whitespace completely. But, oh, look at what does use `preserve_whitespace`! Just like Pod::Text, really. But even quickly scanning the code reveals some inconsistencies between code and comments in Pod::Text and Pod::Man - although for a completely different parsing aspect.

To recap, Pod::Man seems to handle such line-broken `L<>` tags correctly, but that's hardly surprising since man doesn't have any (real) notion of (hyper-) links, as far as I know.

To elaborate a bit: I don't even really plan on generating HTML documentation (at least not personally), but the HTML output was a prime candidate for testing my internal and external links within the documentation. podchecker does basic sanity checks on links, but nothing's better than actually seeing your generated links in fully generated documentation. I guess I could have also used pod2texi, but I try to avoid GNU Info because I can never remember how to use it correctly …

> Unfortunately, it looks to me like the "best" choice is to not break `L<>`s across multiple lines.

Unfortunately it looks like that. However, I think that this could be classified as a bug or at least a somewhat valid intent to add a new feature for escaping capabilities to Pod::Simple.

> Looking at the output of pod2man, this appears to be an effect of `*roff` typesetting, see e.g. Sentences and `.ss`.

Exactly - and I have been misinterpreting this as the same "space instead of newline" behavior, when in reality it's some special property of the `roff` language family. From the `groff` manual, I've learned that it's only applying that special rule if the punctuation character is located at exactly the end of the line. Interestingly, I also learned that the `groff` style guide recommends starting each sentence (and thus also ending each sentence) on its own line. Since Pod::Man is not changing whitespace (mostly), this explains why I noticed that behavior at times and didn't at other times.

> An argument could be made that Pod::Man should be merging POD paragraphs into one line to avoid the issue you're seeing.

Yep, that would help. Either that, or split up sentences on punctuation marks so that each sentence starts/ends on its own line, to adhere to `groff`'s recommendations (in which case every sentence would get a double space by default - which is fine, since then at least it would be consistent), although that would probably quickly get unwieldy. That would require a parser that is sophisticated enough to detect sentences (and, e.g., ignore mid-sentence punctuation characters or those surrounding the special characters given in the `groff` manual) and that, frankly, is probably too much to ask for. Instead, I could format my POD directly in a `groff`-recommended way, which generally shouldn't have any negative impact on other POD renderers.

OT: what's the proper way to quote on PerlMonks? Just copying the text loses all formatting, naturally, so I had to re-add the most important ones. Yes, fetching the data out of the browser would be possible, but then it's processed and not the bare markup any longer.

OT2: `[apc://]` seems to be broken since the repository was renamed to "perl5". Given that, it's easier to link to GitHub, which is also one of the documented source code locations.
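As a rough illustration of the `groff`-recommended layout mentioned above, here is a minimal sketch, not taken from the thread: the scratch file name and the sample sentences are made up, and the `pod2man` step only runs if that tool happens to be installed. The point is simply that every sentence ends at the end of a source line.

```shell
# Hypothetical scratch file; any writable path works.
pod_file="${TMPDIR:-/tmp}/sentence-per-line.pod"

# One sentence per source line, so every sentence ends at the end of a line.
cat > "$pod_file" <<'EOF'
=head1 DESCRIPTION

Each sentence starts on its own line.
It also ends on its own line.
A line break never falls in the middle of a sentence.

=cut
EOF

# Optionally render the file with pod2man, if it is available on this system.
if command -v pod2man >/dev/null 2>&1; then
    pod2man "$pod_file" > "${pod_file%.pod}.1" || echo "pod2man failed" >&2
fi
```

Other renderers should treat the three lines as one ordinary paragraph, which is why this layout is expected to be harmless outside of `man`.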
Re^3: Escape newlines in POD / (Selectively) don't generate space characters instead by haukex (Archbishop) on Nov 09, 2020 at 21:02 UTC
> However, I think that this could be classified as a bug or at least a somewhat valid intent to add a new feature for escaping capabilities to Pod::Simple.

I think Pod::Simple is pretty complex (see e.g. Pod::Simple::BlackBox), and I'm not sure if changes like that are easy to implement and test, especially given the huge amount of POD out in the wild. An argument could also be made that perlpodspec is simply missing a clearer statement of "don't break the link part of `L<>` across lines"...

> That would require a parser that is sophisticated enough to detect sentences ... and that, frankly, is probably too much to ask for. Instead, I could format my POD directly in a groff-recommended way, which generally shouldn't have any negative impact on other POD renderers.

Yes, I agree, and if you don't mind formatting your POD in a `roff`-friendly way (POD is a pretty old format) then that's probably the best solution. (Update:* Personally, I wouldn't put in the effort to do this, because I don't use `man` to read Perl docs, I always use `perldoc`, which doesn't do the double space after sentences. But if you don't mind doing this, then the nice thing about it is your docs will look consistent no matter if read via `man`, `perldoc`, or HTML.)

> OT: what's the proper way to quote on PerlMonks? Just copying the text loses all formatting

I often re-add formatting by hand, but you could also access the XML version via the link at the top of the page under the title to get to the original markup.

> `[apc://]` seems to be broken

Yeah, that one seems to be outdated. A hyperlink to GitHub is probably best nowadays, hopefully that repository won't change so soon (the renaming of the repo on `perl5.git.perl.org` unfortunately broke a bunch of my links that I need to go back and fix sometime...).
Re^4: Escape newlines in POD / (Selectively) don't generate space characters instead by Ionic (Acolyte) on Nov 10, 2020 at 05:05 UTC
> I think Pod::Simple is pretty complex […] I'm not sure if changes like that are easy to implement and test, especially given the huge amount of POD out in the wild.

Well, I don't care about old documents anyway! Jokes aside, I took the time and created a new bug report for this. In there, I tried to verbosely explain the problem, the unsatisfactory workarounds that aren't really working universally anyway, and, crucially, ask for a new formatting code called `M<>` to be introduced, which just eats any data/text passed to it. It could also be called `H<>` for "hide", but these are just specifics. I believe that this is a good option for several reasons:

- It doesn't change any previous behavior or markup. That's the main reason why I didn't ask for extending the spec of `Z<>` to be more lenient and take any text while just consuming it - the behavior of the `Z<>` formatting code would differ between Perl versions and that can be a real pain.
- Parsing formatting codes is already implemented, so introducing a new one should be relatively easy. The only exception to that is that, theoretically, any text wrapped in `M<>` would need to be taken verbatim and not POD-interpreted, but I don't see this as a huge implementation issue.
- The documentation for the `L<>` formatting code can easily be extended to show line-breaking capabilities with the new `M<>` code.

> Yes, I agree, and if you don't mind formatting your POD in a `roff`-friendly way […] then that's probably the best solution.* But if you don't mind doing this, then the nice thing about it is your docs will look consistent no matter if read via `man`, `perldoc`, or HTML.

I did that for the documentation I wrote yesterday and it really hasn't been a big deal. Now, all generated documentation is consistent. `man` pages have two spaces after each sentence, all the other formats just show up normally. Pretty nice. I should probably change the `perlpod`/`perlpodspec` documentation to add this as a recommendation. That won't help "other" people/older documentation to be more consistent, unless they re-read the changed parts, but at least new developers/documentation could benefit.

> Personally, I wouldn't put in the effort to do this, because I don't use `man` to read Perl docs, I always use `perldoc`, which doesn't do the double space after sentences.

To be honest, I avoid the `perldoc` command like the plague. So far, whenever I tried it, it just generated plain text, broken at 80 chars per line, with control characters replaced by plain ASCII characters like `` and `_`. There seem to be multiple weirdnesses:

- It seems to default to the `Pod::Perldoc::ToText` formatter, even though I'm using a capable terminal on a Linux system. This seems to have been introduced in 5.27.5.
- Overriding the default formatter via `-o man` makes it pick the `Pod::Perldoc::ToMan` formatter, which generates the man page internally and then renders it out as plain text (with a few control characters). I'd rather get `roff` output, though, to pass it to `man` …
- Forcing the `Pod::Man` formatter via `-M Pod::Man` actually generates `roff`, yay! However, it then goes ahead and calls `manpager` (likely because my environment says `MANPAGER=manpager`, a Gentoo-specific wrapper to "enable colored man output" which really just calls `$PAGER`/`less`). While `GNU less` can parse `roff` nowadays, I don't want that either …
- Finally, I can enforce the usage of `man` via `PERLDOC_PAGER='man -l -'`.

So, to get the output I'd like, I'd have to replace `perldoc` with a shell function calling `PERLDOC_PAGER='man -l -' perldoc -M 'Pod::Man' "${@}"` but that also disables all its internal magic … I guess I'll just stick to `man` itself. :)

> […] but you could also access the XML version via the link at the top of the page under the title to get to the original markup.

D'oh, thanks. I tried doing so, but used the XML link while writing the reply, which naturally gave me a pretty empty `doctext` entry. I should have used it on the parent's node page …
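For reference, the shell function described above could look roughly like the following sketch. The function name `perldoc_man` is hypothetical (picked so that plain `perldoc` keeps working unchanged); the command line inside is the one quoted in the post, and it assumes a `man` implementation that accepts `-l -` for reading a page from standard input.

```shell
# Hypothetical wrapper name; plain `perldoc` is left untouched.
perldoc_man() {
    # -M Pod::Man asks perldoc for roff output; PERLDOC_PAGER hands that
    # output to man, which reads the page from stdin via `-l -`.
    PERLDOC_PAGER='man -l -' perldoc -M 'Pod::Man' "$@"
}
```

It would be called like `perldoc_man perlpod`. As noted in the post, forcing the formatter and pager this way bypasses whatever `perldoc` would otherwise choose on its own.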
Re^5: Escape newlines in POD / (Selectively) don't generate space characters instead by haukex (Archbishop) on Nov 12, 2020 at 20:21 UTC
Re^3: Escape newlines in POD / (Selectively) don't generate space characters instead by Anonymous Monk on Nov 09, 2020 at 02:10 UTC
> since man doesn't have any (real) notion of (hyper-) links, as far as I know.

It's been almost two decades, but I remember clicking on links in perlpod manpages ... I think using bin/info ... didn't take me long to switch to a web browser for clicking on docs
Re^4: Escape newlines in POD / (Selectively) don't generate space characters instead by Ionic (Acolyte) on Nov 09, 2020 at 02:28 UTC
> It's been almost two decades, but I remember clicking on links in perlpod manpages ... I think using bin/info ...

Heh, yeah, but those are "hacks" in the viewers, as also explained in the linked answer:

> […] most pages aren't really designed for hypertext, and the default `man` program doesn't support it […] There are however man page viewing programs that reconstruct some hyperlinks […]

My point was just that the underlying language doesn't support hyperlinks and, as such, a generator needn't check the validity of links. This is probably different for formats like Texinfo (or, more prominently, (X)HTML), which do understand and use the concept of hyperlinks.

