Following up on the thread "Embedding pod in C", with suggestions from John M. Dlugosz and JavaFan (including starting a new thread to attract fresh eyes and request a serious review), enclosed is a first cut at a preprocessor. It extracts pod (see perlpod) from source in languages other than Perl, where commenting conventions or stylistic preferences make it impractical to start everything in column 0. This version also shows how easily the preprocessing extends to languages other than C, and even to Perl itself, to slightly relax the column 0 restrictions. The example is neither comprehensive nor documented yet; I'm waiting for comments first.
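To illustrate what "extending to another language" amounts to, here is a minimal sketch that reduces the script to a single function driven by one table entry. The `sql` entry (pod kept in `--` line comments) and the `extract_pod` name are hypothetical, chosen only for illustration, and this sketch always hides the start and stop lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same six-field layout as the %languages table in the script:
# start, showstart, stop, showstop, trim, verbatim.
# The 'sql' entry is hypothetical: pod kept in "--" line comments.
my %languages = (
    sql => [
        '^\s*--\s*=pod\b', 0,
        '^\s*--\s*=cut\b', 0,
        '^\s*--\s?',       '^\s*=v\s',
    ],
);

# Return the pod lines (without newlines) found in @lines for one
# language entry. This sketch always hides the start and stop lines.
sub extract_pod {
    my ( $lang, @lines ) = @_;
    my ( $start, undef, $stop, undef, $trim, $verbatim ) =
        @{ $languages{$lang} };
    my $show = 0;
    my @pod;
    for my $line (@lines) {
        if    ( $line =~ /$start/ ) { $show = 1; next }
        elsif ( $line =~ /$stop/ )  { $show = 0; next }
        next unless $show;
        chomp( my $out = $line );
        $out =~ s/$trim//;         # strip indentation and the "--" prefix
        $out =~ s/$verbatim/ /;    # "=v " becomes a leading space (verbatim)
        push @pod, $out;
    }
    return @pod;
}

my @src = split /^/, <<'END';
SELECT 1;
  -- =pod
  -- =head1 NAME
  --
  -- report.sql - nightly totals
  --
  --   =v SELECT * FROM totals;
  -- =cut
SELECT 2;
END
print "$_\n" for extract_pod( 'sql', @src );
```

Adding a language is then just another entry in the table; the loop itself does not change.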
Finding the right set of options will be important. I favor language-wide behavior to encourage "standards", but the model allows fine-tuning of how the start and stop of the pod are recognized and how each line is trimmed to produce genuine pod. Verbatim lines need some special marker (currently =v followed by whitespace) so that processed lines can begin in a column other than 0. As requested, a blank line is inserted between consecutive blocks of pod when the preceding output did not already end with one.
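The per-line transformation and the blank-line rule can be sketched in isolation. The helper names `pod_line` and `join_blocks` are hypothetical; `pod_line` applies the trim and then the verbatim substitution in the same order as the script. One consequence of that order: the verbatim pattern is matched against the already-trimmed line, so a verbatim pattern that still expects the comment prefix (such as `#`) would never match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn one raw source line into a pod line: strip the comment/indent
# prefix first, then replace a leading "=v " marker with a single space
# so that pod treats the line as verbatim.
sub pod_line {
    my ( $line, $trim, $verbatim ) = @_;
    chomp $line;
    $line =~ s/$trim//;
    $line =~ s/$verbatim/ /;
    return $line;
}

# Join extracted blocks (array refs), inserting a blank line between two
# blocks only when the output so far does not already end with one.
sub join_blocks {
    my @out;
    for my $block (@_) {
        push @out, '' if @out && $out[-1] ne '';
        push @out, @$block;
    }
    return @out;
}

my ( $trim, $verbatim ) = ( qr/^\s*/, qr/^\s*=v\s/ );
my @first  = map { pod_line( $_, $trim, $verbatim ) }
    ( "    =head2 title\n", "    =v indent 1\n" );
my @second = map { pod_line( $_, $trim, $verbatim ) }
    ( "  =head2 another title\n" );
print "$_\n" for join_blocks( \@first, \@second );
```

With an awk-style trim of `^\s*#\s*`, a verbatim pattern of `^\s*=v\s` works, whereas `^\s*#\s*=v\s` leaves the `=v` marker in place.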
The code is mostly initialization, to give a sense of how things can be controlled. Processing the input is quite straightforward, and could be simpler still if we dropped control over whether the start and stop lines themselves are included in the output. Comments, please.
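If the start and stop lines are never shown, one possible simplification (a sketch, not necessarily the direction the script should take) replaces the `$show` flag with Perl's flip-flop operator `...`. The function name `extract_simple` is hypothetical, and it returns lines instead of printing them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A simplified extractor that never shows the start/stop lines, written
# with Perl's flip-flop operator. On each line, "..." returns '' outside
# a block, 1 on the start line, 2, 3, ... inside, and a number with "E0"
# appended on the stop line.
# Caveat: the flip-flop keeps its state in the operator itself, so a
# block left open at the end of one call carries over into the next.
sub extract_simple {
    my ( $start, $stop, $trim, $verbatim, @lines ) = @_;
    my @out;
    for my $line (@lines) {
        my $seq = ( $line =~ $start ) ... ( $line =~ $stop );
        next unless $seq;
        if ( $seq == 1 ) {    # start line: separate from the previous block
            push @out, '' if @out && $out[-1] ne '';
            next;
        }
        next if $seq =~ /E0$/;    # stop line
        chomp( my $pod = $line );
        $pod =~ s/$trim//;
        $pod =~ s/$verbatim/ /;
        push @out, $pod;
    }
    return @out;
}

my @src = split /^/, <<'END';
int x;
#ifdef pod
=head2 title
  =v indent 1
#endif /* pod */
END
print "$_\n" for extract_simple(
    qr{^\s*#\s*ifdef\s+pod\b}, qr{^\s*#\s*endif\s*/\*\s*pod\s*\*/},
    qr{^\s*},                  qr{^\s*=v\s},
    @src
);
```

The three-dot form is used so the stop pattern is not tested on the same line that matched the start pattern.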
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;
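# Each entry: start pattern, show start line?, stop pattern, show stop line?,
# trim pattern, verbatim pattern. The verbatim pattern is applied after the
# trim, so it must match the already-trimmed line.
my %languages = (
    'c' => [
        '^\s*#\s*ifdef\s+pod\b',           0,
        '^\s*#\s*endif\s*/\*\s*pod\s*\*/', 0,
        '^\s*',                            '^\s*=v\s',
    ],
    'awk' => [
        '^\s*#\s*=pod\b', 0,
        '^\s*#\s*=cut\b', 0,
        '^\s*#\s*',       '^\s*=v\s',
    ],
    'perl' => [
        '^\s*=pod$', 0,
        '^\s*=cut$', 0,
        '^\s*',      '^\s*=v\s',
    ],
);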
# C++ sources use the same conventions as C.
for my $l (qw( C c++ C++ )) {
    $languages{$l} = $languages{c};
}
# Command-line options take precedence; anything not given on the command
# line falls back to the chosen language's defaults (//= needs Perl 5.10+).
my $language = 'c';
my ( $start, $showstart, $stop, $showstop, $trim, $verbatim );
GetOptions(
    "language=s" => \$language,
    "start=s"    => \$start,
    "stop=s"     => \$stop,
    "trim=s"     => \$trim,
    "verbatim=s" => \$verbatim,
    "showstart"  => \$showstart,
    "showstop"   => \$showstop,
) or exit(1);
die("Language '$language' not recognized\n")
    unless ( exists( $languages{$language} ) );
my @defaults = @{ $languages{$language} };
$start     //= $defaults[0];
$showstart //= $defaults[1];
$stop      //= $defaults[2];
$showstop  //= $defaults[3];
$trim      //= $defaults[4];
$verbatim  //= $defaults[5];
# Compile the patterns once.
$start    = qr{$start};
$stop     = qr{$stop};
$trim     = qr{$trim};
$verbatim = qr{$verbatim};
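my $show      = 0;    # true while inside a pod block
my $lastempty = 1;    # true if the last line printed was empty
while ( my $line = <DATA> ) {
    if ( $line =~ $start ) {
        # Separate this block from earlier output with a blank line.
        unless ($lastempty) {
            $lastempty = 1;
            print "\n";
        }
        $show = 1;
        next unless ($showstart);
    }
    elsif ( $line =~ $stop ) {
        # Leave pod mode, printing the stop line first if requested.
        $show = 0;
        show_line($line) if ($showstop);
        next;
    }
    show_line($line) if ($show);
}

# Strip the language's prefix, turn the verbatim marker into a leading
# space, and print the result, remembering whether it was empty.
sub show_line {
    my ($line) = @_;
    chomp($line);
    $line =~ s/$trim//;
    $line =~ s/$verbatim/ /;
    $lastempty = ( $line eq '' );
    print $line, "\n";
}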
__DATA__
This could be anything
#ifdef pod
=head2 title
blah, blah, blah,
blah, blah
=v indent 1
#endif /* pod */
This could be anything, too
#ifdef pod
=head2 another title
yo ho ho
#endif /* pod */
more anything
Updated: changed ^.* to ^\s* for the c patterns, which was my original intent. Thanks for spotting the error, John M. Dlugosz!
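The update does not show the earlier patterns, so the following only illustrates the general difference between the two prefixes, using a start marker in the style of the c entry. With `^.*` the marker may appear anywhere on the line, whereas `^\s*` allows only leading whitespace before it. Used as a trim pattern, `^.*` would erase the whole line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative patterns: the same start marker with two different prefixes.
my $loose  = qr{^.*#\s*ifdef\s+pod\b};
my $strict = qr{^\s*#\s*ifdef\s+pod\b};

my @lines = ( '    #ifdef pod', 'x = 1;  /* #ifdef pod is discussed below */' );
for my $line (@lines) {
    printf "loose=%d strict=%d  %s\n",
        ( $line =~ $loose ? 1 : 0 ), ( $line =~ $strict ? 1 : 0 ), $line;
}

# Used as a trim pattern, ^.* is worse still: it deletes the entire line.
for my $trim ( qr{^.*}, qr{^\s*} ) {
    ( my $pod = '    =head2 title' ) =~ s/$trim//;
    print "trim $trim -> '$pod'\n";
}
```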