Have you ever found yourself working on code with a lot
of here-documents and wished you could read the here-documents
outside the context of the source file?
I'm working on a CGI application with a lot of here-docs, and
this little chunk of code is proving itself useful. The output
includes tags with the subroutine name (if applicable, and line
number) in which the here-document was found, and BEGIN:line,
END:line tags (planning ahead for other uses of the output).
Running the code without a filename argument dups the DATA
filehandle and runs the program against the test input in
the __END__ section. The output from the test input:
[here-docs in source file: <DATA>]
@@sub:bar:1@@
@@BEGIN:2@@
print <<'FOO';
blah
FOO
@@END:4@@
@@sub:ralph:7@@
@@BEGIN:7@@
sub ralph { print <<EOF }
This is a test!
Here's some more text.
EOF
@@END:10@@
@@BEGIN:13@@
$foo = <<STRING;
blah blah blah blah.
how's that?
STRING
@@END:16@@
Follow the Read More link for the code:
#!/usr/bin/perl
# usage: grep.html-here-docs filename
# usage: grep.html-here-docs
# (without filename argument, runs
# against test data in the __END__ section)
# grep out all here-documents:
# prints the source filename at the top of the output.
# prints the last subroutine definition name seen
# for each here-document.
use warnings;
use strict;
use re 'eval';
my $R_SP = qq{[\x20\t]};
my $R_QUOTE = qq{[\'\"]};
my $TERM = '';
my $p = qr!
^ # at beginning of line
[^\x23]*? # match one or more non-comment chars
(?:
print ${R_SP}* # match print, one or more space
| # OR
\w+ ${R_SP}* = ${R_SP}* # assignment
)
<< ${R_SP}* # begin here-doc, zero or more spaces
(${R_QUOTE}?) # an optional quote character
(\w+) # match/capture one or more word character
+s
(?(1) \1 ) # if a quote was matched, look for another
+.
(?(2) (?{$TERM = $2}) )
!x;
my @sub;
my $new_here_doc = 1;
if (@ARGV) {
open FH, $ARGV[0] or die "error opening input: $!";
} else {
open FH, "<&DATA" or die "error duping DATA!: $!";
}
while (<FH>) {
if (1 == $.) { print "[here-docs in source file: @{[@ARGV ? $ARGV[
+0] : '<DATA>']}]\n\n" }
if (/^[ \t]*sub[ \t]*(\w+)/) {
@sub = ($1,$.);
}
if (my $s = /$p/../^${TERM}/) {
if ((1 == $s) && @sub) {
print "\n\@\@sub:$sub[0]:$sub[1]\@\@\n\n";
@sub = ();
}
if ($new_here_doc) {
print "\@\@BEGIN:$.\@\@\n";
$new_here_doc = 0;
}
print;
if ($s =~ /E0$/) {
print "\@\@END:$.\@\@\n\n";
$TERM = '';
$new_here_doc = 1;
}
}
}
close FH;
__END__
sub bar {
print <<'FOO';
blah
FOO
}
sub ralph { print <<EOF }
This is a test!
Here's some more text.
EOF
# try matching an assignment:
$foo = <<STRING;
blah blah blah blah.
how's that?
STRING
Update: Fixed thinko in error msg when dup DATA
filehandle fails