Here-document extraction

Have you ever found yourself working on code with a lot of here-documents and wished you could read the here-documents outside the context of the source file?

I'm working on a CGI application with a lot of here-docs, and this little chunk of code is proving itself useful. The output includes tags with the subroutine name (if applicable, and line number) in which the here-document was found, and BEGIN:line, END:line tags (planning ahead for other uses of the output).

Running the code without a filename argument dups the DATA filehandle and runs the program against the test input in the __END__ section. The output from the test input:

[here-docs in source file: <DATA>]


@@sub:bar:1@@

@@BEGIN:2@@
print <<'FOO';
blah
FOO
@@END:4@@


@@sub:ralph:7@@

@@BEGIN:7@@
sub ralph { print <<EOF }
This is a test!
Here's some more text.
EOF
@@END:10@@

@@BEGIN:13@@
$foo = <<STRING;
blah blah blah blah.
how's that?
STRING
@@END:16@@
[download]

Follow the Read More link for the code:

#!/usr/bin/perl

# usage: grep.html-here-docs filename
# usage: grep.html-here-docs
# (without filename argument, runs
# against test data in the __END__ section)

# grep out all here-documents:
# prints the source filename at the top of the output.
# prints the last subroutine definition name seen
# for each here-document.

use warnings;
use strict;

use re 'eval';

my $R_SP     = qq{[\x20\t]};
my $R_QUOTE  = qq{[\'\"]};

my $TERM = '';
my $p = qr!
    ^                       # at beginning of line
        [^\x23]*?           # match one or more non-comment chars
        (?:
            print ${R_SP}*          # match print, one or more space
            |                       # OR
            \w+ ${R_SP}* = ${R_SP}* # assignment
        )
        << ${R_SP}*         # begin here-doc, zero or more spaces
        (${R_QUOTE}?)       # an optional quote character
        (\w+)               # match/capture one or more word character
+s
        (?(1) \1 )          # if a quote was matched, look for another
+.
        (?(2) (?{$TERM = $2}) )
!x;

my @sub;
my $new_here_doc = 1;

if (@ARGV) {
    open FH, $ARGV[0] or die "error opening input: $!";
} else {
    open FH, "<&DATA" or die "error duping DATA!: $!";
}

while (<FH>) {
    if (1 == $.) { print "[here-docs in source file: @{[@ARGV ? $ARGV[
+0] : '<DATA>']}]\n\n" }
    if (/^[ \t]*sub[ \t]*(\w+)/) {
        @sub = ($1,$.);
    }
    if (my $s = /$p/../^${TERM}/) {
        if ((1 == $s) && @sub) {
            print "\n\@\@sub:$sub[0]:$sub[1]\@\@\n\n";
            @sub = ();
        }
        if ($new_here_doc) {
            print "\@\@BEGIN:$.\@\@\n";
            $new_here_doc = 0;
        }
        print;
        if ($s =~ /E0$/) {
            print "\@\@END:$.\@\@\n\n";
            $TERM = '';
            $new_here_doc = 1;
        }
    }
}

close FH;

__END__
sub bar {
print <<'FOO';
blah
FOO
}

sub ralph { print <<EOF }
This is a test!
Here's some more text.
EOF

# try matching an assignment:
$foo = <<STRING;
blah blah blah blah.
how's that?
STRING
[download]

Update: Fixed thinko in error msg when dup DATA filehandle fails

Comment on Here-document extraction Select or Download Code


more useful options
	PerlMonks