PerlMonks  

parsing an ASP file

by dada (Chaplain)
on May 12, 2004 at 13:27 UTC (#352721, snippet)
Description: this function reads an ASP file (yes, sometimes I have to fight with such stuff), separating code blocks from HTML blocks. it returns a list where each element is a reference to a 2-element array: the first element is either "HTM" or "ASP", and the second one is the block itself. the ASP tags (<% and %>) are removed.

cheers,
Aldo

sub get_asp_blocks {
    my($file) = @_;
    open(my $fh, '<', $file) or die "can't open '$file': $!\n";

    # each block is [ type, text ]; the file starts in HTML mode
    my @blocks = ( ["HTM", ""] );
    my $state = "HTM";
    my $last = "";
    my $char;
    while(read($fh, $char, 1)) {
        if($last eq "<" && $char eq "%" && $state eq "HTM") {
            chop $blocks[-1][1];    # drop the "<" that was already appended
            $state = "ASP";
            push(@blocks, ["ASP", ""]);
            $char = "";             # so "<%>" is not read as "<%" followed by "%>"
        } elsif($last eq "%" && $char eq ">" && $state eq "ASP") {
            chop $blocks[-1][1];    # drop the "%" that was already appended
            $state = "HTM";
            push(@blocks,  ["HTM", ""]);
        } else {
            $blocks[-1][1] .= $char;
        }
        $last = $char;
    }
    close($fh);
    return @blocks;
}
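
A minimal sketch of how the returned list might be consumed. The sample data is made up to mirror the shape described above (a list of [type, text] pairs); it stands in for a real call to get_asp_blocks() on some file.

```perl
use strict;
use warnings;

# Hypothetical sample in the same shape get_asp_blocks() returns:
# a list of [ type, text ] pairs, with the <% and %> tags already removed.
my @blocks = (
    [ "HTM", "<html><body>\n" ],
    [ "ASP", " Response.Write(\"hello\") " ],
    [ "HTM", "\n</body></html>\n" ],
);

# Keep only the text of the code blocks.
my @code = map { $_->[1] } grep { $_->[0] eq "ASP" } @blocks;

# Walk every block in document order.
for my $block (@blocks) {
    my ($type, $text) = @$block;
    printf "%s: %d characters\n", $type, length $text;
}
```

Because the blocks come back in document order, joining the text of all of them gives back the file minus its ASP tags.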
Replies are listed 'Best First'.
Re: parsing an ASP file
by Juerd (Abbot) on May 12, 2004 at 17:15 UTC

    I think (but have not tested) that even an inefficient regex is faster than reading one character at a time. It is certainly easier to write :)

    my @parsed;
    while ($asp =~ /\G ((?: [^<]+ | <(?!%) )*) (?: <%(.*?)%> | ((?=<%)) )? /gsx) {
        # test length rather than truth, so a block consisting of "0" is kept
        length $1 and push @parsed, [ html => $1 ];
        defined $2 and length $2 and push @parsed, [ asp => $2 ];
        defined $3 and die "Unclosed ASP code block near '", $asp =~ /\G(<%\s*\n?.*)/g, "'.\n";
    }
    But, of course,
    <% foo = "a mere %> breaks either simple minded solution." %>
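
To see the failure concretely, here is a small sketch that runs only the non-greedy part of the regex against that very line. The variable names are just for illustration.

```perl
use strict;
use warnings;

my $asp = q{<% foo = "a mere %> breaks either simple minded solution." %>};

# The non-greedy match stops at the first %>, which sits inside the string literal.
my ($code) = $asp =~ /<%(.*?)%>/s;

# $+[0] is the offset just past the end of the match.
my $rest = substr($asp, $+[0]);

print "code: [$code]\n";
print "rest: [$rest]\n";
```

The captured code ends in the middle of the string literal, and the remainder of the statement would be treated as HTML.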

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

      yep. one thing I forgot to mention is that, for the application I'm currently writing (which is basically an ASP cross-reference generator) I need to have the line number where each block appears. so, the code I'm using is something more like:
      sub get_asp_blocks {
          my($file) = @_;
          open(my $fh, '<', $file) or die "can't open '$file': $!\n";

          my $dot = 1;    # current line number
          my @blocks = ( ["HTM", $dot, ""] );
          my $state = "HTM";
          my $last = "";
          my $char;
          while(read($fh, $char, 1)) {
              $dot++ if $char eq "\n";
              if($last eq "<" && $char eq "%" && $state eq "HTM") {
                  chop $blocks[-1][-1];    # drop the "<" that was already appended
                  $state = "ASP";
                  push(@blocks, ["ASP", $dot, ""]);
                  $char = "";              # so "<%>" is not read as "<%" followed by "%>"
              } elsif($last eq "%" && $char eq ">" && $state eq "ASP") {
                  chop $blocks[-1][-1];    # drop the "%" that was already appended
                  $state = "HTM";
                  push(@blocks, ["HTM", $dot, ""]);
              } else {
                  $blocks[-1][-1] .= $char;
              }
              $last = $char;
          }
          close($fh);
          return @blocks;
      }
      this way, each element of the returned array contains three elements: the type (ASP or HTM), the line number, and the block itself.

      cheers,
      Aldo

      King of Laziness, Wizard of Impatience, Lord of Hubris

        my $state = "HTM";

        The state is what I don't like. It means that everything needs to be done manually. So to get the line numbers, I'd probably just extend the regex with one set of all-enclosing parens (or for simple stand-alone scripts just use $&), and then count the number of \n characters found in it.

        my @parsed;
        my $line = 1;
        while ($asp =~ /\G( ((?: [^<]+ | <(?!%) )*) (?: <%(.*?)%> | ((?=<%)) )? )/gsx) {
            # the ASP code starts after the HTML part, so skip the newlines in $2
            my $asp_line = $line + ($2 =~ tr/\n//);
            length $2 and push @parsed, [ $line, html => $2 ];
            defined $3 and length $3 and push @parsed, [ $asp_line, asp => $3 ];
            defined $4 and die "Unclosed ASP code block starting on line $asp_line near '", $asp =~ /\G(<%\s*\n?.*)/g, "'.\n";
            $line += $1 =~ tr/\n//;
        }
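
The newline counting relies on tr/// acting as a pure counter. A tiny sketch of just that step, with a made-up chunk of text:

```perl
use strict;
use warnings;

my $chunk = "<p>one</p>\n<p>two</p>\n<% x = 1 %>";

# With an empty replacement list and no /d, tr/// only counts the
# matching characters and leaves the string untouched.
my $newlines = ($chunk =~ tr/\n//);

# A chunk that starts on line 1 therefore ends on line 1 + $newlines.
my $end_line = 1 + $newlines;

print "newlines: $newlines, ends on line $end_line\n";
```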

        Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

      Ah, but a more complete version is easy to write too! :) (Although, I admit, a bit more longwinded...)

      use re 'eval';
      my $string = qr[
          " [^"\\]* (?:\\.|[^"\\])* "
        | ' [^'\\]* (?:\\.|[^'\\])* '
      ]x;
      my $alist    = qr[(?: [^"'>]* | $string )*]x;
      my $ehead    = qr[ <\w+ $alist /? > ]x;
      my $textarea = qr[ <textarea $alist> (?: [^<]* | < (?!/textarea>) )* </textarea> ]x;
      my $asp      = qr[ <% (?: (?> [^%"']* ) | $string | % (?! > ) )+ %> ]x;
      my $html     = qr[ (?: (?> [^<"'] ) | $textarea | $ehead | </\w+> )+ ]x;
      my @parsed;
      # $source is assumed to hold the ASP text to parse (hypothetical name);
      # the match has to run against the input, not against the $string pattern
      () = $source =~ /
            ($asp)  (?{ push @parsed, [asp  => $1] })
          | ($html) (?{ push @parsed, [html => $2] })
      /gx;
Re: parsing an ASP file
by perlinux (Deacon) on May 12, 2004 at 14:03 UTC
    Good job! Do you think it could work for other "tag" languages (PHP, JSP) just by changing the tag? Or would the code have to change... Does it need big changes? I have never worked with ASP...

    I followed the discussion of this code in chat... :-) (originally written in Italian)
      as far as I know (which isn't very far :-) PHP uses <? .. ?> as delimiters, so you could just change "%" to "?" in the eqs above (and of course, change "ASP" to "PHP" if you prefer) and it should work.
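
A rough sketch of that delimiter swap, written with a regex instead of the character loop so that the delimiters can be passed in. split_blocks() is a hypothetical helper, not part of the snippet above, and the sample strings are made up.

```perl
use strict;
use warnings;

# split_blocks() is a hypothetical helper: it returns [ type, text ] pairs
# with the delimiters removed, labelling code blocks with $label.
sub split_blocks {
    my ($text, $open, $close, $label) = @_;
    my @blocks;
    my $pos = 0;
    # \Q...\E quotes the delimiters, so "?" and "%" are matched literally.
    while ($text =~ /\Q$open\E(.*?)\Q$close\E/gs) {
        my $before = substr($text, $pos, $-[0] - $pos);
        push @blocks, [ "HTM", $before ] if length $before;
        push @blocks, [ $label, $1 ];
        $pos = $+[0];    # offset just past the closing delimiter
    }
    my $tail = substr($text, $pos);
    push @blocks, [ "HTM", $tail ] if length $tail;
    return @blocks;
}

my @php = split_blocks('<b><? echo 1; ?></b>', '<?', '?>', 'PHP');
my @asp = split_blocks('<b><% x = 1 %></b>',   '<%', '%>', 'ASP');

print join(" | ", map { "$_->[0]:$_->[1]" } @php), "\n";
```

Note that this sketch silently treats an unclosed block as trailing HTML, and that with '<?' as the opener a block written as <?php ... ?> keeps the leading "php" in its text.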

      JSP seems to be using its own tag library (things like <jsp:getProperty .. /> and so on) as well as blocks like <% .. %>. perhaps you could use this tool and a full-blown HTML (or XHTML) parser to recognize JSP tags in HTML blocks, but I really don't know.

      cheers,
      Aldo

      King of Laziness, Wizard of Impatience, Lord of Hubris

        PHP is configurable to use many tags (http://us4.php.net/basic-syntax): <? <% <?php and <script language="php">

        --
        I'm not belgian but I play one on TV.

Re: parsing an ASP file
by iburrell (Chaplain) on May 25, 2004 at 21:43 UTC
    The type element ('HTM' or 'ASP') is not required if you leave the ASP tags in the strings. The parser would effectively split the file into a list of chunks. It would be easy to tell the ASP chunks apart because they start with '<%'.

    I have seen a regex-based XML parser that works this way. It breaks the XML into strings which can be identified by looking at the first couple of characters.
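
A minimal sketch of that idea, assuming a plain split on a capturing group is good enough (it shares the quoting problem discussed earlier in the thread). The sample text is made up.

```perl
use strict;
use warnings;

my $asp = "<html>\n<% x = 1 %>\n<p><%= x %></p>\n";

# split with a capturing group also returns the separators, so the
# ASP chunks come back with their tags still attached.
my @chunks = grep { length } split /(<%.*?%>)/s, $asp;

for my $chunk (@chunks) {
    # No separate type element: the first two characters identify the chunk.
    my $type = $chunk =~ /^<%/ ? "ASP" : "HTM";
    print "$type: $chunk\n";
}
```

Since nothing is stripped, joining the chunks reproduces the original text exactly.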
