PerlMonks  

Re: Re: Re: parsing an ASP file

by Juerd (Abbot)
on May 23, 2004 at 22:57 UTC ( [id://355785] )


in reply to Re: Re: parsing an ASP file
in thread parsing an ASP file

my $state = "HTM";

The state is what I don't like. It means that everything needs to be done manually. So to get the line numbers, I'd probably just extend the regex with one set of all-enclosing parens (or for simple stand-alone scripts just use $&), and then count the number of \n characters found in it.

my @parsed;
my $line = 1;
while ($asp =~ /\G( ((?: [^<]+ | <(?!%) )*) (?: <%(.*?)%> | ((?=<%)) )? )/gsx) {
    $2 and push @parsed, [ $line, html => $2 ];
    $3 and push @parsed, [ $line, asp  => $3 ];
    defined $4 and die "Unclosed ASP code block starting on line $line near '",
        $asp =~ /\G(<%\s*\n?.*)/g, "'.\n";
    $line += $1 =~ tr/\n//;
}
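The newline-counting idiom used for $line can be tried in isolation. This is only a minimal sketch: count_newlines is a hypothetical helper name and the three chunks are made-up sample data, not anything from the original ASP page.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count the newlines in a string.
# tr/\n// with an empty replacement list and no /d only counts matching
# characters; it does not modify the string.
sub count_newlines {
    my ($text) = @_;
    return $text =~ tr/\n//;
}

# Made-up chunks standing in for the pieces a parser would match.
my $line = 1;
for my $chunk ("<html>\n<body>\n", "<% foo() %>", "\n</body>\n") {
    print "chunk starts on line $line\n";
    $line += count_newlines($chunk);   # advance past the chunk just seen
}
# The three chunks are reported as starting on lines 1, 3 and 3.
```

The point of the design is that the line number is never tracked by a state machine; it is derived from the text that each match consumed.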

Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

Re: Re: Re: Re: parsing an ASP file
by dada (Chaplain) on May 25, 2004 at 09:21 UTC
    well, your code looks surely good, but seems to be failing line count. on a simple ASP page of mine I get these results:

    mine      yours
    HTM  1    HTM  1
    ASP 31    ASP  1
    HTM 31    HTM 31
    ASP 44    ASP 31
    HTM 46    HTM 46
    ASP 50    ASP 46
    HTM 50    HTM 50
    ASP 55    ASP 50
    HTM 59    HTM 59
    ASP 73    ASP 59
    HTM 75    HTM 75

    that is, it counts correctly for HTM blocks, but doesn't increment the line number for ASP blocks. I tried moving the line $line += ... before the push, but it didn't help.

    cheers,
    Aldo

    King of Laziness, Wizard of Impatience, Lord of Hubris

      seems to be failing line count.

      You're right. Because the regex can match a block of html and a block of asp in one go, $line already needs to be updated in between the two. So I removed the extra set of parens and the counter line again, and added two new \n-counters: one for $1 and one for $2.

      my @parsed;
      my $line = 1;
      while ($asp =~ /\G((?: [^<]+ | <(?!%) )*) (?: <%(.*?)%> | ((?=<%)) )?/gsx) {
          $1 and push @parsed, [ $line, html => $1 ];
          $line += $1 =~ tr/\n//;
          $2 and push @parsed, [ $line, asp  => $2 ];
          $line += $2 =~ tr/\n//;
          defined $3 and die "Unclosed ASP code block starting on line $line near '",
              $asp =~ /\G(<%\s*\n?.*)/g, "'.\n";
      }
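A runnable sketch of the two-counter approach described above, wrapped in a sub so it can be tried on a small input. The name parse_asp and the sample document are assumptions for illustration only. The sketch also differs from the posted loop in two labeled ways: it uses length/defined checks instead of plain truth tests (so a block consisting of "0" is not dropped, and no warning is raised when no ASP block was matched), and it shortens the error message.

```perl
use strict;
use warnings;

# Hypothetical wrapper; parse_asp is not a name used in the thread.
# Returns a list of [ $line, $type, $text ] entries.
sub parse_asp {
    my ($asp) = @_;
    my @parsed;
    my $line = 1;
    while ($asp =~ /\G((?: [^<]+ | <(?!%) )*) (?: <%(.*?)%> | ((?=<%)) )?/gsx) {
        # Copy the captures first, so nothing later can clobber them.
        my ($html, $code, $unclosed) = ($1, $2, $3);
        if (length $html) {
            push @parsed, [ $line, html => $html ];
            $line += $html =~ tr/\n//;   # count newlines in the html part
        }
        if (defined $code && length $code) {
            push @parsed, [ $line, asp => $code ];
            $line += $code =~ tr/\n//;   # count newlines in the asp part
        }
        # Assumption: simplified message, without the "near ..." context.
        die "Unclosed ASP code block starting on line $line\n"
            if defined $unclosed;
    }
    return @parsed;
}

# Made-up six-line sample document.
my $sample = "<html>\n<% a %>\n<p>\n<%\nb\n%>\n";
for my $block (parse_asp($sample)) {
    my ($line, $type) = @$block;
    print uc($type), " $line\n";
}
# Blocks start on lines 1 (html), 2 (asp), 2 (html), 4 (asp), 6 (html).
```

Note that the html block after the first ASP block is reported on line 2, because it begins with the newline that ends line 2.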

      Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

        /\G((?: [^<]+ | <(?!%) )*) (?: <%(.*?)%> | ((?=<%)) )?/gsx

        Just for fun, I wanted to try to make a Perl 6 rule from this. Here it is, untested and mostly guessed. I have no idea how to do line numbers, so I cheated and imagined a method of .pos for that :)

        rule code_begin ($type) {
            <{
                { asp => '<%', php => '<?', plp => '<:' }.{$type}
                // fail "Unknown type: $type"
            }>
        }

        rule code_end ($type) {
            <{
                { asp => '%>', php => '?>', plp => ':>' }.{$type}
                // fail "Unknown type: $type"
            }>
        }

        rule code_block ($type) {
            <code_begin $type>
            (.*?)
            <code_end $type>
        }

        rule code_document ($type) {
            [
                # First, match any number of subsequent code blocks.
                [
                    { $?line := .pos.line }
                    <code_block $type> ::
                    { push @?blocks, [ $?line, code => $?code_block ] }
                ]*

                # If there is now a code_begin, obviously that is an open block.
                # (In PLP, that is valid, but let's assume for now that it's not.)
                [
                    { $?line := .pos.line }
                    <code_begin $type> \n*
                    $?context := (\N<,15>)
                    { fail "Unclosed code block on line $?line, near '$?context'" }
                ]?

                # And then a piece of text.
                # (At least one character, to avoid having empty blocks.)
                [
                    { $?line := .pos.line }
                    $?html := (.+?)
                    [ <before <code_begin $type>> | $ ] ::
                    { push @?blocks, [ $?line, html => $?html ] }
                ]?
            ]*
        }

        my @parsed = ($asp ~~ /<code_document 'asp'>/).{blocks}

        Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }
