parsing an ASP file

Re: parsing an ASP file by Juerd (Abbot) on May 12, 2004 at 17:15 UTC
I think (but have not tested) that even an inefficient regex is faster than reading one character at a time. It is certainly easier to write :)

`my @parsed; while ($asp =~ /\G ((?: [^<]+ | <(?!%) )*) (?: <%(.*?)%> | ((?=<%)) )?/gsx) { $1 and push @parsed, [ html => $1 ]; $2 and push @parsed, [ asp => $2 ]; defined $3 and die "Unclosed ASP code block near '", $asp =~ /\G(<%\s*\n?.*)/g, "'.\n"; }`

But, of course,

`<% foo = "a mere %> breaks either simple minded solution." %>`

Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }
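To make the caveat concrete, here is a minimal sketch that runs a regex of this shape on a hypothetical input where `%>` appears inside a string literal. The sample text is an assumption made up for illustration; only the pattern comes from the post.

```perl
use strict;
use warnings;

# Hypothetical sample: the ASP block contains "%>" inside a string literal.
my $asp = q{<p>Hi</p><% foo = "a mere %> breaks it" %><p>Bye</p>};

my @parsed;
while ($asp =~ /\G ((?: [^<]+ | <(?!%) )*) (?: <%(.*?)%> | ((?=<%)) )?/gsx) {
    $1 and push @parsed, [ html => $1 ];
    $2 and push @parsed, [ asp  => $2 ];
    defined $3 and die "Unclosed ASP code block\n";
}

# Show what the tokenizer produced, one chunk per line.
print "$_->[0]: [$_->[1]]\n" for @parsed;
```

The lazy `.*?` stops at the first `%>`, so the block is cut off in the middle of the string literal and the remainder is reported as HTML.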
Re: Re: parsing an ASP file by dada (Chaplain) on May 20, 2004 at 10:12 UTC
yep. one thing I forgot to mention is that, for the application I'm currently writing (which is basically an ASP cross-reference generator), I need to have the line number where each block appears. so, the code I'm using is something more like:

`sub get_asp_blocks { my($file) = @_; open(FILE, $file) or die "can't open '$file': $!\n"; my $dot = 1; my @blocks = ( ["HTM", $dot, ""] ); my $state = "HTM"; my($char, $last) = ("", ""); while(read(FILE, $char, 1)) { $dot++ if $char eq "\n"; if($last eq "<" && $char eq "%" && $state eq "HTM") { chop $blocks[-1][-1]; $state = "ASP"; push(@blocks, ["ASP", $dot, ""]); } elsif($last eq "%" && $char eq ">" && $state eq "ASP") { chop $blocks[-1][-1]; $state = "HTM"; push(@blocks, ["HTM", $dot, ""]); } else { $blocks[-1][-1] .= $char; } $last = $char; } close(FILE); return @blocks; }`

this way, each element of the returned array contains three elements: the type (ASP or HTM), the line number, and the block itself.

cheers, Aldo
King of Laziness, Wizard of Impatience, Lord of Hubris
Re: Re: Re: parsing an ASP file by Juerd (Abbot) on May 23, 2004 at 22:57 UTC
`my $state = "HTM";`

The state is what I don't like. It means that everything needs to be done manually. So to get the line numbers, I'd probably just extend the regex with one set of all-enclosing parens (or for simple stand-alone scripts just use `$&`), and then count the number of `\n` characters found in it.

`my @parsed; my $line = 1; while ($asp =~ /\G( ((?: [^<]+ | <(?!%) )*) (?: <%(.*?)%> | ((?=<%)) )? )/gsx) { $2 and push @parsed, [ $line, html => $2 ]; $3 and push @parsed, [ $line, asp => $3 ]; defined $4 and die "Unclosed ASP code block starting on line $line near '", $asp =~ /\G(<%\s*\n?.*)/g, "'.\n"; $line += $1 =~ tr/\n//; }`

Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }
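The newline-counting idea can be sketched as follows. This is a variant, not the code from the post: it assumes you want the counter advanced after the HTML chunk and before the ASP block is recorded, so that each chunk carries the line it actually starts on. The sample input is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical three-line sample; the ASP block opens on line 2.
my $asp = "<html>\n<% x = 1 %>\n</html>\n";

my @parsed;
my $line = 1;
while ($asp =~ /\G ((?: [^<]+ | <(?!%) )*) (?: <%(.*?)%> | ((?=<%)) )?/gsx) {
    my ($html, $code, $unclosed) = ($1, $2, $3);
    if (length $html) {
        push @parsed, [ $line, html => $html ];
        $line += ($html =~ tr/\n//);    # tr with an empty replacement only counts
    }
    die "Unclosed ASP code block starting on line $line\n" if defined $unclosed;
    if (defined $code) {
        push @parsed, [ $line, asp => $code ];
        $line += ($code =~ tr/\n//);
    }
}

printf "line %d, %s\n", $_->[0], $_->[1] for @parsed;
```

Counting with `tr/\n//` on a copy of each capture keeps the bookkeeping in one place and avoids a separate state variable.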
Re: Re: Re: Re: parsing an ASP file by dada (Chaplain) on May 25, 2004 at 09:21 UTC
Re: Re: Re: Re: Re: parsing an ASP file by Juerd (Abbot) on May 25, 2004 at 14:47 UTC
Some notes below your chosen depth have not been shown here
Re: Re: parsing an ASP file by jryan (Vicar) on May 13, 2004 at 08:16 UTC
Ah, but a more complete version is easy to write too! :) (Although, I admit, a bit more longwinded...)

`use re 'eval'; my $string = qr[ " [^"\\]* (?:\\.|[^"\\])* " | ' [^'\\]* (?:\\.|[^'\\])* ' ]x; my $alist = qr[(?: [^"'>]* | $string )]x; my $ehead = qr[ <\w+ $alist /? > ]x; my $textarea = qr[ <textarea $alist> (?: [^<] | < (?!/textarea>) )* </textarea> ]x; my $asp = qr[ <% (?: (?> [^%"']* ) | $string | % (?! > ) )+ %> ]x; my $html = qr[ (?: (?> [^<"'] ) | $textarea | $ehead | </\w+> )+ ]x; my @parsed; () = $string =~ / ($asp) (?{ push @parsed, [asp => $1] }) | ($html) (?{ push @parsed, [html => $2] }) /gx;`
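A reduced sketch of the string-aware idea, for readers who only need the ASP side of it: the block pattern steps over quoted strings as whole units, so a `%>` inside a string no longer ends the block. It assumes backslash-escaped, quote-delimited strings as in the patterns above (VBScript, where `'` starts a comment, would need different rules), and the names `$block` and `$src` are hypothetical.

```perl
use strict;
use warnings;

# A double- or single-quoted string with backslash escapes.
my $string = qr{ " (?: \\. | [^"\\] )* " | ' (?: \\. | [^'\\] )* ' }xs;

# An ASP block: strings are consumed whole, so "%>" inside them is ignored.
my $block = qr{ <% (?: $string | [^%"'] | % (?!>) )* %> }xs;

# Hypothetical sample input.
my $src = q{<p>Hi</p><% foo = "a mere %> no longer breaks" %><p>Bye</p>};

my @parsed;
while ($src =~ / \G (?: ($block) | ( (?: [^<] | < (?!%) )+ ) ) /gxs) {
    push @parsed, defined $1 ? [ asp => $1 ] : [ html => $2 ];
}

print "$_->[0]: $_->[1]\n" for @parsed;
```

Unlike the fuller version, this sketch does not try to understand HTML tags or `<textarea>` contents, and it simply stops at an unclosed `<%` instead of reporting it.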
Re: parsing an ASP file by perlinux (Deacon) on May 12, 2004 at 14:03 UTC
Good job! Do you think it can work for other "tag" languages (PHP, JSP) by only changing the tag? Or by changing the code... Does it need major changes? I have never worked with ASP... I followed the discussion of this code in chat... :-)
Re: Re: parsing an ASP file by dada (Chaplain) on May 12, 2004 at 15:27 UTC
as far as I know (which isn't very far :-) PHP uses `<? .. ?>` as delimiters, so you could just change "%" to "?" in the `eq`s above (and of course, change "ASP" to "PHP" if you prefer) and it should work. JSP seems to be using its own tag library (things like `<jsp:getProperty .. />` and so on) as well as blocks like `<% .. %>`. perhaps you could use this tool and a full-blown HTML (or XHTML) parser to recognize JSP tags in HTML blocks, but I really don't know.

cheers, Aldo
King of Laziness, Wizard of Impatience, Lord of Hubris
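The "change `%` to `?`" suggestion can be sketched by turning the marker character and the label into parameters of the character-by-character scanner. This is an illustration under assumptions: `get_blocks`, its arguments, and the sample input are hypothetical, and it scans a string rather than reading a file.

```perl
use strict;
use warnings;

# Hypothetical helper: $mark is the character after "<" and before ">"
# ("%" for ASP, "?" for PHP); $label names the code blocks.
sub get_blocks {
    my ($text, $mark, $label) = @_;
    my @blocks = ( ["HTM", 1, ""] );
    my ($state, $last, $line) = ("HTM", "", 1);
    for my $char (split //, $text) {
        $line++ if $char eq "\n";
        if ($last eq "<" && $char eq $mark && $state eq "HTM") {
            chop $blocks[-1][-1];               # drop the "<" already appended
            $state = $label;
            push @blocks, [$label, $line, ""];
        }
        elsif ($last eq $mark && $char eq ">" && $state eq $label) {
            chop $blocks[-1][-1];               # drop the marker already appended
            $state = "HTM";
            push @blocks, ["HTM", $line, ""];
        }
        else {
            $blocks[-1][-1] .= $char;
        }
        $last = $char;
    }
    return @blocks;
}

my @php = get_blocks("<p>\n<? echo 1 ?>\n</p>", "?", "PHP");
printf "%s starts on line %d\n", $_->[0], $_->[1] for @php;
```

Calling it with `"%"` and `"ASP"` gives the ASP behaviour; the longer `<?php` opener is not handled by a single marker character and would need extra logic.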
Re: Re: Re: parsing an ASP file by belg4mit (Prior) on May 12, 2004 at 22:42 UTC
PHP is configurable to use many tags (http://us4.php.net/basic-syntax): `<?`, `<%`, `<?php` and `<script language="php">`

-- I'm not belgian but I play one on TV.
Re: parsing an ASP file by iburrell (Chaplain) on May 25, 2004 at 21:43 UTC
The type element ('HTM' or 'ASP') is not required if you leave the ASP tags in the strings. The parser would effectively split the file into a list of chunks. It would be easy to tell ASP chunks apart because they start with '<%'. I have seen a regex-based XML parser that works this way. It breaks the XML into strings which can be identified by looking at the first couple of characters.
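A minimal sketch of that chunking approach, assuming the simple `<% ... %>` delimiters discussed in this thread (the sample input and variable names are hypothetical). Because the `split` pattern contains capturing parentheses, the delimiters themselves, here the whole ASP blocks, are returned along with the text between them.

```perl
use strict;
use warnings;

# Hypothetical sample input.
my $src = "<p>Hi</p><% x = 1 %><p>Bye</p>";

# Split on ASP blocks but keep them, then drop empty chunks.
my @chunks = grep { length $_ } split /(<%.*?%>)/s, $src;

# The type is recovered from the first two characters of each chunk.
my @typed = map { [ (substr($_, 0, 2) eq '<%' ? 'ASP' : 'HTM'), $_ ] } @chunks;

print "$_->[0]: $_->[1]\n" for @typed;
```

The chunks can be joined back together to reproduce the input exactly, which is handy for tools that rewrite only some of the blocks.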

