rough start of an axml compiler

Logicus has asked for the wisdom of the Perl Monks concerning the following question:

OK, here is a very rough and ugly start on how I think you're suggesting I should go about compiling aXML files. I know this code is _AWFUL_, but it does work, and it's quite fast at turning an aXML file into a bunch of print statements and axml function calls, which my existing parser system could parse very quickly because they are very short. I guess I would want to save the output of this program on first run so I can skip this step on subsequent runs. This only supports <> type tags.

use Modern::Perl;

use Time::HiRes qw( gettimeofday tv_interval );

my $start = [ gettimeofday ];


my $aXML_ENV = {   qd => {   a => 'b', 
                             b => 'c',
                             c => '42' },
                 conf => {   x => 'y',
                           foo => '1',
                           bar => '2' }
               };


my $plugins = { qd => '$result = $aXML_ENV->{"qd"}->{$data}',
                conf => '$result = $aXML_ENV->{"conf"}->{$data}' };





my $aXML_string = qq@
some other data that needs to be output as well
<qd><qd><qd>a</qd></qd></qd> = 42

<conf>bar</conf>

<sometag>thatdoesnothing</sometag> but also needs
to be present in the output

<conf>foo</conf>

<someothertag>thatalsodoesnothing</someothertag>
as far as aXML is concerned, but obviously used by
whatever program we are sending this data too.
@;






sub sortofcompileit
  {
   my $aXML_ENV = $_[0];
   my $plugins = $_[1];
   my $aXML_string = $_[2];
   my $compiled_string_start;
   my $compiled_string_end;
   my $compiled_string_middle;
   my @commands;   
   my $command_opens_string = "(";
   my $command_closes_string = "(";
   my $mong_string;


   while ( my ($key, $value) = each(%$plugins) ) { push (@commands, $key); }

   map { $command_opens_string .= "<$_>|" } @commands; 
   map { $command_closes_string .= "</$_>|" } @commands; 

   chop $command_opens_string;
   chop $command_closes_string;

   $command_opens_string .= ")";
   $command_closes_string .= ")";

   #find the position of the first command 
   #set everything before it to be printed

   if ($aXML_string =~ m@^(.*?)$command_opens_string@s)
     {
      $compiled_string_start = 'print qq@';
      $compiled_string_start .= $1;
      $compiled_string_start .= "@;\n\n";
     }

   $mong_string = 'use aXML;';
   $mong_string .= "\n";
   $mong_string .= $compiled_string_start;


   #find everything in the middle 

   if ($aXML_string =~ m@$command_opens_string(.*)$command_closes_string@s)
     {
      $compiled_string_middle = "<axml>$1$2$3</axml>";
     }
   
   #find anything in the middle which is inbetween any type of close
   #and open and set it to be printed out

   my $replacement;

   $compiled_string_middle =~ s@$command_closes_string@`$1@gs;


   while ($compiled_string_middle =~ m@(.*?)$command_closes_string([^`]*?)$command_opens_string@gs)
        {
  
         $replacement = "$1$2</axml>\n\n";
         $replacement .= 'print qq@';
         $replacement .= $3;
         $replacement .= "@;\n\n<axml>$4";

         $mong_string .= $replacement;

        } 

    $compiled_string_middle =~ s@`@@gs;

   if ($compiled_string_middle =~ m@.*$command_opens_string(.*?)$command_closes_string</axml>$@s)
     {
      $mong_string .= "$2$3</axml>\n\n";
     }

     
   #find the position of the last close tag 
   #set everything after it to be printed

   if ($aXML_string =~ m@.*$command_closes_string(.*)$@s)
     {
      $compiled_string_end = 'print qq@';
      $compiled_string_end .= $2;
      $compiled_string_end .= '@;';
     }

   $mong_string .= $compiled_string_end;
   $mong_string =~ s@`@@gs;

   $mong_string =~ s/<axml>(.*?)<\/axml>/print axml\(qq\@$1\@\);/g;

   return $mong_string;
  }



my $sortofcompiled_string = sortofcompileit($aXML_ENV,$plugins,$aXML_string);

say $sortofcompiled_string;

my $end = [ gettimeofday ];
my $total_elapsed = tv_interval($start,$end);

say "elapsed = $total_elapsed";
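
A hedged aside on the tag-matching step: the open and close alternations that `sortofcompileit` assembles with `map` and `chop` could also be built with `join`, with `quotemeta` guarding against plugin names that contain regex metacharacters. This is only a sketch; `build_tag_patterns` is a hypothetical helper and not part of the posted code.

```perl
use strict;
use warnings;

# Hypothetical helper: build the open-tag and close-tag alternations
# from the plugin names, in a stable (sorted) order.
sub build_tag_patterns {
    my ($plugins) = @_;
    my @names  = map { quotemeta($_) } sort keys %$plugins;
    my $opens  = '(' . join( '|', map( "<$_>",  @names ) ) . ')';
    my $closes = '(' . join( '|', map( "</$_>", @names ) ) . ')';
    return ( $opens, $closes );
}

my ( $opens, $closes ) = build_tag_patterns( { qd => 1, conf => 1 } );
print "$opens\n$closes\n";    # (<conf>|<qd>) and (</conf>|</qd>)
```

Sorting the keys keeps the generated pattern identical from run to run, which matters if the compiled output is going to be cached.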

Replies are listed 'Best First'.
Re: rough start of an axml compiler by Boldra (Deacon) on Aug 01, 2011 at 12:02 UTC
You say in your "offtopic epiphony" that you believe `<<a>b</a>>c</<a>b</a>>` to be unrepresentable in any kind of data structure, perl or otherwise. Here's a simple solution:

    my @nodes = (
        bless( {
            'data' => 'c',
            'tag'  => bless( { 'data' => 'b', 'tag' => 'a' }, 'Node' ),
        }, 'Node' ),
    );

The definition of the action to be performed on data 'c' is postponed until operation 'a' is performed on data 'b'. I think you wrote a parser already, so I'm sure you can adapt it to produce a structure like the one above. Once you have the structure, generating the output is also straightforward:

    package Node;
    use Moose;

    has [ qw<data tag> ] => ( is => 'rw', isa => 'Any' );

    sub as_text {
        my ($self) = shift;
        my $tag = $self->tag;
        my $tag_processing_method = ref $tag ? $tag->as_text : $tag;
        return $self->$tag_processing_method( $self->data );
    }

    # Tag Processing Methods here:
    sub a       { "super_$_[1]" }    # prepend "super_"
    sub b       { "b_$_[1]" }        # prepend "b_"
    sub super_b { "B_$_[1]" }        # prepend "B_"

If you run it (`say for map { $_->as_text } @nodes`), you'll see that instead of sub `b` being called, `super_b` is called.

I'd be very inclined to add string overloading to the `Node` package:

    use overload q{""} => 'as_text', fallback => 1;

which could make the calls even simpler (with a possible cost to debugging and maintainability). `as_text` becomes

    sub as_text {
        my ($self) = shift;
        my $processing_method = $self->tag;
        return $self->$processing_method( $self->data );
    }

and generating output once you have your @nodes array is simply stringification:

    print @nodes;

Update: fixed some typos.
Re^2: rough start of an axml compiler by Logicus (Initiate) on Aug 01, 2011 at 19:29 UTC
I said any kind I know of, but then I am renowned for being an uneducated thick-wit who won't listen to the advice of my elders and betters. I'm going to have to have a good think about what you've put there above. Digestion should be complete in a few days, before which any comment I make will probably be seen as another example of my stupidity.

The first thing that is running through the vacuous hole I refer to sometimes laughingly as my brain is how to decompress this:

    my @nodes = (
        bless( {
            'data' => 'c',
            'tag'  => bless( { 'data' => 'b', 'tag' => 'a' }, 'Node' ),
        }, 'Node' ),
    );

from the source:

    <<a>b</a>>c</<a>b</a>>

I have a pathological aversion to all things OOP, but the apparent simplicity of what you have shown above is strangely appealing. Thanks!
Re^2: rough start of an axml compiler by Logicus (Initiate) on Aug 02, 2011 at 12:40 UTC
Well Boldra, you've thrown a proper little spanner into my works... I'm not complaining, because I really like your example! I was going to run a small number of regex conversions on an aXML string and turn it into classic XML to feed XML::Simple for turning into a perl structure, but I can't do that now if I want to use the method above. .o0(~Hrm~)

One quick question though: under this schema, would every tag have to have a definition? As in, what would happen to tags which are just markup around and within tags which have defined roles? Also, there is another thought that I don't know exactly how to describe; I guess you could call it orphan data. For example:

    listing actions/default/body.aXML
    ---------------------------------
    <html>
    <head><title>acme products</title></head>
    <body>
    some orphan text that needs to be in the output
    <use>actions/<qd>action</qd>/main.aXML</use>
    some more orphan text
    </body>
    </html>

I'm guessing that the above would be mapped to your Moose solution thusly:

    package actions::default::body;

    my @nodes = (
        bless( {
            'tag'  => 'html',
            'data' => [
                bless( {
                    'tag'  => 'head',
                    'data' => bless( { 'tag' => 'title', 'data' => 'acme products' }, 'Node' ),
                }, 'Node' ),
                bless( {
                    'tag'  => 'body',
                    'data' => [
                        bless( { 'tag' => 'orphan', 'data' => 'some orphan text that needs to be in the output' }, 'Node' ),
                        bless( {
                            'tag'  => 'use',
                            'data' => [
                                bless( { 'tag' => 'orphan', 'data' => 'actions/' }, 'Node' ),
                                bless( { 'tag' => 'qd', 'data' => 'action' }, 'Node' ),
                                bless( { 'tag' => 'orphan', 'data' => '/main.aXML' }, 'Node' ),
                            ],
                        }, 'Node' ),
                        bless( { 'tag' => 'orphan', 'data' => 'some more orphan text' }, 'Node' ),
                    ],
                }, 'Node' ),
            ],
        }, 'Node' ),
    );

    sub getNodes { return @nodes; }

    1;
Re^3: rough start of an axml compiler by Boldra (Deacon) on Aug 02, 2011 at 13:06 UTC
Have you considered leaving the untagged content as plain text?

    my @nodes = (
        bless( {
            'tag'  => 'html',
            'data' => [
                bless( {
                    'tag'  => 'head',
                    'data' => bless( { 'tag' => 'title', 'data' => 'acme products' }, 'Node' ),
                }, 'Node' ),
                bless( {
                    'tag'  => 'body',
                    'data' => [
                        'some orphan text that needs to be in the output',
                        bless( {
                            'tag'  => 'use',
                            'data' => [
                                bless( { 'tag' => 'orphan', 'data' => 'actions/' }, 'Node' ),
                                bless( { 'tag' => 'qd', 'data' => 'action' }, 'Node' ),
                                bless( { 'tag' => 'orphan', 'data' => '/main.aXML' }, 'Node' ),
                            ],
                        }, 'Node' ),
                        'some more orphan text',
                    ],
                }, 'Node' ),
            ],
        }, 'Node' ),
    );

It may also interest you that with Moose's `BUILDARGS` you can easily set up the Node constructor to expect a tag and data, e.g. `Node->new( qd => 'action' );`. The output of Data::Dumper would still contain the `bless { }, 'Node'` syntax, making it a good place to do debugging and testing.

    my @nodes = (
        Node->new( html => [
            Node->new( head => Node->new( title => 'acme products' ) ),
            Node->new( body => [
                'some orphan text that needs to be in the output',
                Node->new( use => [ 'actions/', Node->new( qd => 'action' ), '/main.aXML' ] ),
                'some more orphan text',
            ] ),
        ] ),
    );

But then why make nodes out of plain html if you have no action planned for them? Checking whether a tag is implemented during parsing is going to save you headaches later.

    my @nodes = (
        '<html>
    <head><title>acme products</title></head>
    <body>
    some orphan text that needs to be in the output',
        Node->new( use => [ 'actions/', Node->new( qd => 'action' ), '/main.aXML' ] ),
        'some more orphan text
    </body>
    </html>',
    );

with which `print @nodes` would just do the right thing.
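
For readers without Moose installed, the last suggestion (plain strings mixed with nodes, unimplemented tags rejected early) can be sketched in core Perl. Everything here is an assumption for illustration: the `%HANDLERS` table, the value the `qd` handler returns, and the placeholder string the `use` handler produces in place of actually including a file.

```perl
use strict;
use warnings;

package Node;

# Stringifying a Node renders it, so "print @nodes" works on mixed lists.
use overload q{""} => 'as_text', fallback => 1;

# Hypothetical tag handlers, for illustration only.
my %HANDLERS = (
    'qd'  => sub { my %qd = ( action => 'default' ); $qd{ $_[0] } // '' },
    'use' => sub { "[contents of $_[0]]" },
);

# Node->new( tag => $data ); unknown tags are rejected at construction
# time, i.e. while the document is being parsed rather than rendered.
sub new {
    my ( $class, $tag, $data ) = @_;
    die "unimplemented tag: $tag" unless $HANDLERS{$tag};
    return bless { tag => $tag, data => $data }, $class;
}

sub as_text {
    my ($self) = @_;
    my $data = $self->{data};
    # Children may be plain strings or Nodes; join stringifies each one.
    my $text = ref($data) eq 'ARRAY' ? join( '', @$data ) : "$data";
    return $HANDLERS{ $self->{tag} }->($text);
}

package main;

my @nodes = (
    "<body>\n",
    Node->new( 'use' => [ 'actions/', Node->new( 'qd' => 'action' ), '/main.aXML' ] ),
    "\n</body>\n",
);

print @nodes;
```

The inner `qd` node is rendered first when its parent joins its children, so the `use` handler receives the already-expanded path.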
Re^4: rough start of an axml compiler by Logicus (Initiate) on Aug 02, 2011 at 15:03 UTC
Re^5: rough start of an axml compiler by Logicus (Initiate) on Aug 02, 2011 at 18:13 UTC
Re^2: rough start of an axml compiler by Anonymous Monk on Aug 02, 2011 at 02:34 UTC
Corion, muba, and a few others already explained this independently of each other. He is just playing dumb; you're feeding the troll.
Re^3: rough start of an axml compiler by Boldra (Deacon) on Aug 02, 2011 at 13:24 UTC
Look, I can feed two at once!
Re^4: rough start of an axml compiler by Anonymous Monk on Aug 02, 2011 at 13:46 UTC
Re: rough start of an axml compiler by pemungkah (Priest) on Jul 21, 2011 at 16:48 UTC
The only comment I might make is that any time you have a function you're passing craploads of parameters to, you may be better served by using an object to manage the storage. Internally (in methods inside the class), you can go ahead and keep referencing things directly for speed.

If I understand your architecture properly, you have several invariants that are getting set up in `sortofcompileit` on every call; if you pulled those out into package variables and set them up once (with an `init` method/sub), you'd save time on every subsequent call.

And this is totally headed in a good direction - real "compilation" of the aXML code! If you memoized (cf. Memoize) the calls to `sortofcompileit`, you might be able to get another free speedup from Memoize's caching. As long as a given parameter set always results in the same output, Memoize will help. If there are side effects that might change the result, then it won't help (e.g., memoizing a random number generator would make it seriously unusable, if very fast!).
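
A minimal sketch of the Memoize suggestion, assuming a simplified stand-in for the compile step. `compile_axml` and `$compile_count` are hypothetical names; the real `sortofcompileit` takes `($aXML_ENV, $plugins, $aXML_string)`.

```perl
use strict;
use warnings;
use Memoize;

my $compile_count = 0;

# Stand-in for the expensive compile step: counts how often it really runs.
sub compile_axml {
    my ($axml_string) = @_;
    $compile_count++;
    return "print qq\@$axml_string\@;";
}

# Replace compile_axml with a caching wrapper keyed on its arguments.
memoize('compile_axml');

my $first  = compile_axml('<qd>a</qd>');
my $second = compile_axml('<qd>a</qd>');    # served from the cache

print "compiled $compile_count time(s)\n";
```

One caveat: Memoize's default cache key is built from the stringified arguments, so hash references such as `$aXML_ENV` and `$plugins` would be keyed by their addresses rather than their contents. Memoizing the real function would probably need a `NORMALIZER` option, or a wrapper that takes only the aXML string.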
Re^2: rough start of an axml compiler by Logicus (Initiate) on Jul 21, 2011 at 19:53 UTC
The idea was to run `sortofcompileit` only once per page, the first time it is accessed, and to save its output so that henceforth you can skip that step unless the source code has been updated. All it does is reorganise the raw aXML code into a more efficient layout which can be processed a lot faster for individual page hits.

I don't like it, because that method, no matter how cleverly implemented, breaks certain plugins which are designed to exploit the runtime parsing setup. If you wanted to use aXML for a large-scale site and server overhead was a real budgeting concern, then it would be necessary to sacrifice said plugins (and the groovy effects they achieve) in order to run a compilation/optimisation schema like what the above code is starting to do. TIMTOWTDI, even with aXML/Perl.
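
The compile-once-per-page idea could be sketched as a small cache keyed on file modification times. `compiled_for` and its three parameters are hypothetical, and the `$compile` callback stands in for `sortofcompileit`.

```perl
use strict;
use warnings;

# Hypothetical helper: return compiled code for $source_file, recompiling
# only when the cache file is missing or older than the source.
sub compiled_for {
    my ( $source_file, $cache_file, $compile ) = @_;

    my $source_mtime = ( stat($source_file) )[9];
    die "cannot stat $source_file: $!" unless defined $source_mtime;
    my $cache_mtime = ( stat($cache_file) )[9];    # undef if no cache yet

    if ( defined $cache_mtime && $cache_mtime >= $source_mtime ) {
        open my $in, '<', $cache_file or die "cannot read $cache_file: $!";
        local $/;    # slurp mode: read the whole file at once
        return scalar <$in>;
    }

    open my $src, '<', $source_file or die "cannot read $source_file: $!";
    my $axml = do { local $/; <$src> };
    close $src;

    my $compiled = $compile->($axml);

    open my $out, '>', $cache_file or die "cannot write $cache_file: $!";
    print {$out} $compiled;
    close $out or die "cannot close $cache_file: $!";

    return $compiled;
}
```

A call might look like `compiled_for( 'body.aXML', 'body.compiled', sub { sortofcompileit( $aXML_ENV, $plugins, $_[0] ) } )`. Modification times have one-second resolution on many filesystems, so an edit made within the same second as the cache write would go unnoticed by this comparison.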
Re^3: rough start of an axml compiler by pemungkah (Priest) on Jul 22, 2011 at 06:35 UTC
Just as a throw-it-out-there, how about adding markup (or detecting, depending on how sophisticated you want to be) that delineates the "definitely dynamic" and "for-sure static" portions of a page? You could pre-build whatever was invariant (I seem to be using that word a lot lately...) and reserve the slower dynamic stuff for just the part(s) that needed it.
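
A rough sketch of the "detecting" variant, assuming the dynamic tags are known by name in advance. `split_static_dynamic` is a hypothetical helper; it does not handle a dynamic tag nested inside another of the same name, and it treats every other tag as static text.

```perl
use strict;
use warnings;

# Split a template into static text segments and dynamic tag segments.
# Only the tag names passed in count as dynamic.
sub split_static_dynamic {
    my ( $template, @dynamic_tags ) = @_;
    return ( [ static => $template ] ) unless @dynamic_tags;

    my $names = join '|', map { quotemeta($_) } @dynamic_tags;
    my @segments;
    my $pos = 0;

    while ( $template =~ m{<($names)>(.*?)</\1>}gs ) {
        my ( $tag, $data ) = ( $1, $2 );
        my $start = $-[0];    # offset where this match begins
        push @segments, [ static => substr( $template, $pos, $start - $pos ) ]
            if $start > $pos;
        push @segments, [ dynamic => $tag, $data ];
        $pos = $+[0];         # offset just past this match
    }

    push @segments, [ static => substr( $template, $pos ) ]
        if $pos < length($template);
    return @segments;
}

my @segments = split_static_dynamic(
    "intro <conf>bar</conf> outro <sometag>x</sometag>", 'qd', 'conf' );
printf "%-7s %s\n", $_->[0], join( ' ', @{$_}[ 1 .. $#$_ ] ) for @segments;
```

The static segments could be written out once as plain print statements, leaving only the dynamic ones to be evaluated on each page hit.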
Re^4: rough start of an axml compiler by Logicus (Initiate) on Jul 22, 2011 at 06:42 UTC
A reply falls below the community's threshold of quality. You may see it by logging in.

