PerlMonks

rough start of an axml compiler

by Logicus (Initiate)
on Jul 21, 2011 at 06:19 UTC [id://915797]

Logicus has asked for the wisdom of the Perl Monks concerning the following question:

Ok, here is a very rough and ugly start at how I think you're suggesting I should go about compiling aXML files. I know this code is _AWFUL_, but it does work, and it's quite fast at turning an aXML file into a bunch of print statements and axml function calls, which my existing parser system could then process very quickly because they are very short. I guess I would want to save the output of this program on the first run so I can skip this step on subsequent runs. This only supports <> type tags.

use Modern::Perl;
use Time::HiRes qw( gettimeofday tv_interval );

my $start = [ gettimeofday ];

my $aXML_ENV = {
    qd   => { a => 'b', b => 'c', c => '42' },
    conf => { x => 'y', foo => '1', bar => '2' },
};

my $plugins = {
    qd   => '$result = $aXML_ENV->{"qd"}->{$data}',
    conf => '$result = $aXML_ENV->{"conf"}->{$data}',
};

my $aXML_string = qq@
some other data that needs to be output as well
<qd><qd><qd>a</qd></qd></qd> = 42
<conf>bar</conf>
<sometag>thatdoesnothing</sometag>
but also needs to be present in the output
<conf>foo</conf>
<someothertag>thatalsodoesnothing</someothertag>
as far as aXML is concerned, but obviously used by whatever program
we are sending this data to.
@;

sub sortofcompileit {
    my ( $aXML_ENV, $plugins, $aXML_string ) = @_;

    # Start empty so a missing match doesn't trigger "uninitialized" warnings.
    my $compiled_string_start  = '';
    my $compiled_string_middle = '';
    my $compiled_string_end    = '';

    # Build alternations such as "(<qd>|<conf>)" and "(</qd>|</conf>)"
    # from the plugin names.
    my @commands = keys %$plugins;
    my $command_opens_string  = '(' . join( '|', map { "<$_>" }  @commands ) . ')';
    my $command_closes_string = '(' . join( '|', map { "</$_>" } @commands ) . ')';

    # find the position of the first command
    # set everything before it to be printed
    if ( $aXML_string =~ m@^(.*?)$command_opens_string@s ) {
        $compiled_string_start = 'print qq@' . $1 . "@;\n\n";
    }

    my $mong_string = "use aXML;\n" . $compiled_string_start;

    # find everything in the middle (first open tag to last close tag)
    if ( $aXML_string =~ m@$command_opens_string(.*)$command_closes_string@s ) {
        $compiled_string_middle = "<axml>$1$2$3</axml>";
    }

    # find anything in the middle which is in between any type of close
    # and open and set it to be printed out; a backtick is placed before
    # every close tag so the gap text can be recognised as backtick-free
    $compiled_string_middle =~ s@$command_closes_string@`$1@gs;
    while ( $compiled_string_middle =~ m@(.*?)$command_closes_string([^`]*?)$command_opens_string@gs ) {
        $mong_string .= "$1$2</axml>\n\n" . 'print qq@' . $3 . "@;\n\n<axml>$4";
    }
    $compiled_string_middle =~ s@`@@gs;

    # close off the last command block
    if ( $compiled_string_middle =~ m@.*$command_opens_string(.*?)$command_closes_string</axml>$@s ) {
        $mong_string .= "$2$3</axml>\n\n";
    }

    # find the position of the last close tag
    # set everything after it to be printed
    if ( $aXML_string =~ m@.*$command_closes_string(.*)$@s ) {
        $compiled_string_end = 'print qq@' . $2 . '@;';
    }
    $mong_string .= $compiled_string_end;

    # strip the marker backticks and turn each <axml>...</axml> block into
    # an axml() call (/s so a block may span several lines)
    $mong_string =~ s@`@@gs;
    $mong_string =~ s/<axml>(.*?)<\/axml>/print axml\(qq\@$1\@\);/gs;

    return $mong_string;
}

my $sortofcompiled_string = sortofcompileit( $aXML_ENV, $plugins, $aXML_string );
say $sortofcompiled_string;

my $end = [ gettimeofday ];
my $total_elapsed = tv_interval( $start, $end );
say "elapsed = $total_elapsed";

Re: rough start of an axml compiler
by Boldra (Deacon) on Aug 01, 2011 at 12:02 UTC
    You say in your "offtopic epiphony" that you believe
    <<a>b</a>>c</<a>b</a>>
    to be unrepresentable in any kind of data structure, perl or otherwise. Here's a simple solution:
    my @nodes = ( bless( { 'data' => 'c', 'tag' => bless( { 'data' => 'b', 'tag' => 'a' }, 'Node' ) }, 'Node' ) );
    The definition of the action to be performed on data 'c' is postponed until operation 'a' is performed on data 'b'.

    I think you wrote a parser already, so I'm sure you can adapt it to produce a structure like above. Once you have the structure, generating the output is also straightforward:

    package Node;
    use Moose;

    has [ qw<data tag> ] => ( is => 'rw', isa => 'Any' );

    sub as_text {
        my ($self) = shift;
        my $tag = $self->tag;
        my $tag_processing_method = ref $tag ? $tag->as_text : $tag;
        return $self->$tag_processing_method( $self->data );
    }

    # Tag Processing Methods here:
    sub a       { "super_$_[1]" }    # prepend "super_"
    sub b       { "b_$_[1]" }        # prepend "b_"
    sub super_b { "B_$_[1]" }        # prepend "B_"
    If you run it (say for map { $_->as_text } @nodes), you'll see that instead of sub b being called, super_b is called.

    I'd be very inclined to add string overloading to the Node package so:

    use overload q{""} => 'as_text', fallback => 1, ;
    which could make the calls even simpler (at a possible cost to debugging and maintainability). as_text becomes
    sub as_text { my ($self) = shift; my $processing_method = $self->tag; return $self->$processing_method( $self->data ); }
    and generating output once you have your @nodes array is simply stringification. print @nodes;
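    Putting those pieces together, here is a minimal self-contained sketch of the overloaded Node class, assuming Moose is installed. The tag methods a, b and super_b are the toy examples from this post, not aXML plugins. One deliberate difference from the as_text above: the tag is concatenated with an empty string so that a nested Node tag is explicitly stringified (through as_text) before it is used as a method name.

```perl
package Node;
use Moose;
use overload q{""} => 'as_text', fallback => 1;

has [qw<data tag>] => ( is => 'rw', isa => 'Any' );

sub as_text {
    my ($self) = @_;
    # Force stringification: a plain tag stays as-is, a Node tag is
    # rendered via as_text into the name of the method to call.
    my $processing_method = "" . $self->tag;
    return $self->$processing_method( $self->data );
}

# Tag processing methods (toy examples)
sub a       { "super_$_[1]" }    # prepend "super_"
sub b       { "b_$_[1]" }        # prepend "b_"
sub super_b { "B_$_[1]" }        # prepend "B_"

package main;

my @nodes = (
    Node->new(
        data => 'c',
        tag  => Node->new( data => 'b', tag => 'a' ),
    ),
);

print @nodes, "\n";    # prints "B_c"
```

    The inner node renders to "super_b", so the outer node dispatches to super_b rather than b.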

    Update: fixed some typos

      I said any kind I know of, but then I am renowned for being an uneducated thick-wit who won't listen to the advice of my elders and betters.

      I'm going to have to have a good think about what you've put there above. Digestion should be complete in a few days, before which any comment I make will probably be seen as another example of my stupidity.

      The first thing running through the vacuous hole I sometimes laughingly refer to as my brain is how to decompress this:

      my @nodes = ( bless( { 'data' => 'c', 'tag' => bless( { 'data' => 'b', 'tag' => 'a' }, 'Node' ) }, 'Node' ) );

      From the source:

      <<a>b</a>>c</<a>b</a>>

      I have a pathological aversion to all things OOP, but the apparent simplicity of what you have shown above is strangely appealing. Thanks!

      Well Boldra, you've thrown a proper little spanner into my works... I'm not complaining because I really like your example!

      I was going to run a small number of regex conversions on an aXML string and turn it into classic XML to feed XML::Simple for turning into a perl structure, but I can't do that now if I want to use the method above. .o0(~Hrm~)

      One quick question, though: under this scheme, would every tag have to have a definition? In other words, what would happen to tags which are just markup around and within tags which have defined roles?

      Also, there is another thing I don't know exactly how to describe; I guess you could call it orphan data. For example:

      listing actions/default/body.aXML
      ---------------------------------
      <html>
      <head><title>acme products</title></head>
      <body>
      some orphan text that needs to be in the output
      <use>actions/<qd>action</qd>/main.aXML</use>
      some more orphan text
      </body>
      </html>

      I'm guessing that the above would be mapped to your Moose solution thusly:

      package actions::default::body;

      my @nodes = (
          bless( {
              'tag'  => 'html',
              'data' => [
                  bless( {
                      'tag'  => 'head',
                      'data' => bless( { 'tag' => 'title', 'data' => 'acme products' }, 'Node' ),
                  }, 'Node' ),
                  bless( {
                      'tag'  => 'body',
                      'data' => [
                          bless( { 'tag' => 'orphan', 'data' => 'some orphan text that needs to be in the output' }, 'Node' ),
                          bless( {
                              'tag'  => 'use',
                              'data' => [
                                  bless( { 'tag' => 'orphan', 'data' => 'actions/' },   'Node' ),
                                  bless( { 'tag' => 'qd',     'data' => 'action' },     'Node' ),
                                  bless( { 'tag' => 'orphan', 'data' => '/main.aXML' }, 'Node' ),
                              ],
                          }, 'Node' ),
                          bless( { 'tag' => 'orphan', 'data' => 'some more orphan text' }, 'Node' ),
                      ],
                  }, 'Node' ),
              ],
          }, 'Node' ),
      );

      sub getNodes { return @nodes; }

      1;
        Have you considered leaving the untagged content as plain text?
        my @nodes = (
            bless( {
                'tag'  => 'html',
                'data' => [
                    bless( {
                        'tag'  => 'head',
                        'data' => bless( { 'tag' => 'title', 'data' => 'acme products' }, 'Node' ),
                    }, 'Node' ),
                    bless( {
                        'tag'  => 'body',
                        'data' => [
                            'some orphan text that needs to be in the output',
                            bless( {
                                'tag'  => 'use',
                                'data' => [
                                    bless( { 'tag' => 'orphan', 'data' => 'actions/' },   'Node' ),
                                    bless( { 'tag' => 'qd',     'data' => 'action' },     'Node' ),
                                    bless( { 'tag' => 'orphan', 'data' => '/main.aXML' }, 'Node' ),
                                ],
                            }, 'Node' ),
                            'some more orphan text',
                        ],
                    }, 'Node' ),
                ],
            }, 'Node' ),
        );
        and it may interest you that with Moose's BUILDARGS, you can easily set up the Node constructor to expect a tag and data, e.g. Node->new( qd => 'action' ). The output of Data::Dumper would still contain the bless { }, 'Node' syntax, making it a good place to do debugging and testing.
        my @nodes = (
            Node->new( html => [
                Node->new( head => Node->new( title => 'acme products' ) ),
                Node->new( body => [
                    'some orphan text that needs to be in the output',
                    Node->new( use => [ 'actions/', Node->new( qd => 'action' ), '/main.aXML' ] ),
                    'some more orphan text',
                ] ),
            ] ),
        );
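        A minimal sketch of that two-argument constructor, using Moose's documented BUILDARGS hook with an around modifier. The tag and data attribute names come from the examples above; treating every two-argument call as (tag, data) is an assumption of this sketch, and it means Node->new( tag => 'x' ) would be misread.

```perl
package Node;
use Moose;

has [qw<data tag>] => ( is => 'rw', isa => 'Any' );

# Node->new( qd => 'action' ) becomes Node->new( tag => 'qd', data => 'action' ).
# Caveat: ANY two-argument call is read this way.
around BUILDARGS => sub {
    my $orig  = shift;
    my $class = shift;
    if ( @_ == 2 && !ref $_[0] ) {
        return $class->$orig( tag => $_[0], data => $_[1] );
    }
    return $class->$orig(@_);    # hashref or key/value list: unchanged
};

__PACKAGE__->meta->make_immutable;

package main;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;

my $node = Node->new( qd => 'action' );
print Dumper($node);    # still shows the bless( { ... }, 'Node' ) form
```

        Calls with a single hash reference, or with four or more key/value arguments, fall through to Moose's normal constructor unchanged.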
        but then why make nodes out of plain html if you have no action planned for them? Checking whether a tag is implemented during parsing is going to save you headaches later.
        my @nodes = (
            '<html> <head><title>acme products</title></head> <body> some orphan text that needs to be in the output',
            Node->new( use => [ 'actions/', Node->new( qd => 'action' ), '/main.aXML' ] ),
            'some more orphan text </body> </html>',
        );
        with which print @nodes would just do the right thing.
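        As a rough illustration of checking whether a tag is implemented during parsing, here is a sketch using Perl's built-in can method. It uses a plain bless-based Node so it runs without Moose; make_node, the qd handler and %query_data are hypothetical, and $data is assumed to already be plain text.

```perl
use strict;
use warnings;

package Node;

my %query_data = ( action => 'default' );    # hypothetical request data

sub new {
    my ( $class, $tag, $data ) = @_;
    return bless { tag => $tag, data => $data }, $class;
}

sub qd { $query_data{ $_[1] } // '' }        # a toy "implemented" tag

package main;

# Hypothetical parser hook, called for each <tag>data</tag> found:
# implemented tags become Nodes, everything else stays literal markup.
# Note: can() also finds methods like new/can/isa, so a real parser
# would probably check against an explicit list of tag names.
sub make_node {
    my ( $tag, $data ) = @_;
    return Node->can($tag) ? Node->new( $tag, $data ) : "<$tag>$data</$tag>";
}

my $n = make_node( qd => 'action' );
my $t = make_node( title => 'acme products' );
print ref $n, "\n";    # Node
print "$t\n";          # <title>acme products</title>
```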
      Corion, muba, and a few others have already explained this independently of each other; he is just playing dumb. You're feeding the troll.
        Look, I can feed two at once!
Re: rough start of an axml compiler
by pemungkah (Priest) on Jul 21, 2011 at 16:48 UTC
    The only comment I might make is that any time you have a function you're passing craploads of parameters to, you may be better served by using an object to manage the storage. Internally (in methods inside the class), you can go ahead and keep referencing things directly for speed.

    If I understand your architecture properly, you have several invariants that are getting set up in sortofcompileit on every call; if you pulled those out into package variables and set them up once (with an init method/sub), you'd save time on every subsequent call.
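    A rough sketch of both suggestions, assuming a hypothetical AXMLCompiler class: the environment and plugin table live in one object, and the tag-matching regexes, which depend only on the plugin names, are built once in the constructor instead of on every call. The commands_in method is only there to show a method reading $self->{...} directly.

```perl
use strict;
use warnings;

package AXMLCompiler;    # hypothetical class name

sub new {
    my ( $class, %args ) = @_;
    my $self = bless {
        env     => $args{env},
        plugins => $args{plugins},
    }, $class;

    # Invariants: these depend only on the plugin names, so compute
    # them once here rather than on every compile call.
    my $names = join '|', map { quotemeta } sort keys %{ $self->{plugins} };
    $self->{open_re}  = qr/<($names)>/;
    $self->{close_re} = qr{</($names)>};

    return $self;
}

# Inside the class, methods read $self->{...} directly (no accessors).
sub commands_in {
    my ( $self, $source ) = @_;
    return $source =~ /$self->{open_re}/g;    # names of opened commands
}

package main;

my $compiler = AXMLCompiler->new(
    env     => { qd => { a => 'b' }, conf => { foo => '1' } },
    plugins => { qd => 'qd handler', conf => 'conf handler' },
);
print join( ',', $compiler->commands_in('<qd>a</qd> <x>y</x> <conf>foo</conf>') ), "\n";    # qd,conf
```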

    And this is totally headed in a good direction: real "compilation" of the aXML code! If you memoized (cf. Memoize) the calls to sortofcompileit, you might be able to get another free speedup from Memoize's caching. As long as a given parameter set always results in the same output, Memoize will help. If there are side effects that might change the result, then it won't help (e.g., memoizing a random number generator would make it seriously unusable, if very fast!).
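    A minimal Memoize sketch (Memoize ships with Perl). compile_source is a hypothetical pure stand-in for sortofcompileit. One caution: Memoize's default cache key just joins the arguments as strings, so hash references such as $aXML_ENV would be keyed by their address, and changing the hash contents would not invalidate a cached result.

```perl
use strict;
use warnings;
use Memoize;

my $calls = 0;    # counts how often the real work actually runs

# Hypothetical stand-in: a pure function of its string argument,
# which is the case Memoize handles safely.
sub compile_source {
    my ($source) = @_;
    $calls++;
    ( my $compiled = $source ) =~ s/<qd>(.*?)<\/qd>/[qd:$1]/g;
    return $compiled;
}
memoize('compile_source');

compile_source('<qd>a</qd>') for 1 .. 3;
print "real compiles: $calls\n";    # real compiles: 1
```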

      The idea was to run sortofcompileit only once per page, the first time it is accessed, and to save its output so that from then on you can skip that step unless the source code has been updated. All it does is reorganise the raw aXML code into a more efficient layout which can be processed a lot faster for individual page hits.
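      A sketch of the compile-once-per-page idea, assuming the compiled output is cached in a file and that file modification times are a good enough signal that the source has been updated. compiled_for and the cache path are hypothetical names; $compile is whatever turns aXML source into the compiled form.

```perl
use strict;
use warnings;

# Hypothetical helper: return the compiled form of $source_path,
# recompiling only when the source is newer than the cached copy.
sub compiled_for {
    my ( $source_path, $cache_path, $compile ) = @_;

    my $source_mtime = ( stat $source_path )[9]
        or die "cannot stat $source_path: $!";
    my $cache_mtime = -e $cache_path ? ( stat _ )[9] : 0;

    # Cache is at least as new as the source: reuse it.
    if ( $cache_mtime >= $source_mtime ) {
        open my $in, '<', $cache_path or die "cannot read $cache_path: $!";
        local $/;    # slurp mode
        return scalar <$in>;
    }

    # Otherwise compile once and save the result for later hits.
    open my $src, '<', $source_path or die "cannot read $source_path: $!";
    my $compiled = $compile->( do { local $/; <$src> } );
    open my $out, '>', $cache_path or die "cannot write $cache_path: $!";
    print {$out} $compiled;
    close $out or die "cannot close $cache_path: $!";
    return $compiled;
}
```

      Because mtimes usually have one-second resolution, an edit saved in the same second as the cache write could be missed; comparing a content hash (for example Digest::MD5's md5_hex) would be stricter.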

      I don't like it because that method, no matter how cleverly implemented, breaks certain plugins which are designed to exploit the runtime parsing setup.

      If you wanted to use aXML for a large-scale site and server overhead was a real budgeting concern, then it would be necessary to sacrifice those plugins (and the groovy effects they achieve) in order to run a compilation/optimisation scheme like the one the above code is starting to implement.

      TIMTOWTDI even with aXML/Perl

        Just as a throw-it-out-there, how about adding markup (or detecting, depending on how sophisticated you want to be) that delineates the "definitely dynamic" and "for-sure static" portions of a page? You could pre-build whatever was invariant (I seem to be using that word a lot lately...) and reserve the slower dynamic stuff for just the part(s) that needed it.
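        A minimal sketch of that split, assuming a made-up <dynamic>...</dynamic> marker (not existing aXML syntax). The template is split once up front, static pieces are kept as ready-made strings, and on each request only the dynamic pieces are handed to a handler; prebuild and render are hypothetical names.

```perl
use strict;
use warnings;

# Run once: split keeps the captured dynamic bodies at odd indices,
# static text at even indices (limit -1 keeps trailing empty pieces).
sub prebuild {
    my ($template) = @_;
    return [ split m{<dynamic>(.*?)</dynamic>}s, $template, -1 ];
}

# Run per request: static parts are copied, dynamic parts evaluated.
sub render {
    my ( $parts, $handler ) = @_;
    my $out = '';
    for my $i ( 0 .. $#$parts ) {
        $out .= $i % 2 ? $handler->( $parts->[$i] ) : $parts->[$i];
    }
    return $out;
}

my $parts = prebuild('<h1>acme</h1><dynamic>user</dynamic><p>static footer</p>');
print render( $parts, sub { "hello, $_[0]" } ), "\n";
# <h1>acme</h1>hello, user<p>static footer</p>
```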
Approved by ww