Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

the axml parser v3.4

by simonodell1 (Novice)
on Apr 12, 2007 at 07:35 UTC ( [id://609585]=perlquestion: print w/replies, xml ) Need Help??

simonodell1 has asked for the wisdom of the Perl Monks concerning the following question:

ok im sure most of you could write this better, im working on this, and its the first time ive shown anyone so be nice :)

#!c:/perl/bin/perl ############################################################### # # aXML parser v3.4 # # Author : Simon Odell # Email : simonodell@hotmail.com # # Description : A standalone single file parser for rendering # aXML into HTML documents. # ################################################################ my $debugcmds = 0; my $logcmds = 0; my $log; my $debugsys = 0; my $aXML_mem; my $qd; my $temp_debug = 0; if (($debugsys)||($debugcmds)||($temp_debug)) { print "Content-type: t +ext/html\n\n"; } ################################################################ # # File handling routines # routines for manipulating server side files # ################################################################ #get and return the contents of a given filename #getFile ("filename.file"); sub getFile { my $fileName = shift(@_); if ($debugsys) { print "GetFile : $fileName\n"; } undef $/; open $file, $fileName; my $buf = <$file>; close $file; $buf =~ s@\[@<lftsqbrk/>@g; $buf =~ s@\]@<rtsqbrk/>@g; $buf =~ s@\(@<lftbrk/>@g; $buf =~ s@\)@<rtbrk/>@g; $buf =~ s@\+@<plus/>@g; $buf =~ s@\?@<ques/>@g; return $buf; } sub getPlugin { my $fileName = shift(@_); if ($debugsys) { print "GetPlugin : $fileName\n"; } undef $/; open $file, $fileName; my $buf = <$file>; close $file; return $buf; } # write given data to a specified file # writeFile("<filename>filename</filename><data>$data</data>"); sub writeFile { my $data = shift(@_); if ($debugsys) { print "WriteFile : $data\n"; } if ($data =~ m@<filename>(.*?)</filename>@s) { $filename = $1; } if ($data =~ m@<data>(.*)</data>@s) { $filedata = $1; } open(DAT,">$ENV{DOCUMENT_ROOT}/$filename") || return 0; print DAT $filedata; close(DAT); return 1; } # delete specified file # delFile("filename.file"); sub delFile { my $data = shift(@_); if ($debugsys) { print "DelFile : $data\n"; } return unlink($data); } ################################################################ # # Load configuration file and create $config hash # ################################################################ $aXML_conf_file = getFile ("$ENV{DOCUMENT_ROOT}/aXML_conf.xml"); if ($debugsys) { print "axml conf file = \n\n$aXML_conf_file\n\n\n\n"; + } while ($aXML_conf_file =~ m@<(.*?)>(.*?)</(.*?)>@s) { $tagname = $1; $tagvalue = $2; if ($debugsys) { print "conf tag : $tagname = $tagvalue\n"; } $config->{$tagname} = $tagvalue; $aXML_conf_file =~ s@<$tagname>$tagvalue</$tagname>@@; } ################################################################ # # Query data handling section # builds a hash table called qd which stores the query data # values # ################################################################ if ($debugsys) { print "Parsing Query Data \n ----------------------\n +\n"; } if ($ENV{'REQUEST_METHOD'} eq 'GET') { @query_string_pairs = split(/&/, $ENV{'QUERY_STRING'}); } elsif ($ENV{'REQUEST_METHOD'} eq 'POST') { read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}); @query_string_pairs = split(/&/, $buffer); } else { &error('request_method'); } foreach $pair (@query_string_pairs) { ($key,$value) = split /=/, $pair; $value =~ tr/+/ /; $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; $qd->{$key} = $value; } #if no get type specified assume axml if ($qd->{gettype}){ } else { if ($ENV{HTTP_USER_AGENT} =~ m@wirewolf@s) { $qd->{gettype} = "axml"; } else { $qd->{gettype} = "html"; } } #if no action load default action if ($qd->{action}){ } else { $qd->{action} = "default"; } #if no lang given take default lang from axml config file if ($qd->{lang}){ } else { $qd->{lang} = $config->{lang}; } if ($debugsys) { while ( my ($key, $value) = each(%$qd) ) { print "qd->{$key} => $value\n"; } print "\n\n\n"; } ################################################################ # # Setup Logging file # if logcmds flag set, load log file and append new entry # ################################################################ if ($logcmds) { $log = getFile("$ENV{DOCUMENT_ROOT}/logs/$qd->{action}.log"); $log .= "\n\n\n\n\n\n Action Called @ insert date/time here\n -------------------------------------\n\n"; + } ################################################################ # # Session handling routines # routines for keeping users signed in across multiple requests # ################################################################ @timeData = localtime(time); $currenttime = join(',', @timeData); #clear out old sessions if ($debugsys) { print "Clearing out old sessions \n ----------------- +-----\n\n Session Folder = $ENV{DOCUMENT_ROOT}/sessions\ +n\n"; } my @sessions = glob("$ENV{DOCUMENT_ROOT}/sessions/*.xml"); foreach $session (@sessions) { if ($debugsys) { print "Checking session : $session\n"; } $session_data = getFile($session); if ($session_data =~ m@<timestamp>(.*?)</timestamp>@s) { @timestamp = split /,/,$1; if (@timestamp[2] lt @timeData[2]) { delFile($session); } } } if ($debugsys) { print "\n\n\nUpdate current session \n -------------- +--------\n\n"; } $currentIP = $ENV{REMOTE_ADDR}; $currentIP =~ s@\.@@g; my $username; my $userlevel; my $loggedin = 0; if ($session_info = getFile("$ENV{DOCUMENT_ROOT}/sessions/$currentI +P.xml")) { if ($session_info =~ m@<username>(.*?)</username>@s) { $username + = $1; } if ($session_info =~ m@<userlevel>(.*?)</userlevel>@s) { $userle +vel = $1; } #update the file with new timestamp writeFile("<filename>sessions/$currentIP.xml</filename> <data><username>$username</username> <userlevel>$userlevel</userlevel> <timestamp>$currenttime</timestamp></data>"); $loggedin = 1; } ################################################################ # # Plugin handler # builds a hash table called plugins which stores the plugin code # ################################################################ #get a dir list of the plugin directory my @plugins = glob("$ENV{DOCUMENT_ROOT}/plugins/server/*.*"); my @commands; if ($debugsys) { print "\n\n\nLoading Plugins\n------------------\n\n" +; } #load the server side plugins into a hash table called plugins, #and save the names of the plugins into an array called commands foreach $plugin (@plugins) { if ($plugin =~ m@.*/(.*?).aXMLpi@s) { $command_name = $1; $plugins->{$command_name} = getPlugin ($plugin); push (@commands, $command_name); } } #add gettype specific plugins my @plugins = glob("$ENV{DOCUMENT_ROOT}/plugins/server/$qd->{gettype}/ +*.*"); foreach $plugin (@plugins) { if ($plugin =~ m@.*/(.*?).aXMLpi@s) { $command_name = $1; $plugins->{$command_name} = getPlugin ($plugin); push (@commands, $1); } } #add action specific plugins my @plugins = glob("$ENV{DOCUMENT_ROOT}/actions/$qd->{action}/plugins/ +*.*"); foreach $plugin (@plugins) { if ($plugin =~ m@.*/(.*?).aXMLpi@s) { $command_name = $1; $plugins->{$command_name} = getPlugin ($plugin); push (@commands, $1); } } ################################################################ # # Process Objects # # objects are like macros, scanned for and processed prior to # the main parsing, they allow for neater code in the action # file. # ################################################################ #load the specified action file my $actionFile = getFile("$ENV{DOCUMENT_ROOT}/actions/$qd->{action}.aX +ML"); #load the objects if ($debugsys) { print "\n\n Loading Objects\n---------------------\n\ +n"; } my @objects = glob("$ENV{DOCUMENT_ROOT}/objects/*.*"); foreach $object (@objects) { if ($object =~ m@.*/(.*?).aXMLobj@s) { $object_name = $1; $objects->{$object_name} = getFile ("$ENV{DOCUMENT_ROOT}/objects +/$object_name.aXMLobj"); push (@objects, $1); } } #parse action file for objects and insert if ($debugsys) { print "\n\n\nInserting Objects\n--------------------- +\n\n"; } $rescan = 1; while ($rescan) { $rescan = 0; foreach $object (@objects) { while ($actionFile =~ m@<$object(.*?)>(.*?)</$object>@s) { $raw_object_args = $1; $object_contents = $2; @raw_object_args = split / /,$raw_object_args; foreach $raw_object_arg (@raw_object_args) { if ($raw_object_arg =~ m@(.*?)="(.*?)"@s) { $handed_object_args->{$1} = $2; } } if ($debugsys) { print "Found Object ref : $object $ra +w_object_args\n"; } if ($objects->{$object} =~ m@<type>(.*?)</type>@s) { + $type = $1; } if ($objects->{$object} =~ m@<args>(.*?)</args>@s) { $ +args = $1; } if ($objects->{$object} =~ m@<contents>(.*?)</contents +>@s) { $contents = $1; $contents =~ s@<contents/>@$object_contents@s; } else { $contents = $object_contents; } my $object_args; $find_more_args = 1; while ($find_more_args) { $find_more_args = 0; if ($args =~ m@<(.*?)>@s) { $tagname = $1; if ($args =~ m@<$tagname>(.*?)</$tagname>@s) { $this_arg = $1; $object_args.= " $tagname=\"$this_arg\""; $find_more_args = 1; } $args =~ s@<$tagname>$this_arg</$tagname>@@; + } } $object_code = "<$type$object_args>$contents</$type>"; while ( my ($key, $value) = each(%$handed_object_args) + ) { $object_code =~ s@<$key/>@$value@g; } $actionFile =~ s@<$object$raw_object_args>$object_cont +ents</$object>@$object_code@; $rescan = 1; } } } ################################################################ # # Main Parser # loads the action specified in query data, scans for commands # runs the appropriate plugins and builds the output # ################################################################ #load the action file specified in the query data my $redo = 1; # set the redo flag, the command parser will reiterate w +hile this is set my $rescan = 1; # ser rescan flag, this can be tripped by a plugin suc +h as <insertfile> to restart parsing while ($rescan) { $rescan = 0; $redo = 1; #print "$actionFile\n\n\n\n\n\n\n\n\n\n\n"; while ($redo) #scan for () commands { $redo = 0; foreach $command (@commands) { if ($debugcmds) { print "scanning for ($command)\n"; } $break = 0; while (($actionFile =~ m@<lftbrk/>$command(.*?)<rtbrk/>(.*?)<lft +brk/>/$command<rtbrk/>@s)&&($break eq 0)&&($rescan eq 0)) { my $raw_command_args = $1; my $command_contents = $2; foreach $sub_command (@commands) { if ($command_contents =~ m@<lftbrk/>$sub_command(.*?)<rtb +rk/>@s) { if ($sub_command eq $command) { $command_contents .= "<lftbrk/>/$command<rtbrk/>"; if ($command_contents =~ m@.*<lftbrk/>$command(.*?) +<rtbrk/>(.*?)<lftbrk/>/$command<rtbrk/>$@s) { $raw_command_args = $1; $command_contents = $2; } } else { $redo = 1; $break = 1; } } } if ($break eq 0) { $data = $command_contents; if ($debugcmds) { print "running $command\n\n "; } if ($logcmds) { $log .= "ran ($command)\n"; } $result = "aXML error : command $command not found, or m +alformed args"; my $command_args; my @raw_command_args = split / /,$raw_command_args; foreach $raw_command_arg (@raw_command_args) { if ($raw_command_arg =~ m@(.*?)="(.*?)"@s) { $command_args->{$1} = $2; if ($debugcmds) { print "$1 = $command_args->{$1}\ +n"; } } } if ($debugcmds) { print "command contents = $data\n\n\n\ +n"; } eval ( $plugins->{$command} ); $result =~ s@\[@<lftsqbrk/>@g; $result =~ s@\]@<rtsqbrk/>@g; $result =~ s@\(@<lftbrk/>@g; $result =~ s@\)@<rtbrk/>@g; $result =~ s@\+@<plus/>@g; if ($debugcmds) { print "result = $result\n\n\n\n"; } if ($logcmds) { $log .= "command result = $result \n\ +n\n\n"; } $actionFile =~ s@<lftbrk/>$command$raw_command_args<rtbr +k/>$command_contents<lftbrk/>/$command<rtbrk/>@$result@s; if ($debugcmds) { print "document now reads: \n\n $actio +nFile \n\n\n\n" }; } } } } $redo = 1; # set the redo flag, the command parser will reiterate whil +e this is set while ($redo) #scan for <> commands { $redo = 0; foreach $command (@commands) { if ($debugcmds) { print "scanning for <$command>\n"; } $break = 0; while (($actionFile =~ m@<$command(.*?)>(.*?)<\/$command>@s)&&($ +break eq 0)&&($rescan eq 0)) { my $raw_command_args = $1; my $command_contents = $2; foreach $sub_command (@commands) { if ($command_contents =~ m@<$sub_command(.*?)>@s) { if ($sub_command eq $command) { $command_contents .= "</$command>"; if ($command_contents =~ m@.*<$command(.*?)>(.*?)<\ +/$command>$@s) { $raw_command_args = $1; $command_contents = $2; } } else { $redo = 1; $break = 1; } } } if ($break eq 0) { $data = $command_contents; if ($debugcmds) { print "running $command\n\n "; } if ($logcmds) { $log .= "ran <$command>\n"; } $result = "aXML error : command $command not found, or m +alformed args"; my $command_args; my @raw_command_args = split / /,$raw_command_args; + foreach $raw_command_arg (@raw_command_args) { if ($raw_command_arg =~ m@(.*?)="(.*?)"@s) { $command_args->{$1} = $2; if ($debugcmds) { print "$1 = $command_args->{$1}\ +n"; } } } if ($debugcmds) { print "command contents = $data\n\n\n\ +n"; } eval ( $plugins->{$command} ); $result =~ s@\[@<lftsqbrk/>@g; $result =~ s@\]@<rtsqbrk/>@g; $result =~ s@\(@<lftbrk/>@g; $result =~ s@\)@<rtbrk/>@g; $result =~ s@\+@<plus/>@g; if ($debugcmds) { print "result = $result\n\n\n\n"; } if ($logcmds) { $log .= "command result = $result \n\ +n\n\n"; } $actionFile =~ s@<$command$raw_command_args>$command_con +tents<\/$command>@$result@s; if ($debugcmds) { print "document now reads: \n\n $actio +nFile \n\n\n\n" }; } } } } } #scan for remaining [ ] tags $redo = 1; # set the redo flag, the command parser will reiterate whil +e this is set while ($redo) { $redo = 0; foreach $command (@commands) { if ($debugcmds) { print "scanning for [$command]\n"; } $break = 0; while (($actionFile =~ m@<lftsqbrk/>$command(.*?)<rtsqbrk/>(.*?) +<lftsqbrk/>/$command<rtsqbrk/>@s)&&($break eq 0)&&($rescan eq 0)) { my $raw_command_args = $1; my $command_contents = $2; foreach $sub_command (@commands) { if ($command_contents =~ m@<lftsqbrk/>$sub_command(.*?)<r +tsqbrk/>@s) { if ($sub_command eq $command) { $command_contents .= "\/$command\<rtsqbrk/>"; if ($command_contents =~ m@.*<lftsqbrk(.*?)/>$comma +nd<rtsqbrk/>(.*?)<lftsqbrk/>/$command<rtsqbrk/>$@s) { $raw_command_args = $1; $command_contents = $2; } } else { $redo = 1; $break = 1; } } } if ($break eq 0) { $data = $command_contents; if ($debugcmds) { print "running $command\n\n "; } if ($logcmds) { $log .= "ran [$command]\n"; } $result = "aXML error : command $command not found, or m +alformed args"; my $command_args; my @raw_command_args = split / /,$raw_command_args; + foreach $raw_command_arg (@raw_command_args) { if ($raw_command_arg =~ m@(.*?)="(.*?)"@s) { $command_args->{$1} = $2; if ($debugcmds) { print "$1 = $command_args->{$1}\ +n"; } } } if ($debugcmds) { print "command contents = $data\n\n\n\ +n"; } if ($logcmds) { $log .= "command result = $result \n\ +n\n\n"; } eval ( $plugins->{$command} ); $result =~ s@\[@<lftsqbrk/>@g; $result =~ s@\]@<rtsqbrk/>@g; $result =~ s@\(@<lftbrk/>@g; $result =~ s@\)@<rtbrk/>@g; $result =~ s@\+@<plus/>@g; if ($debugcmds) { print "result = $result\n\n\n\n"; } $actionFile =~ s@<lftsqbrk/>$command$raw_command_args<rt +sqbrk/>$command_contents<lftsqbrk/>/$command<rtsqbrk/>@$result@s; if ($debugcmds) { print "document now reads: \n\n $actio +nFile \n\n\n\n" }; } } } } ################################################################ # # Clean up and output # final modifications to the remaining document # inserts files that are not to be processed # ################################################################ while ($actionFile =~ m@<editfile>(.*?)</editfile>@s) { $edit_filename = $1; my $editFile = getFile("$ENV{DOCUMENT_ROOT}/$edit_filename"); $actionFile =~ s@<editfile>$edit_filename</editfile>@$editFile@ +gs; } $actionFile =~ s@<currentuser/>@$username@g; $actionFile =~ s@<userlevel/>@$userlevel@g; $actionFile =~ s@<ques/>@\?@gs; $actionFile =~ s@<lftbrk/>@\(@g; $actionFile =~ s@<rtbrk/>@\)@g; $actionFile =~ s@<lftsqbrk/>@\[@g; $actionFile =~ s@<rtsqbrk/>@\]@g; $actionFile =~ s@<plus/>@\+@g; if ($qd->{gettype} eq "html") { print "Content-type: text/html\n\n"; +} if ($qd->{gettype} eq "axml") { print "Content-type: text/axml\n\n"; +} print $actionFile; ################################################################ # # Save Logging file # if logcmds flag set, write new log file # ################################################################ if ($logcmds) { writeFile("<filename>logs/$qd->{action}.log</filename> <data>$log</data>"); }

Replies are listed 'Best First'.
Re: the axml parser v3.4
by GrandFather (Saint) on Apr 12, 2007 at 08:05 UTC

    I'd strongly recommend that you use strictures (use strict; use warnings;) and declare variables so their scope is obvious. Global variables are generally frowned on as it can be pretty hard to figure out the relationship between setting and using them.

    The three parameter form of open is much preferred to improve security and make io direction more explicit.

    Code of the form:

    for (...) { if (...) { ... } }

    can be rewritten:

    for (...) { next unless ...; ... }

    to save a level of indentation and make the flow clearer. This becomes especially important where loops are nested.

    ... if ...;

    is generally more Perlish than

    if (...) {...;}

    Perl allows labelled loops so rather than using a $break variable you can:

    SCANNING: while (...) { ... for (...) { ... next SCANNING if ...; } ... }

    DWIM is Perl's answer to Gödel
Re: the axml parser v3.4
by gloryhack (Deacon) on Apr 12, 2007 at 09:05 UTC
    GrandFather makes some very good points. I'll add a few:

    First of all, your file writes are going to collide if two instances of your program run at the same time and try to write the same file. You need some kind of race-free file locking mechanism to prevent that. See perlopentut for a good starting point.

    Your code is susceptible to the poison null byte in the request parser. Ouch! Google for that. And while you're at it, use CGI::Simple or CGI to get away from the many potential failures that exist in your request parsing code.

    If you take GrandFather's very fine advice and use the strict pragma, your code will throw an ugly Error 500 if a file open fails. You should check that the opens succeed before trying to read or write a filehandle, and fail or recover gracefully if the open fails.

    There are three things, I'll leave some for the other fine monks to pick on.

    And again, don't be afraid of CPAN. Wouldja rather learn to work with CPAN, or release code that makes you look like a dork?

Re: the axml parser v3.4
by jdporter (Paladin) on Apr 12, 2007 at 18:32 UTC

    My question is, WTF is aXML? Which (if any) of the dozen or so defitions of "aXML" Google turns up is the one you have in mind? Please link!

    Secondarily, wherefore does it need parsing capabilities not present in any of the existing XML modules?

    Seems (to my untrained eye) that this transformation task could probably be done in about a dozen lines of XSLT.

    A word spoken in Mind will reach its own level, in the objective world, by its own weight
      Finally, somebody mentioned XSLT. As I am highly superficially familiar with XSLT, I did not mention it when I first read the previous aXML post.
      The closest thing I could find was "Active XML".

      -InjunJoel
      "I do not feel obliged to believe that the same God who endowed us with sense, reason and intellect has intended us to forego their use." -Galileo
Re: the axml parser v3.4
by Jenda (Abbot) on Apr 12, 2007 at 10:41 UTC

    Why do you keep escaping and unescaping the []()?+? And why do you try to parse the XML yourself instead of using a parser? The code would be much much simpler and safer if you did that ... parsing XML with regexps is generally a bad idea. Also I find your s@...@...@ and m@...@ hard to read. If you do not want to use slashes (as they are present in the end tags) use s{...}{} and m{...}. I do think that's much more readable.

      Re : Grandfather; thanks I will look into those suggestions in my next verion!

      Re : glory hack; thanks for the constructive part of the critiscm, however imho posting ameture code makes a person look like they are willing to learn something new, saying someone looks a like a dork for being a begginer makes a person look like a snob... whilst I'm sure your perl coding could make mark j dominus look like a noob, manners are universal Sir.

      Re : Jenda, i found that using the ( ) and other such chars caused my regex's to fail, and after many frustrating hours trying to figure out why that is, i decided on the solution you see here.

      I wasn't intending this post to be some sort of statement of programming prowess, I needed some info on capturing data from a given url and the resulting conversation on the chatter box prompted me to share this code. I'm fully aware its both ameture and in need of a lot of work, and thanks to some of the more mature responses here I have some directions in which to take it.

      The code aside, I still think the aim of aXML is valid and worthy of being understood. The ability to nest xml tags which correspond to plugins gives a freedom of expression which makes my development work so much easier.

      Many thanks.

        If you allow the later example then it's no longer XML. Therefore there is no need for this to look like XML. Especially since it's supposed to be used within HTML. You'd rather use some other syntax ... something that stands out more and that doesn't get confused with the HTML. Or use one of the many existing templating modules.

        Of course if you looked at them and none of them fits your needs, go ahead and build your own. It seems to me like it will be a good learning experience, but nothing more.

        Regarding the () and friends ... you have to escape those in regexps as they have a special meaning. You'd better try to fix the problems with the regexps than to try to circumvent them like this. Post an example of text you are trying to match and fail and the regexp and maybe we can help.

        I strongly recommend that you separate out the parser into its own, reusable module, if you expect other people to pick it up. The fact that it's entwined monolithically with other junk (like a hand-rolled CGI handler) is going to be a major stumbling block for many people who might otherwise be interested in it.

        If you need help modularizing your code, there's some tutorials on the topic here at PerlMonks; and of course you can always ask in the CB or post a SOPW.

        A word spoken in Mind will reach its own level, in the objective world, by its own weight
        I like to use # in place of /..... it doesn't look too bad, and works just fine. m# # or s# # # macdaddy.
Re: the axml parser v3.4
by Mr. Muskrat (Canon) on Apr 12, 2007 at 18:03 UTC

    Get over your fear of CPAN.

    What makes your script better than the myriad of other templating systems on CPAN?

Re: the axml parser v3.4
by robot_tourist (Hermit) on Apr 12, 2007 at 09:08 UTC

    ++ for comments. I know I have not yet learned or used pod properly, so I'm a big hairy hypocrite, but for public code I think it would be useful. It is a 'standard' that other people will be able to follow if they have to maintain your code and the pod can be extracted easily for the standalone documentation.

    And don't forget the <readmore> tags for posting long messages in the monastery :)

    How can you feel when you're made of steel? I am made of steel. I am the Robot Tourist.
    Robot Tourist, by Ten Benson

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://609585]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others scrutinizing the Monastery: (5)
As of 2024-04-26 08:09 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found