
XML documentation formatting and transformations

by John M. Dlugosz (Monsignor)
on Nov 01, 2001 at 05:27 UTC ( #122473=perlquestion )

John M. Dlugosz has asked for the wisdom of the Perl Monks concerning the following question:

Recall my musings at HTML documentation system - design and planning. Although it would be much work to redo existing docs, upon starting a new project I thought I'd try the new system from the beginning.

So, I came up with a simple schema to get me started with capturing the necessary pieces of information. And here is a document that uses it.

So... now what? I want to process a body of XML documents and produce a body of HTML files that can be browsed as files (no server involved). Each document like the one shown will directly produce one or more HTML files, and contribute to at least two others (index, master contents, that kind of stuff).

I suppose I could pull out one of the Perl XML modules, load the thing in, and start writing code to do what I want. Spit out the whole docs, and remember stuff for the other docs and generate those after iterating over all input files.

But some people tout the XSL thing, and say it should basically do all this for me. Is that true, or will it fall down when I reach a certain level of complexity?

My immediate question is: what technology and approach should I take to avoid blind alleys and excessive work?

I'd also love to see something presentable (a single readable HTML doc based on that input file) with minimal effort, so these files may be read now even before the fancy stuff is written.

Any clues?



Re: XML documentation formatting and transformations (boo)
by boo_radley (Parson) on Nov 01, 2001 at 07:02 UTC
    XML is pretty amazing when it's paired with XSLT -- Firstly, read through the ORA XSLT reference guide. If you're pressed for time in the bookstore, go directly to chapter 9, the case study.
    Secondly, realize that most of the modules on CPAN dealing with XML transformation are lacking (not XML parsing, mind, but transformation). Look here or here (windows) for adaptations of Gnome's XML and XSLT libraries.
    They're incredibly flexible, and have so far performed admirably regardless of what I throw at them -- they allow for some of the more esoteric functions of the XSLT specification (allowing external processors to run over data, or certain namespaces, for instance), and I have yet to run into limitations.
    If all you're looking for is a transformation from XML to HTML, a good style sheet will typically get you started. If you want to rearrange or reorder the way elements are displayed, use XSLT.
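    To make the "good style sheet will get you started" point concrete, a minimal XSLT stylesheet for a straight XML-to-HTML rendering might look like the sketch below. The `section`/`para`/`title` element names are stand-ins for whatever the questioner's own schema uses:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- wrap the whole document in an HTML skeleton -->
  <xsl:template match="/">
    <html>
      <head><title><xsl:value-of select="//title"/></title></head>
      <body><xsl:apply-templates/></body>
    </html>
  </xsl:template>

  <!-- hypothetical schema elements; substitute your own -->
  <xsl:template match="section">
    <h2><xsl:value-of select="@name"/></h2>
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>
```

    Anything the document doesn't match falls through to XSLT's built-in templates, which simply emit the text content -- handy for getting a readable page out of a new schema quickly.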
    Also useful with XML transformation libraries is the idea of allowing users to develop their own XSL docs so that they can present data in a form useful to them -- doing this might involve running the transformation through CGI, and that might be quite expensive.
    So to sum up, XSLT sounds like it's worth your time.

    And regarding your example document : it seems like it'd be ripe for something similar to a cross-ref tag. For instance :

    This returns the generated callback function that is a thunk to turn a WNDPROC or WNDPROC_2 signature into a call to @handle_message on this instance.

    might have (e.g.) <XREF> tags around WNDPROC, WNDPROC_2 and @handle_message to provide links to other documents as appropriate.
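    As a sketch of that idea, a hypothetical `<XREF>` element and the stylesheet template that renders it as a code-styled link might look like this (the `XREF` name and the `.html` file-naming convention are assumptions, not part of the original schema):

```xml
<!-- in the document -->
... a thunk to turn a <XREF>WNDPROC</XREF> or <XREF>WNDPROC_2</XREF>
signature into a call to <XREF>handle_message</XREF> on this instance ...

<!-- in the stylesheet: render each XREF as a link to its own page -->
<xsl:template match="XREF">
  <a href="{.}.html"><code><xsl:value-of select="."/></code></a>
</xsl:template>
```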

    update I meant "cascading stylesheet", to clarify.

      Re my document and cross-ref: the @ symbol indicates that the following word is set in code-style, and also will automatically generate cross references. Any @-word that can't be matched to a cross-reference would generate a warning. I have in mind (ultimately) that when the mouse passes over one that names a parameter, the parameter in the signatures is highlighted. It would be smart enough to automatically link to other members in the same class, qualified symbols, and other class names. Details of a mapping, if needed, could be specified in another element. Anything that's not automatic would be an element embedded in the text.
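      The @-word convention described above could be handled by a short Perl filter along these lines - a sketch only, with a hypothetical `%targets` map standing in for the real cross-reference table:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# hypothetical map from @-word to its cross-reference target
my %targets = (
    WNDPROC        => 'ratwin_types.html#WNDPROC',
    handle_message => 'window.html#handle_message',
);

my $text = 'a thunk to turn a @WNDPROC signature into a call to @handle_message';

# set each @-word in code style; link it if we know a target, warn if not
(my $xml = $text) =~ s{\@(\w+)}{
    exists $targets{$1}
        ? qq{<xref target="$targets{$1}">$1</xref>}
        : do { warn qq{no cross-reference for \@$1\n}; qq{<code>$1</code>} }
}ge;

print $xml, "\n";
```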

      Also, if the mouse passes over a reference that's not local, it can show details in the lower window or a popup, or expand. E.g. the view just shows "sMSG", but when you hover it tells you it's actually "class ratwin::message::sMSG".

      re "a good style sheet will typically get you started. If you want to rearrange or reorder the way elements are displayed, use XSLT."

      What is the difference between "a style sheet" and XSL(T)?

      As far as finding an XSLT implementation, Microsoft has one with an OLE interface. A Perl-native one is probably superior, though.

        Well, I can't contribute too much with regard to using XML with XSLT in Perl, but I have used the two together just on the command line, using a package called Sablotron, quite successfully.

        As for what you can do with XSLT, well, quite a lot. For instance, we were converting XML DB models into a Perl object model. This involved converting XML into code, which required the use of conditional blocks, transformation, selective presentation, etc. I'm not sure what a 'style sheet' is in XML, if it's not XSLT, but XSLT has capabilities far beyond what CSS gives you in HTML. Bear in mind, though, that XSLT is a bit nasty in terms of syntax - but that's why you use a WYSIWYG tool to write it...

        If you want to play around with a WYSIWYG XML/XSLT tool on a W32 platform, then have a look at XML Stylus Studio; it's available free for trial. We found it useful enough at work to buy a few dev seats' worth. Other than that, Sablotron is a good open-source command-line version of the same thing.

        I'd be interested in how you get on with this, maybe a follow up post letting us know how you played it? Anyway, HTH

        Oops, last thought: I very seriously doubt that a Perl-native XSLT implementation would be superior; my guess is it would be slow as hell. Use Sablotron, but control it from Perl. Sablotron has some new documentation on using it with Perl; I haven't looked at it in detail, but it's here
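        Driving Sablotron from Perl went roughly like this with the XML::Sablotron module of that era - treat the argument scheme and method names as approximate, recalled from the module's synopsis rather than quoted:

```perl
use strict;
use warnings;
use XML::Sablotron;

my $xsl = <<'XSL';
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html><body><xsl:value-of select="/doc"/></body></html>
  </xsl:template>
</xsl:stylesheet>
XSL

my $xml = '<doc>hello</doc>';

# stylesheet and data are passed as named in-memory buffers
my $sab = XML::Sablotron->new;
$sab->RunProcessor('arg:/template', 'arg:/data', 'arg:/result',
                   [], [template => $xsl, data => $xml]);
print $sab->GetResultArg('result');
```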

        Yves / DeMerphq
        Have you registered your Name Space?

        A Perl-native XSLT processor does exist. Unfortunately it hasn't been fully compliant with the XSLT 1.0 spec, and it is unacceptably slow. I recall someone saying on the perl-xml mailing list that it's even faster to spawn a Saxon process. I believe that XML::XSLT could be much better than it is now, since there are lots of very good XSLT processors written in Java, so I think it's more a matter of good algorithms and proper data structures than a language problem.

        I've run XSLTMark on my own, comparing the XSLT 1.0 compliance and performance of Saxon (Java), Sablotron, LibXSLT, and Xalan C++. Sablotron 0.70 - the latest version at this moment - fails on several tests, which indicates that it isn't fully XSLT 1.0 compliant, and the performance is not impressive. The rest of them - LibXSLT, Saxon, and Xalan - are fully compliant. And among these, LibXSLT is undoubtedly the speed demon. Xalan is the second.

        But if you're afraid that at some level of complexity you can't deal with XSLT alone (or it's too complicated to solve in XSLT, see Things XSLT Can't Do), you may try XML::Xalan, the Perl binding to Xalan C++. It allows users to write XSLT extension functions in Perl. I know it lacks several features, such as document validation, but currently it serves my needs very well. Untested on Win32, though; Linux only so far.
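        For comparison, the LibXSLT binding praised above (XML::LibXSLT, layered on XML::LibXML) follows the synopsis below - the file names are placeholders:

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

my $parser = XML::LibXML->new;
my $xslt   = XML::LibXSLT->new;

my $source     = $parser->parse_file('doc.xml');    # your document
my $style_doc  = $parser->parse_file('style.xsl');  # your stylesheet
my $stylesheet = $xslt->parse_stylesheet($style_doc);

# apply the compiled stylesheet and serialize per its xsl:output settings
my $results = $stylesheet->transform($source);
print $stylesheet->output_string($results);
```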

Re: XML documentation formatting and transformations
by stefan k (Curate) on Nov 01, 2001 at 16:14 UTC
    The combination of XML and XSLT is really quite powerful (assuming you've got a decent XSL(T) processor). The XSL(T) language is plain ugly to my eyes (and my fingers), though.

    Still, there is one thing for which I found no solution other than coding it myself when I started learning these techniques. When you have several XML documents (content) and push them through an XSL(T) stylesheet (layout), you won't get automagic links between all available documents.

    This is where w3make enters the room.
    Since I wrote that on my own, I won't go into detail here (pushing the self-ad too far ;-) - just to say that w3make addresses exactly this problem, and that I'll be happy to answer your questions should you consider using it.

    Regards... Stefan
    you begin bashing the string with a +42 regexp of confusion

(arturo) Re: XML documentation formatting and transformations
by arturo (Vicar) on Nov 01, 2001 at 19:46 UTC

    As a side note in favor of an XSLT approach, the KDE project's documentation is stored in DocBook XML format, and transformed on-the-fly to HTML for (well understood) rendering. All in all, it's a pretty slick system. We're implementing something disturbingly similar over here, although it's based on Java and it's getting delivered via the web.

    As far as an XSL-based transformation toolkit goes, check out XML::Sablotron, a Perl interface to the Sablotron XSLT processor -- combine that with AxKit in a mod_perl environment and you have opened up a new world. Whether or not you need to deliver the docs over HTTP, I've found XML::Sablotron to be pretty easy to use, even if in the end you're using C libraries.

    perl -e 'print "How sweet does a rose smell? "; chomp ($n = <STDIN>); $rose = "smells sweet to degree $n"; *other_name = *rose; print "$other_name\n"'
Re: XML documentation formatting and transformations
by hsmyers (Canon) on Nov 01, 2001 at 21:42 UTC

    Here are some links that might lead you where you want to go:

    There are more out there, but not yet a lot. The key here is to combine your search on XML with a search on Literate Programming, because that is pretty much what you describe - literate programming, that is. In fact, excepting the small matter of coding (SMOC tm), you not only should get everything on your list (index, master table of contents, etc.), but you also get your source code out of the same black box. If you extend your schema, you also get a cross-reference for all parameters used and, if you want, all variables cross-referenced with scoping information, etc. Point of fact: if you tag it, you can manipulate it! Again, this is what LP is all about.

    Having said that, there are two problems that come to mind. The first is that most of the magic involved in LP typically revolves around translating some initial form into LaTeX, and you've already indicated that that wouldn't be desirable. The second problem is tied to XML. While the idea of LP is comparatively old (20 years comes to mind), expressing it in XML is pretty much a new thing. New enough that there aren't a lot of ready-made tools that spring to mind. Most of them have been mentioned already in either this thread or the earlier one.

    All of that leads me to this:

    • Q. XSL, will it fall down?
    • A. Nope. XSL can do pretty much everything you've mentioned; it's just SMOC again.

    • Q. What technology and approach should you take?
    • A. Well, either lose your hang-up over LaTeX and use well-featured, tested, off-the-shelf software…or do it yourself. The latter will give you the most control over the output - I'd still keep the LP approach (it's sound and it works), but a combination of XSL and Perl should take you to your destination. Keep in mind that you would have to pay the 'early adopter' penalty no matter which approach you choose.

    I've given this topic quite some thought, since it is pretty much what I'm getting paid for at the moment! I've looked at virtually every solution to this problem out there, where 'looked at' means actually tried. To be truthful, I haven't found a single solution yet. Which same implies that I'm currently leaning in the direction of 'roll your own'. My general idea is to write a filter that uses a hybrid approach, i.e. part XML, part LP, part target language (Perl at the moment), to generate source, reference, and help documents in HTML and PDF. And of course, I'd write this in Perl - as a module, in the usual form of a 'something'2'something' app (pod2html, pod2pdf, and similar).

    To grok the LP thing here is the single best pointer that I know:


      The whole idea behind Literate Programming is that someone can read the single source like a book. That is, it has the approachability of a "whitepaper" explaining it, but can spit out the real source code as opposed to referencing it in separately-maintained files.

      Obviously, that's not XML.

      What I really want is a simple markup that's easy to type, but I'll process that into XML as a stand-alone component. Then the real work transforms XML into my formatted documentation.

      As far as mixing the code and docs in the same file, I like the idea for some types of work, especially the code I write as part of a magazine article! That would be utterly perfect, if the LP file were a readable tutorial as it was.


        Obviously, that's not XML.

        If this is based on experience, please relate the details. Point of fact, no one has ever said that the raw source to LP has to be readable. Nor is the output limited to 'whitepaper'-style prose - I use it for normal day-to-day library documentation. To paraphrase, the whole point of XML is information tagging…once that is done, you transform it into whatever you want. And there is nothing in that concept that precludes LP, or for that matter XML-LP. Have you actually looked at approaches like CWEB and NUWEB? If "What I really want is a simple markup that's easy to type…" then you are talking about a wheel that has already been invented, tested, and used to a fare-thee-well.

        All of that aside, the reason I keep harping on the combination of XML and LP is that regardless of how you get there, XML is a stronger base for transformation than the original LP concept. So if you take the LP idea of starting with a simple markup language and combine that with conversion into XML, then I believe the sky is the limit on the possible transformations. Further, if you embrace the perlish 'Lazy is good!', then using a hybrid system makes good sense. For those things that LP already does, use an existing setup, typically a language-nonspecific LP like NUWEB or one of the others (there are many). For output not anticipated by LP, slap a Perl filter on your easy markup and let it rip! The posted comments about XSL being obviated by CSS are arguable…actual textual transformations can't be done in CSS, but they are the bread and butter of XSL. I hadn't thought of it as an example before, but I actually use a form of this hybrid every time I post a bibliography on one of my web pages. The source is kept in a text file formatted in what is called BibTeX. It can be used in LP (gets converted into either HTML, PDF, or LaTeX) as is. For the web, I perl-filter it to XML and display using XSLT. This leverages 'working' code written by others with my own efforts, and is an open-ended solution to boot!

        Obviously all of this babble is based on my idea of a good time, and at a guess, your mileage probably would vary. That's not important; what is important is your goal. The more people that make this journey, the more choices we will all have, simply because there will be more solutions! Good luck, and let the monastery follow along.


        p.s. Source into whatever form your editors want shouldn't be a problem…
Re: XML documentation formatting and transformations
by alien_life_form (Pilgrim) on Nov 01, 2001 at 22:49 UTC

    Some flamage (at the start of the message) and one piece of advice (at the end).

    This is my anecdotal, highly opinionated take on XSLT:

    ME : "Looks we have to adjust this another bit..."
    Coworker: "Do you think we can get this to work before XML falls out of fashion?"

    Basically, it looks like every time I fish out my XSLT, the task at hand is either:

    1. Nothing a good CSS could not handle, or
    2. Nothing I would not rather do with a glue language directly from the DOM.
    And my question would be: why do we need a highly idiosyncratic, verbose, devious programming language for transformations? How is it superior to direct interaction with the DOM?
    Besides, it appears to have a slew of limitations - just from looking at the mailing list, it appears that every tiny departure from the textbook examples requires some extension (Saxon is often mentioned).
    So I have long ceased to try to wrestle with XSLT. When CSS is not up to snuff I usually do (on Win32):
    use Win32::OLE;
    my $parser = Win32::OLE->new('msxml.DOMdocument');
    # ... my stuff ...
    On Linux, you can get something similar from either the Perl XML modules or the Xerces-C module.
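    Fleshed out slightly, the MSXML route looks something like this - the ProgID is from the post above, but the property and method names are recalled from the MSXML DOM, so verify them against the MSDN docs:

```perl
use strict;
use warnings;
use Win32::OLE qw(in);

my $doc = Win32::OLE->new('msxml.DOMdocument')
    or die "can't create MSXML DOM: ", Win32::OLE->LastError;
$doc->{async} = 0;                      # load synchronously
$doc->Load('doc.xml')
    or die "parse error: ", $doc->{parseError}{reason};

# walk the DOM directly from Perl
for my $node (in $doc->{documentElement}{childNodes}) {
    print $node->{nodeName}, "\n";
}

# or apply a stylesheet, if you do want XSLT after all
my $style = Win32::OLE->new('msxml.DOMdocument');
$style->{async} = 0;
$style->Load('style.xsl');
print $doc->transformNode($style);
```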

    AxKit has a thing called (I think) XPathScript, which does much of what XSLT promises to do, but with much saner syntax and looks (IMHO). I have never used it, but it would probably be my choice after going straight to the DOM.

    You can't have everything: where would you put it?

      Yea, it means learning another language. I could do it using Perl and the DOM now and it's fairly obvious just how to do it.

      If XSLT can't do what I want, I wouldn't even try, but go right to the Perl program.

      What does CSS do with XML? I know how to use Cascading Style Sheets level 1 for my HTML, but I've never heard of it transforming XML into HTML or otherwise enabling a browser like IE to present it nicely. Can you elaborate on that?


        I do this kind of stuff occasionally, so I am of little use without a reference (which I do not have at hand right now); however, say you have

        <foo>this is your tag</foo>

        then you can style it in a CSS2 stylesheet like this:
        foo { display:block; color:red; }
        which tells the formatter that you've got a block-level element, and to color it red...
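        In full, pointing a browser at raw XML with a CSS2 stylesheet takes one processing instruction in the document plus selectors on your element names (`doc`/`title`/`foo` here are stand-ins for whatever your schema uses):

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="docs.css"?>
<doc>
  <title>My document</title>
  <foo>this is your tag</foo>
</doc>
```

        with docs.css along the lines of:

```css
title { display: block; font-size: 150%; font-weight: bold; }
foo   { display: block; color: red; }
```

        Note this only styles; unlike XSLT, it cannot reorder elements, generate links, or produce new text.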

        See for instance:

        You can't have everything: where would you put it?

      On Win32, you say you use Win32::OLE->new('msxml.DOMdocument');.

      Why do you use that instead of the various Perl modules? My first impression, formed by experience with DBI vs. ODBC etc. and using Word and VSS from Perl, is that the OLE interface is clunky compared to a module designed with Perl in mind.

      So, does msxml do something that the CPAN modules don't? Or is there some specific advantage to using it?



        I should perhaps have said that my XML parsing needs do not involve a very high byte count - so I am not very concerned about efficiency at this point.

        So the main reason for using the msxml COM engine is the same as for climbing mount Everest: it's there.

        I do not need to install anything, and I get an alternate drop-in implementation from the Xerces COM interface (which I do have to install - check the site for xerces-c, not the like-named Java implementation).

        I could (but have not done it) get the native Perl/Xerces interface, whose performance should be comparable to Xerces itself.

        As for Perl's XML:: hierarchy, I do not know. I was never able to get a clean windoze install, from CPAN or otherwise, though things have probably improved since my last attempt. I have heard from people who went through it and were much less than awed by the performance level. (And truly, a large DOM tree made entirely of Perl objects must be an unwieldy beast..)

        You can't have everything: where would you put it?
