PerlMonks  

General Transformation API

by hanenkamp (Pilgrim)
on Oct 23, 2003 at 16:52 UTC ( [id://301639]=perlquestion )

hanenkamp has asked for the wisdom of the Perl Monks concerning the following question:

I've been discussing an idea of mine for a generalized Transformation API on the Module Author's list and wondered what the Monks might have to say about it.

The idea is this: I would like to have a semi-intelligent way of translating files from one format to another. I would like this system to be able to, at least partially, detect some aspects of the source and intermediate format of the file to determine how to reach the eventual destination format. I envision the ability to (eventually) translate one or more source files (or file handles) into one or more sink files (or file handles).

Example 1: Two-step Image Conversion

That's vague, so let me describe a couple of examples (use cases, for you Software Engineers). First, let's say I want to convert a TGA file into a PDF. I have a converter that will translate the TGA into a PS file and a converter from PS to PDF. The user supplies the TGA file and says, "Make it a PDF." The system then finds a path from TGA to PDF using the available converters, runs the transformations, and returns the result.
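A minimal sketch of the path-finding step in this example, assuming the available converters are registered in a simple hash. The registry layout, the converter names (`tga2ps`, `ps2pdf`, `png2ps`), and `find_path` are hypothetical and only illustrate one way a breadth-first search could pick the shortest chain:

```perl
use strict;
use warnings;

# Hypothetical converter registry: source format => { target format => converter name }.
my %converters = (
    tga => { ps  => 'tga2ps' },
    ps  => { pdf => 'ps2pdf' },
    png => { ps  => 'png2ps' },
);

# Breadth-first search for the shortest chain of converters from $from to $to.
# Returns a list of [source, target, converter] steps. The list is empty when
# no path exists (or when $from already equals $to).
sub find_path {
    my ($from, $to) = @_;
    my %seen  = ($from => 1);
    my @queue = ([$from, []]);
    while (my $item = shift @queue) {
        my ($format, $steps) = @$item;
        return @$steps if $format eq $to;
        for my $next (sort keys %{ $converters{$format} || {} }) {
            next if $seen{$next}++;
            push @queue,
                [$next, [@$steps, [$format, $next, $converters{$format}{$next}]]];
        }
    }
    return;
}

my @path = find_path('tga', 'pdf');
print join(' -> ', map { $_->[2] } @path), "\n";    # prints "tga2ps -> ps2pdf"
```

Running the chosen converters in order is left out here; each step only names the converter that would be invoked.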

Example 2: Three-Step Document Conversion

Second, let's say I have an XML file in DocBook format that includes some other markup in custom namespaces for some specialized information. It also includes some TAL information (just read the latest TPJ) that needs to generate some info. I want it to be converted to HTML. So, the system looks at the configuration and determines that I am converting an XML file that contains the TAL and other custom namespaces. I have an XSL stylesheet that can translate the specialized markup into pure DocBook; the TAL can be evaluated using Petal; and I can use the DocBook XSL stylesheets to generate HTML.

The system should see a path like this:

  1. Use Petal to eliminate the TAL from the document.
  2. Use the XSL stylesheet to translate the custom markup into DocBook; we should now have a pure DocBook file.
  3. Use the DocBook transformation to translate the file into HTML.
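The three steps above could be planned along these lines. This is only a sketch under stated assumptions: the document is described by a hypothetical hash (its base format plus the extra markup detected in it), and the step table, `applies`, `effect`, and `plan` are illustrative names. No real Petal or XSLT call is made; each step only updates the description.

```perl
use strict;
use warnings;

# Hypothetical description of a document: its base format plus the extra
# markup vocabularies detected in it.
my %doc = (format => 'docbook', extras => { tal => 1, custom => 1 });

# Hypothetical step table, in priority order. Each step says when it applies
# and how it changes the description; a real step would also transform the file.
my @steps = (
    { name    => 'Petal (evaluate TAL)',
      applies => sub { $_[0]{extras}{tal} },
      effect  => sub { delete $_[0]{extras}{tal} } },
    { name    => 'custom XSL (custom markup to DocBook)',
      applies => sub { $_[0]{extras}{custom} },
      effect  => sub { delete $_[0]{extras}{custom} } },
    { name    => 'DocBook XSL (DocBook to HTML)',
      applies => sub { $_[0]{format} eq 'docbook' && !%{ $_[0]{extras} } },
      effect  => sub { $_[0]{format} = 'html' } },
);

# Repeatedly pick the first applicable step until the target format is reached.
# Note that this modifies the description passed in.
sub plan {
    my ($doc, $target) = @_;
    my @plan;
    while ($doc->{format} ne $target) {
        my ($step) = grep { $_->{applies}->($doc) } @steps;
        die "no applicable step from $doc->{format}\n" unless $step;
        $step->{effect}->($doc);
        push @plan, $step->{name};
    }
    return @plan;
}

my @plan = plan(\%doc, 'html');
printf "%d. %s\n", $_ + 1, $plan[$_] for 0 .. $#plan;
```

With the description above, the plan lists the Petal step, then the custom stylesheet, then the DocBook stylesheet, matching the order in the list.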

Questions...

So, does such an API sound interesting? Why? Why not? What caveats would you anticipate? I've thought of several, but I want to make sure I know the risks. Anyone have a suggestion for an API or utility that is similar in Perl or another language? (That is, metaconversion tools, not actual converters like XSLT or ImageMagick.)

I honorably request the wisdom of the Monks. Thank you for your thoughts.

Replies are listed 'Best First'.
Re: General Transformation API
by Corion (Patriarch) on Oct 23, 2003 at 17:06 UTC

    I see many pitfalls for a General Conversion API, or rather only a single one: it will be either too generic, because it has to accommodate everything, or too special for its title.

    The pipelines I know are:

    • AxKit - XML::SAX::Machines - an XML transformation pipeline
    • My homegrown data transformation and reconciliation engine (written in Python for work). It takes input files and converts them into hash-lists; the individual filters operate on these hashes or on the whole list, and the output at the end is a report. It has evolved far enough that a reconciliation report can now be written in a mostly declarative style.
    • A realtime movie/animation transformation engine, much like Adobe AfterEffects, but realtime. This is a toy idea I tinker with. Here, the problem would be that the consumer/producer relation is not entirely clear and must be arbitrated somehow, as you have a target framerate and a (different) framerate at which every producer works, and the arbitrator must make good decisions about where to spend the CPU resources, and every filter involved has to at least export some metadata on how it intends to spend resources allocated to it.

    As long as you narrow your problem domain enough, you can keep most of the meta-data problem out of your house, as the meta-data is common to all filters involved.

    As soon as you go "general", your filter API can be reduced to the following interface:

    sub filter::can_connect_to {
        my ($self, $input) = @_;
        return 1;
    };

    sub filter::connect_output {
        my ($self, $next_filter) = @_;
        # ... do magic
    };

    sub filter::on_input_connect {
        my ($self, $input) = @_;
    };

    sub filter::process {
        my ($self, $data) = @_;
        # ... do magic
    };

    sub filter::describe {
        return "Magic filter";
    };

    This hides all the "arbitration magic" in the connect_output and on_input_connect methods, where both filters have to decide on something they can do to each other... I think that without a specific goal, you won't get far with a general concept.
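To make the interface above concrete, here is a minimal sketch of one possible base class. It assumes that each filter declares an input and an output format name and that "arbitration" is nothing more than comparing those names; the `Filter` package, its constructor arguments (`in`, `out`, `code`), and the two example filters are hypothetical.

```perl
use strict;
use warnings;

package Filter;

# Hypothetical constructor: each filter declares the format it accepts, the
# format it produces, and a coderef that does the actual work.
sub new {
    my ($class, %args) = @_;
    return bless { %args, next => undef }, $class;
}

# Arbitration in its simplest form: the upstream filter's output format must
# equal our input format.
sub can_connect_to {
    my ($self, $input) = @_;
    return $input->{out} eq $self->{in};
}

# Ask the downstream filter whether it accepts us, then link the two.
sub connect_output {
    my ($self, $next_filter) = @_;
    die $self->describe . " cannot feed " . $next_filter->describe . "\n"
        unless $next_filter->can_connect_to($self);
    $next_filter->on_input_connect($self);
    $self->{next} = $next_filter;
    return $next_filter;    # allows chaining further connect_output calls
}

# Record only a description of the upstream filter, to avoid a reference cycle.
sub on_input_connect {
    my ($self, $input) = @_;
    $self->{input_name} = $input->describe;
    return;
}

# Transform the data and hand it to the next filter, if there is one.
sub process {
    my ($self, $data) = @_;
    my $result = $self->{code}->($data);
    return $self->{next} ? $self->{next}->process($result) : $result;
}

sub describe {
    my ($self) = @_;
    return sprintf '%s-to-%s filter', $self->{in}, $self->{out};
}

package main;

my $upper   = Filter->new(in => 'text', out => 'text', code => sub { uc $_[0] });
my $to_html = Filter->new(in => 'text', out => 'html', code => sub { "<p>$_[0]</p>" });
$upper->connect_output($to_html);
print $upper->process('hello'), "\n";    # prints "<p>HELLO</p>"
```

Anything richer than a format-name comparison (frame rates, resource budgets, shared meta-data) would have to be negotiated inside `can_connect_to` and `on_input_connect`, which is where a fully general design gets difficult.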

    perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
    $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
    ($c = $d->accept())->get_request(); $c->send_response( new #in the
    HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
      I see many pitfalls for a General Conversion API, or rather only a single one: it will be either too generic, because it has to accommodate everything, or too special for its title.

      I recall similar arguments being used when DBI's development got started. Sure, there will be big bumps to take, but I think this is a worthwhile pursuit.

      I would definitely investigate first which similar approaches have been done in the past: there are several packages capable of conversion between different image formats already. This API could serve as a glue for such conversions.

      I think it would also be important to focus on capabilities rather than efficiency. There will always be more efficient ways to do specific conversions. The generic API is for ease of use; efficiency can be built in later if people have an itch for it.

      Liz

Re: General Transformation API
by pg (Canon) on Oct 23, 2003 at 17:34 UTC

    It is an interesting idea, and probably worth a try.

    I envision it as a set of XML tags that describe things like:

    • Steps
    • The original format for each step
    • The destination format for each step
    • The tool used for each step

    And a set of programs that interpret those tags and execute them.
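A minimal sketch of that idea, using a plain Perl data structure in place of XML tags to stay dependency-free. The fields (`from`, `to`, `tool`) mirror the list above; the tool names, the `%tools` table, and `run_steps` are hypothetical, and the tools here only tag the data instead of converting anything.

```perl
use strict;
use warnings;

# Hypothetical declarative description: one entry per step, giving the
# original format, the destination format, and the tool to use.
my @steps = (
    { from => 'tga', to => 'ps',  tool => 'tga2ps' },
    { from => 'ps',  to => 'pdf', tool => 'ps2pdf' },
);

# Hypothetical tool table. Each entry stands in for a real converter and
# simply wraps the data so the order of application is visible.
my %tools = (
    tga2ps => sub { "ps($_[0])" },
    ps2pdf => sub { "pdf($_[0])" },
);

# Interpret the description: check that each step starts from the format the
# previous one produced, then run its tool.
sub run_steps {
    my ($data, $format, @steps) = @_;
    for my $step (@steps) {
        die "step expects $step->{from}, got $format\n"
            unless $step->{from} eq $format;
        my $tool = $tools{ $step->{tool} }
            or die "unknown tool $step->{tool}\n";
        $data   = $tool->($data);
        $format = $step->{to};
    }
    return ($data, $format);
}

my ($out, $format) = run_steps('image', 'tga', @steps);
print "$format: $out\n";    # prints "pdf: pdf(ps(image))"
```

The same fields could be carried as XML attributes; the interpreter would stay the same once the description is parsed.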

    I suggest you spend time getting the tags right and making their meanings really clear. Do it right from the beginning.

      At the very least, for my own needs, I need a tool that can auto-detect namespaces and the stylesheets that can get them from a source format to a sink. The other transformation ideas may be desirable, but I haven't even sold myself on those.

      I may end up just doing the XML namespace bit and leave it at that and perhaps continue to hypothesize about the other until I get tired of the idea or come up with a clever solution.
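A rough sketch of the namespace part alone, assuming a hypothetical registry that maps a namespace URI to the stylesheet that removes it. The URIs, the stylesheet name, and both function names are made up for illustration, and the regex-based detection is deliberately naive; a real tool should use an XML parser.

```perl
use strict;
use warnings;

# Hypothetical registry: namespace URI => stylesheet that translates it away.
my %stylesheet_for = (
    'http://example.com/ns/custom' => 'custom-to-docbook.xsl',
);

# Naive detection: collect every URI bound by an xmlns or xmlns:prefix
# declaration. This ignores comments, CDATA, and entity tricks.
sub detect_namespaces {
    my ($xml) = @_;
    my %seen;
    $seen{$1}++ while $xml =~ /\bxmlns(?::[\w.-]+)?\s*=\s*["']([^"']+)["']/g;
    return sort keys %seen;
}

# Look up a stylesheet for each detected namespace that has one registered.
sub stylesheets_needed {
    my ($xml) = @_;
    return map  { $stylesheet_for{$_} }
           grep { exists $stylesheet_for{$_} } detect_namespaces($xml);
}

my $xml = <<'XML';
<article xmlns:c="http://example.com/ns/custom"
         xmlns:t="http://example.com/ns/tal">
  <c:note t:content="note">placeholder</c:note>
</article>
XML

print "$_\n" for stylesheets_needed($xml);    # prints "custom-to-docbook.xsl"
```

Namespaces without a registered stylesheet (the second one here) are silently skipped; a real tool would probably want to report them.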

Re: General Transformation API
by Anonymous Monk on Oct 23, 2003 at 17:22 UTC
    So, does such an API sound interesting? Why? Why not? What caveats would you anticipate? I've thought of several, but I want to make sure I know the risks. Anyone have a suggestion for an API or utility that is similar in Perl or another language? (That is, metaconversion tools, not actual converters like XSLT or ImageMagick.)
    Well, print filters are one mechanism to control document conversion (/etc/printcap and magicfilter for example). But these specify all paths directly, multi source / single target. Yours sounds like a nice general document transformation API (probably easily implemented as a graph) that would be useful and easy to update.
Re: General Transformation API
by lachoy (Parson) on Oct 24, 2003 at 11:41 UTC

    Just a quick thought, but you could build something on top of the Pipeline project. At least the infrastructure for asynchronous transformation declarations is there.

    Chris
    M-x auto-bs-mode

      Cool. This is actually pretty similar to what I was thinking. Perhaps this would work as a foundation. I don't know. I'm starting to backpedal out from under this idea, as I've found another way around the problem using a faster and easier, but less idealistic, solution.

      I'll keep my brain on this one since it is still an interesting idea to me.

      Thanks to everyone for your replies. Your suggestions have been instructive and constructive.

Node Type: perlquestion [id://301639]
Approved by Corion