Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine
 
PerlMonks  

I have a generalized "from XML to SQL" chore... best way?

by sundialsvc4 (Abbot)
on Feb 28, 2008 at 18:46 UTC ( [id://670977]=perlquestion: print w/replies, xml ) Need Help??

sundialsvc4 has asked for the wisdom of the Perl Monks concerning the following question:

I've got several occurrences of a very general requirement:   “Here is a periodic data-feed consisting of a well-formed XML document and corresponding DTD definition, and what we need to do with that information is to update an SQL database. ” We're going to have lots and lots of instances of this very same requirement, some with large files some with small, which differ only in the particulars.

As I see it, the general requirements are (and I know them to be quite commonplace...):

  • Within the overall document, we're going to need to take slices of it, e.g. using XPath-expressions.
  • We're going to need to be able to apply filters to the fields, either in the form of data-massagers or data-validators (“shall we or shall we not accept this record?”) or both.
  • We're going to need to easily map the XML elements to the SQL tables and fields.
Uh huh, yep, simple. There is obviously nothing really spectacular about this chore; it's just a chore.

Now I know that CPAN has a plethora of high-level tools such as XML::Twig but I fully expect that there are higher-level tools even than this. And so I, brashly exhibiting the Core Value of “Laziness,” :-) wish to find it, not write it Yet Again.

Hence my humble supplication to the Monks. Prithee, where the magic-bullet that I have not stumbled-upon yet?

To clarify...   Yes, of course, I've done a CPAN search for “XML SQL” and I see many modules here. What would really help me most is your actual hands-on experiences, your been-there done-that. If I can go as straight-a-way as possible to the best overall solution for my chore, in one swell foop, I Will Be A Very Happy Boy.

Thanks!

Replies are listed 'Best First'.
Re: I have a generalized "from XML to SQL" chore... best way?
by blahblahblah (Priest) on Feb 29, 2008 at 00:46 UTC

      Hmmm... that term sounds vaguely familiar.   What does “MS” stand for, again?

      ;-)

      From the look of things, it seems like XML::XPath might do such a good job of ripping into the XML structure that a few short scripts might just do it for us.

Re: I have a generalized "from XML to SQL" chore... best way?
by aufflick (Deacon) on Mar 03, 2008 at 07:27 UTC
    This is not Perl specific, but you can transform your xml into sql using xslt. Stealing some bits of code from elsewhere on the net, I can use the following xslt to convert my del.icio.us bookmarks into a set of insert statements:

    <?xml version="1.0"?> <!DOCTYPE rdf:RDF [ <!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'> <!ENTITY rss 'http://purl.org/rss/1.0/'> ]> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:rdf="&rdf;" xmlns:rss="&rss;"> <xsl:output method="text"/> <xsl:template match="/rdf:RDF"> <xsl:apply-templates select="rss:item" /> </xsl:template> <xsl:template match="rss:item"> insert into links (url, title) values ( '<xsl:call-template name="sql-escape"><xsl:with-param name="text" +select="rss:title"/></xsl:call-template>', '<xsl:call-template name="sql-escape"><xsl:with-param name="text" +select="rss:link"/></xsl:call-template>' ); </xsl:template> <xsl:template name="sql-escape"> <xsl:param name="text"/> <xsl:variable name="tmp"> <xsl:call-template name="replace-substring"> <xsl:with-param name="from">'</xsl:with-param> <xsl:with-param name="to">''</xsl:with-param> <xsl:with-param name="value" select="$text"/> </xsl:call-template> </xsl:variable> <xsl:value-of select="$tmp"/> </xsl:template> <xsl:template name="replace-substring"> <xsl:param name="value" /> <xsl:param name="from" /> <xsl:param name="to" /> <xsl:choose> <xsl:when test="contains($value,$from)"> <xsl:value-of select="substring-before($value,$from)" /> <xsl:value-of select="$to" /> <xsl:call-template name="replace-substring"> <xsl:with-param name="value" select="substring-after($value, +$from)" /> <xsl:with-param name="from" select="$from" /> <xsl:with-param name="to" select="$to" /> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:value-of select="$value" /> </xsl:otherwise> </xsl:choose> </xsl:template> </xsl:stylesheet>
    If I save that to a file called del2sql.xsl I can insert the links in the rss response like so:

    wget -O - http://feeds.delicious.com/rss/aufflick | xsltproc del2sql.xsl - | psql links_db Pretty neat huh! Unfortunately xslt is even less decipherable than golf perl, but thems the breaks.

    You might also be better using xslt to convert to CSV and then using some simple perl script to slurp in the csv - you could then handle insert/update errors gracefully in your perl.

    The utility sql-escape and replace-substring templates can be placed in a common file and included into your xsl stylesheets via an xsl:import statement.

    Update: In my example I used a commandline xslt engine wrapper, but if you're going the route of a perl script to manage the db connection you can manage the xslt engine there also. CPAN has plenty of xslt related modules. XML::XSLT::Wrapper seems a good idea and would let you benchmark the different libraries it supports.

Re: I have a generalized "from XML to SQL" chore... best way?
by trwww (Priest) on Mar 01, 2008 at 09:32 UTC

    Hello,

    Assuming that the XML file takes a "each child of the root node is a record" layout, the first thing I would do is turn each record in to a data structure. An easy way to do this while keeping efficiency in mind is with Jenda's XML::Rules (see also node #582348).

    Then personally I would use Data::FormValidator to validate and manipulate each record.

    And then probably DBI or maybe an ORM to insert the data in to the db.

    trwww

Re: I have a generalized "from XML to SQL" chore... best way?
by Jenda (Abbot) on Mar 03, 2008 at 19:19 UTC

    I'm afraid that if such a module was to be general enough it would be either just as low level or way too complex.

    For the time being try to forget about the modules that might be available and take a few of those tasks and try to express their differences in a way that would be detailed enough for a computer (using a hypothetic module). Try to express as a config file or as function/method calls of a module the slices and the filtering and the mapping ... you may easily find out that what you end up is too complex and that there really isn't a way around using your preferred XML parser and DBIx module directly. Or maybe you find out that what YOU need can be built on top of that selected parser and DBIx easily with the exact bits that can be controlled that you need.

    In either case I don't think you can get away with something like a shared script and a bunch of config files. I'd expect more luck with a library that implements the common bits used by a new script for each import.

      Maybe someone can adapt/use this one. It is a generic clientside XML to SQL converter.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://670977]
Approved by Corion
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others browsing the Monastery: (2)
As of 2024-04-26 04:13 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found