http://qs321.pair.com?node_id=324107

inman has asked for the wisdom of the Perl Monks concerning the following question:

I am faced with the task of iterating through a substantial set of MS Access databases (*.MDB) files and, without knowing anything about their schema, writing out an XML file containing all of the table contents. The idea being that I can then run a search engine spider over the resulting XML files such that I can locate databases on the file system. This is part of a data discovery tool used for risk management.

I am seeking the wisdom of fellow monks with this task because I believe that I am not the only one to have needed to do this. I am also looking for an example of how to discover the schema of an MS Access database.

The plan so far:

  1. Scan the network drive for *.mdb files using File::Find.
  2. Create / update and ODBC DSN for each database file using Win32::ODBC to manage the conection.
  3. Open a connection to the database using DBI and the DBD::ODBC driver.
  4. Discover the schema - This is where I need some information!
  5. Iterate over the tables, rows, columns. Write out to an XML file using XML::Writer.

Any help on the 'Discover the Schema' bit would be most appreciated.

Replies are listed 'Best First'.
Re: Flattening Access DB to XML
by g00n (Hermit) on Jan 26, 2004 at 13:18 UTC

    MS Access stores all the tables and schema in a single .mdb file. It's more than possible that an OLE call exists to extract the schema by table. For example in ADO.Net, OleDbConnection.GetOleDbSchemaTable() method can be used to extract a schema.


    you did not mention

    • which os your using (implied by win32::odbc)
    • assuming win32, what access drivers do you have?
    • is the db encrypted? (no mention on site about this but could be a problem)
    • what version of access files your using (mbt supports version 3/4)

    method 1
    DBIx::XML_RDB by matts - straight from the man docs ...

      sql2xml.pl -sn myserver -driver myDriver -uid user -pwd seekrit -table user -output users.xml

    here's a tute with code.

    method 2
    last ditch effort (requires linux box) you could try mbdtool to tackle step 4. here's a brief description of how to generate the schema of a ms access file. (from cvs) It's license is LGPL but may have limited availablity on win32 (base install requires glib2).

    method 3
    read the discussion on Microsoft access database on a Linux server from php ('success on mdb tools odbc install' post by lauram) using the MDBTools ODBC driver. Any reason this can't be done using perl?

      the above should really only be attempted for those who have nothing to do. the real solution relies on using 'the most excellent', DBD-AD0-2.83 (the current version) and look what we read in the *suprise* README file ....

        $sth = $dbh->ado_open_schema( $QueryType, @Criteria ) or die ...;
      the file continues ...

      This method can be used to obtain database schema information from the provider. It returns a valid statement handle upon success.

      the DBD::ADO perldoc online is here.

      so what does this mean? It means that you can now find out the schema programatically for unknown Access databases. I found this by checking installing MSDN, searching on COM and IDBSchemaRowset. In an article, Data Access for the Masses Through OLE DB, José A. Blakeley, MSDN. You can get the catalog information through the IDBSchemaRowset interface.

      In ADO the connection object supports this interface (OpenSchema). Behind this is a COM explanation. For those interested in this try reading OLE DB for the ODBC Programmer, Michael Pizzo and Jeff Cochran, MSDN.

      This means you can query the "... types, tables, triggers, views etc ..." for any given Access database.

      I should know all this stuff. The real credit goes to Tim Bunce and Phlip for good documentation. Oh, dont forget TMTOWTDI.

Re: Flattening Access DB to XML
by Grygonos (Chaplain) on Jan 26, 2004 at 14:53 UTC

    This is a fairly easy task if you are on win32. Use Win32 OLE to access it like so

    sub getTableDefs { $accessConst = Win32::OLE::Const->Load('Microsoft Access 8.0 O +bject Library'); $daoConst = Win32::OLE::Const->Load('Microsoft DAO 3.51 Object +Library'); #array to hold table names my @names; #Create a new OLE Access.Application object my $app = Win32::OLE->new('Access.Application'); #Open the database being worked in $app->OpenCurrentDatabase($datasource); #Get the database object my $database = $app->CurrentDb(); #Save every table name that is not a system or linked table foreach my $i (0..($database->TableDefs->{Count}-1)) { #If the attributes are 0 (ie a normal table) if(!$database->TableDefs($i)->{Attributes} || $database->Table +Defs($i)->{Attributes} eq $daoConst->{'dbAttachedTable'}) { #add it to the list of valid selections push @names $database->TableDefs($i)->{Name}; } } #Quit the application $app->DoCmd->Quit($accessConst->{'acQuitSaveNone'}); #Return the list of valid table names return @names; }

    This is some code I wrote a while ago. It does work but it might not be exactly what you want. It only counts non system tables and non-linked tables. this was in a package.. I moved the constant object declaration into the sub for clarity when I pasted it here.

    Again.. this code has been tested and does work. I hope this can be of some use to you.

    P.S. If you open up the object browser in access, and you can check what the possible values of the attribute method are, so you can know which table you want to count. There is also a QueryDef collection if you are interested in counting those.


    Grygonos
Re: Flattening Access DB to XML
by been42 (Curate) on Jan 26, 2004 at 14:54 UTC
    I know that it's not in Perl, but this is the first place I found a way to get info on the innards of an Access database. It worked perfectly the first time I used it (at the time, I had about an hour to find a solution), and I've used it ever since:

    Reverse Engineer MS Access/Jet Databases

Re: Flattening Access DB to XML
by iburrell (Chaplain) on Jan 26, 2004 at 21:55 UTC
    First, you don't need to create a permanent DSN for each file. DBD::ODBC supports dynamic connections by specifying the driver and other parameters. <code> my $dsn = 'dbi:ODBC:driver=Microsoft Access Driver (*.mdb);dbq=\\\\cheese\\g$\\perltest.mdb' <code> Second, new versions of DBI have an interface for exploring the schema of databases. DBD::ODBC has one of the most complete implementations of these functions. Check out table_info and column_info.
      The code that I have developed so far uses Win32::ODBC to do create/update a DSN and discover the driver to use, and names of the tables. DBI and DBD::ODBC are used to read each table in it's entirety and dump it to XML.

      I much prefer the idea of using a fully DBI based solution to do both the schema detection work and the database reading. I will update my code and post it later today for critique and further analysis.
      Thanks.

Re: Flattening Access DB to XML
by inman (Curate) on Jan 27, 2004 at 14:07 UTC
    The story so far:
    Many thanks to all of the monks who came up with answers to my original post. I went with the DBI / DBD::ODBC approach because it was closest to my original idea and I had all of the components already installed.

    The code below lacks any real error handling and doesn't log any error/status information. I will be doing that later.

    Be gentle with your comments!
    Inman