Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

Building a data Hierarchy

by SlackBladder (Novice)
on Mar 20, 2007 at 11:54 UTC ( [id://605660]=sourcecode: print w/replies, xml ) Need Help??
Category: Text Processing
Author/Contact Info SlackBladder
Description: Firstly, most of the code was donated by "Graff", however, I want to make it available to all as it works very well. This code connects to a SQL Server machine, extracts the data via a query and works out hierarchical data structures (if any exist) within a dataset of two columns of data. It's pretty fast (currently extracts over 7000 records from a SQL Server system and builds the hierarchy within around 3 to 4 seconds). I have replaced "company" specific data.
#!/usr/local/bin/perl -w

use strict;  # important for variables
use warnings;  # Gives optional warnings
use Win32::SqlServer qw(SCALAR);   # This is the module which is used 
+to connect to the RDBMS
use Tie::File;
# Create database connection object, set database and then connect wit
+h integrated security.
my $sqlsrv = Win32::SqlServer->new; # Create a new SQLSRV object
$sqlsrv->setloginproperty('Server', 'SERVERNAME'); # Set the Server pr
+operty
$sqlsrv->setloginproperty('Database', 'DATABASENAME'); # Set the datab
+ase property
$sqlsrv->connect(); # This is where we connect to the sql server syste
+m using integrated security
my $reportfile = "report.txt"; # this is where the report goes
my $resultfile = "resultset.txt";  # Store the results from the query 
+in here
my $hierarchy = "hierarchy.out"; # This is where we store the final re
+sults for the hierarchy
my $folder = "THIS_IS_A_FOLDER_NAME";
my $countrecords; my $finalcount; my $resultset; my $name; my $file; m
+y $addr; my @array; my %node; my @rows;
my @TimeArray = localtime(time);
my $year = sprintf ("%04d",$TimeArray[5]+1900);
my $monthasnum = sprintf ("%02d",$TimeArray[4]+1);
my $day =  sprintf ("%02d",$TimeArray[3]);
my $hour = sprintf ("%02d",$TimeArray[2]);
my $minute = sprintf ("%02d",$TimeArray[1]);
my $datestamp =  "$year"."_"."$monthasnum"."_"."$day"."_"."$hour" ."_"
+."$minute" ;

# Open up the reportfile, if this fails, die
open (REPORTFILE,">>$folder\\$reportfile") or die "Can not open file $
+folder\\$reportfile for writing, quitting\n";
open (RESULTFILE,">$folder\\$resultfile") or die "Can not open file $f
+older\\$resultfile for writing, quitting\n";


# Print some meaningful header information to the report
print REPORTFILE "TAG HIERARCHY REPORT\n";
print REPORTFILE "YEAR_MONTH_DAY_HOUR_MINUTE\n";
print REPORTFILE "$datestamp\n";


# Our SQL statement.  Do not alter this unless you understand the cons
+equences !!!
my $stmnt01 = <<SQLEND;
SELECT FunctionalLocationCode, SuperiorFunctionalLocation
FROM dbo.TblToFunctionalLocationMaster
where SuperiorFunctionalLocation like 'A1%' and (Objectstatuscode = 'O
+PMI' or Objectstatuscode = 'OPER')
SQLEND


# The result set file is still open from further up, don't need to ope
+n it.  Print the query results to the file
my $result = $sqlsrv->sql($stmnt01, SCALAR);
foreach $name (@$result) {
   print RESULTFILE "$name\n";
}

# Remove the very last line from the file as it contains a blank line
tie @rows, 'Tie::File', '$folder\\$resultfile' || die "Can't open: $!\
+n";
pop @rows;
untie @rows or die "Could not update $folder\\$resultfile\n $!";


# Do a quick count on the number of records in the file to be processe
+d and report
# back to the reportfile, do not need to open the file as is still ope
+n
# from previous operation
open(FILE, "< $folder\\$resultfile") or die "can't open $file: $!";
$countrecords++ while <FILE>;
# Print the contents of the counter to the report file.
print REPORTFILE "Number of records to be processed for this run is ..
+.$countrecords\n\n\n";

# close file handles
close (REPORTFILE);
close (RESULTFILE);


# Begin building the hierarchy
open(TH, ">$folder\\$hierarchy") or die "can't open $hierarchy: $!";  
+  # output file
open (DATA, "< $folder\\$resultfile") || die "could not open file: $!"
+; # input file
while (<DATA>) {
    chomp;
    my ( $c, $p ) = split /\|/;
        if ( $c eq $p ) {  # these are easy, so finish them first
        print TH;
        print TH "\n";
        next;
    }

    if ( exists( $node{$c}{child_of} )) {
        warn "$.: bad record: $c is child of both $p and $node{$c}{chi
+ld_of}\n";
        next;
    }
    $node{$c}{child_of} = $p;
    $node{$p}{parent_of}{$c} = undef;
}

# begin the sorted output by looping over values that do not have pare
+nts:

for my $parent ( grep { !exists( $node{$_}{child_of} ) } keys %node ) 
+{
    my $children = $node{$parent}{parent_of};  # ref to hash of child 
+values
    trace_down( $children, \%node );
}

sub trace_down
{
    my ( $kids, $tree ) = @_;
    for my $kid ( keys %$kids ) {
        print TH "$kid|$$tree{$kid}{child_of}\n";
        # print "$kid|$$tree{$kid}{child_of}\n";
        if ( exists( $$tree{$kid}{parent_of} )) {
            trace_down( $$tree{$kid}{parent_of}, $tree );
        }
    }
}

# Close all open file handles, this is good coding practice.
close (REPORTFILE);
close (RESULTFILE);
close (TH);
close (DATA);
Replies are listed 'Best First'.
Re: Building a data Hierarchy
by graff (Chancellor) on Mar 21, 2007 at 02:01 UTC
    Firstly, most of the code was donated by "Graff", however, I want to make it available to all as it works very well.

    I'm glad to see you have something working (after such a length development phase ;) But I'd just like to clarify that, at least in terms of raw byte count, I did not write "most of the code" posted above. I just provided the  while (<DATA>) loop, the following "for" loop, and the "trace_down" sub.

    I think the post could be a more interesting contribution, and more people could find it useful, if that portion of your script were abstracted away from the database stuff. The basic premise is:

    • Read a file containing an unsorted list of "child/parent" records (one per line), where a child value in one record might be the parent value in one or more other records scattered throughout the list.

    • Populate a "node" hash structure, keyed by the string values found in each record, to keep track of parent and child relations among the strings.

    • Output the records in sorted order, so that for any string value that appears as both "child of" and "parent of" other values, the record containing its "child of" relation is first.

    So as a general-purpose function, the while loop and for loop should be in a subroutine, whose parameters might be: input file handle, output file handle, maybe a string to use as the split regex. That and the "trace_down" sub could go into a "SortTree.pm" module, or something to that effect, to make it easier to re-use.

    As for the database stuff you've posted here, it's fine that it works for you, but I expect anyone else would just have to scrape it off, because it really only works for you. Also, it's not clear to me how or why the contents of the "resultsfile" (from the DB query) end up as pipe-delimited. Is that some sort of default setting on your database server, or in Win32::SqlServer?

    One last nit-pick -- you could replace all this:

    my @TimeArray = localtime(time); my $year = sprintf ("%04d",$TimeArray[5]+1900); my $monthasnum = sprintf ("%02d",$TimeArray[4]+1); my $day = sprintf ("%02d",$TimeArray[3]); my $hour = sprintf ("%02d",$TimeArray[2]); my $minute = sprintf ("%02d",$TimeArray[1]); my $datestamp = "$year"."_"."$monthasnum"."_"."$day"."_"."$hour" ."_" +."$minute" ;
    with this:
    use POSIX; my $datestamp = strftime( "%Y_%m_%d_%H_%M", localtime );
    Gotta love POSIX... (updated to add link to POSIX man page)
Re: Building a data Hierarchy
by polettix (Vicar) on Mar 22, 2007 at 13:34 UTC
    open (REPORTFILE,">>$folder\\$reportfile")...
    Some comments about this. First of all, you could benefit from being used to use lexical filehandles instead of package scoped ones, like:
    open my $reportfile, ...
    This will let you avoid nasty situations when your script grows and you find yourself opening REPORTFILE in more than one place. The usual "avoid globals if you can" stuff. You can then use it where you previously used REPORTFILE:
    print $reportfile "TAG HIERARCHY REPORT\n";

    Moreover, you're using the two-argument version of open, which should be avoided just to prevent the user shoot in her (or your) feet. The three-argument version needs only a bunch of chars more but is way more safe:

    open my $resultfile, '>>', $filepath...
    In this way, $filepath can contain stuff like "booh; rm -rf /" and still let you sleep well (I don't know an equivalently destructive command in win32, but you get the point).

    The portable way to handle file paths is via File::Spec. Just in case you're interested into making your script more portable (which would probably require changes in the DB interface routines, too). Something along these lines:

    my $filepath = File::Spec->catfile($folder, $reportfile); open my $reportfile, '>>', $filepath ...;

    Last, but not least, you can provide a more sensible feedback to the user regarding the reasons why your script was unable to open the file, by means of the $! variable (see perlvar):

    my $filepath = File::Spec->catfile($folder, $reportfile); open my $reportfile, '>>', $filepath or die "open('$filepath') for appending: $!";

    Flavio
    perl -ple'$_=reverse' <<<ti.xittelop@oivalf

    Don't fool yourself.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: sourcecode [id://605660]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (10)
As of 2024-03-28 12:06 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found