Project Management: Graph & Diagram for Visualizing & Analyzing Structure with GraphViz

While many programmers prefer processing data and structural information in their heads without any visual aids, visualization and graphical presentation are nonetheless often an efficient and effective way to convey nonlinear ideas, simple or complex.

Here we will present a few simple examples of the use of GraphViz in the context of project management during the following phases: 1) Requirements, 2) Design, and 3) Coding/Testing, with a Web-based bill collector application as an example.

Requirements

Visio and ArgoUML are good, commonly used tools for interactively creating flowcharts and other diagrams.

But suppose otherwise. Your client (a bill collector) wants some access control for the application you're going to build (i.e., who can see, do, and control what). To design that, you need to find out their organizational structure. Very few organizations have one documented, so you need to construct it from a series of triads (me, my boss, my subordinates) by asking all the users/employees.

After gathering such data, it would be helpful to construct a complete or partial org chart and ask the client to verify it. Drawing one by hand with interactive software isn't very practical. If your software has a wizard or macro to generate a data-driven org chart, by all means use it. Alternatively, you can use GraphViz.

use strict;
use GraphViz;
use XML::Twig;

# mkg($xml, $output_file_name)
mkg("@{[<DATA>]}", "orgchart");

sub mkg {
    my ($xml, $file) = @_;
    my $root = XML::Twig->new()->parse($xml)->root;
    my $g = GraphViz->new();
    render($g, $root);
    $g->as_jpeg("$file.jpg");
}

sub render {
    my ($g, $root) = @_;
    my $super = mkname($root);
    $g->add_node($super);
    foreach my $child ($root->children) {
        my $subord = mkname($child);
        $g->add_edge($super => $subord, dir => 'back');
        render($g, $child);
    }    
}

sub mkname {
    $_[0]->att('title') . " (" . $_[0]->att('name')  . ")";
}


__DATA__
<Boss title="CEO" name="Joe">
    <C title="COO" name="Liz">
        <VP title="VP HR" name="Fu"></VP>
        <VP title="VP Sales" name="Ty"></VP>
    </C>
    <C  title="CFO" name="Lo">
        <VP title="VP HR" name="Fu"></VP>
        <VP title="VP Sales" name="Ty"></VP>
    </C>
    <C  title="CTO" name="Gi">
        <VP title="VP HR" name="Fu"></VP>
        <VP title="Sysadmin" name="TJ"></VP>
    </C>
    <C title="Adviser" name="Bob"/>
</Boss>

The code above will generate a graph like this. The example assumes your data are stored or can be converted to XML.
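If the interview results arrive as flat rows rather than XML, a conversion along the following lines might do. This is only a sketch under assumptions of my own: the row layout (title, name, boss name), the generic person tag, and the helper names xml_escape and rows_to_xml are all hypothetical, names are taken to be unique, and the reporting lines are taken to be free of cycles. The org chart script above only reads the title and name attributes, so the tag name itself should not matter.

```perl
use strict;
use warnings;

# Hypothetical interview results: one [title, name, boss_name] row per employee.
# The person at the top of the hierarchy has no boss (undef).
my @rows = (
    ['CEO',      'Joe', undef],
    ['COO',      'Liz', 'Joe'],
    ['CFO',      'Lo',  'Joe'],
    ['VP HR',    'Fu',  'Liz'],
    ['VP Sales', 'Ty',  'Liz'],
);

# Escape the characters that are special inside an XML attribute value.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build nested <person> elements from the flat rows.
# Assumes unique names and no cycles in the reporting lines.
sub rows_to_xml {
    my @rows = @_;
    my (%title_of, %reports_of, @roots);
    for my $row (@rows) {
        my ($title, $name, $boss) = @$row;
        $title_of{$name} = $title;
        if (defined $boss) { push @{ $reports_of{$boss} }, $name }
        else               { push @roots, $name }
    }
    my $emit;
    $emit = sub {
        my ($name, $depth) = @_;
        my $pad  = '    ' x $depth;
        my $tag  = sprintf '%s<person title="%s" name="%s"',
                           $pad, xml_escape($title_of{$name}), xml_escape($name);
        my @kids = @{ $reports_of{$name} || [] };
        return "$tag/>\n" unless @kids;
        return "$tag>\n"
             . join('', map { $emit->($_, $depth + 1) } @kids)
             . "$pad</person>\n";
    };
    return join '', map { $emit->($_, 0) } @roots;
}

print rows_to_xml(@rows);
```

The output can be pasted under __DATA__ in the org chart script. If the rows contain more than one person without a boss, the result has several top-level elements and would need a wrapping element before it is well-formed XML.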

Software such as Visio is programmable. Why not use it? Consider the following two examples. First, you try to rename a shape in an existing file:

use strict;
use warnings;
use Win32::OLE;
$Win32::OLE::Warn = 3;
my $path = "c:\\files\\visio\\";
my $file = "test.vsd";
my $Visio = Win32::OLE->new('Visio.Application', 'Quit');
my $VDocs = $Visio->Documents;
my $VDoc = $VDocs->Open("$path$file");
my $VPage = $VDoc->Pages->Item(1);
my $VShapes = $VPage->Shapes;
my $VShape = $VShapes->Item(1);
$VShape->{Text} = "New Name";
print $VShape->{Text};
$VDoc->SaveAs($path."test2.vsd");

You have to go through an object hierarchy six levels deep. Programming with GraphViz is rather straightforward.

use strict;
use warnings;
use GraphViz;

my $g = GraphViz->new(node => {fontsize => 10}, edge => {fontsize => 9}, rankdir => 'LR');
$g->add_node('email', label => "Email App:\nperiodic DB query\nto send emails", shape => 'box');
$g->add_node('report', label => "periodic financial\nstatement");
$g->add_node('cond', label => "amount\nowed", shape => 'Mdiamond');
$g->add_node('msgS', label => "very frightening\nmessage");
$g->add_node('msgM', label => "extremely frightening\nmessage with\nrandom death threat");
$g->add_node('msgL', label => "very frightening\nmessage without\ndeath threat");
$g->add_node('thankC', label => "thank you\nfor your business");
$g->add_node('thankD', label => "thank you\nfor your payment");
$g->add_edge('email' => 'report', label => 'creditor/collector');
$g->add_edge('email' => 'cond', label => 'debtor');
$g->add_edge('cond' => 'msgS', label => 'small');
$g->add_edge('cond' => 'msgM', label => 'medium');
$g->add_edge('cond' => 'msgL', label => 'large');
$g->add_edge('email' => 'thankC', label => 'creditor/collector');
$g->add_edge('email' => 'thankD', label => 'debtor');
$g->as_jpeg("email01.jpg");

The code above will create a flow diagram like this (the first email diagram). It basically consists of a series of add_node and add_edge calls. You could actually call add_edge without add_node, in which case new nodes are created automatically; add_node gives you additional control over the appearance of an individual node.
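The same "edges create their nodes" behaviour exists in the DOT language that the dot tool reads. As a rough illustration, here is a sketch that does not use the GraphViz module at all and simply prints DOT text from a list of edges. The helper names dot_id and edges_to_dot and the graph name flow are made up for this example.

```perl
use strict;
use warnings;

# Quote a string as a DOT identifier, escaping embedded double quotes.
sub dot_id {
    my ($s) = @_;
    $s =~ s/"/\\"/g;
    return qq{"$s"};
}

# Turn [from, to, label] triples into DOT text. Nodes are never declared;
# dot creates them implicitly when it meets them in an edge statement.
sub edges_to_dot {
    my @edges = @_;
    my $dot = "digraph flow {\n    rankdir=LR;\n";
    for my $e (@edges) {
        my ($from, $to, $label) = @$e;
        $dot .= sprintf "    %s -> %s [label=%s];\n",
                        dot_id($from), dot_id($to), dot_id($label);
    }
    return $dot . "}\n";
}

print edges_to_dot(
    ['email', 'cond', 'debtor'],
    ['cond',  'msgS', 'small'],
    ['cond',  'msgM', 'medium'],
    ['cond',  'msgL', 'large'],
);
```

Assuming the script is saved as flow.pl, something like "perl flow.pl | dot -Tpng -o flow.png" should render it. The nodes come out with default shapes and their names as labels, which is exactly the control you give up by skipping explicit node declarations.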

The two code samples above show that GraphViz (and procedural programming) can sometimes make things easier than, say, Visio (and OOP). But if you need your graph to be programmable and interactive at the same time, you need OOP, where each shape has associated instance methods, which GraphViz doesn't have. (Also, GraphViz doesn't have any UML-compliant shapes by itself.)

Back to the application: the graph above was pretty much a literal translation of one of the things the client described and wanted the application to do, namely to periodically look at the database and send emails with corresponding content to creditors or bill collectors (who use the application to collect bills) and debtors (who can pay bills online).

Let's say the client has looked at the graphs and signed off on the requirements specification. We may now proceed to the design phase.

Design

One of the major goals of Requirements Analysis and Design is minimization, from the programmers' perspective at least, as users may or may not care whether your code is efficient and effective as long as it does the work for them. (Note that minimization is not the same as generalization. A database with one Employee table is as minimized as a database with one Person table, but Person is more generalized than Employee. How much generalization there should be is an open debate.)

A simple graph doesn't always mean simple coding (there is nothing simple about "translate English into Russian"), but a complicated one is certainly a warning sign. In fact, during Requirements Analysis and Design, there are a few questions we should always ask:

Why do we need this requirement? What does it accomplish?
Do these requirements complement each other? Does any one contradict another?
How can some of them be conceptually and/or logically combined?
Is every requirement logically complete?
Am I working too much?

It's a common and fatal mistake to translate users' raw requirements directly into code without critically questioning the legitimacy, validity, and usefulness of each requirement, as well as of the project as a whole.

A graph can help spot "loose ends" and determine the completeness of the requirements, as every edge has to end up at a "meaningful" node.
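One way to make the loose-end check mechanical is to compare the endpoints of every edge against the set of nodes that somebody has actually described. The following is only a sketch: the loose_ends helper and the sample data (a cut-down version of the email graph with two nodes deliberately left undescribed) are invented for illustration.

```perl
use strict;
use warnings;

# Return the endpoints that edges mention but that nobody has described,
# in first-seen order and without duplicates.
sub loose_ends {
    my ($described, $edges) = @_;
    my %seen;
    return grep { !exists $described->{$_} && !$seen{$_}++ }
           map  { @$_ } @$edges;
}

# Nodes the client has actually described (hypothetical subset of the email graph).
my %described = map { $_ => 1 } qw(email report cond msgS msgM);

# Edges as [from, to] pairs; 'msgL' and 'thankD' are deliberately left undescribed.
my @edges = (
    ['email', 'report'],
    ['email', 'cond'],
    ['cond',  'msgS'],
    ['cond',  'msgM'],
    ['cond',  'msgL'],
    ['email', 'thankD'],
);

print "Loose end: $_\n" for loose_ends(\%described, \@edges);
```

With the sample data this reports msgL and thankD, the two endpoints nobody has explained yet. Whether a described node is also "meaningful" remains a judgment call that no script will make for you.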

A graph is also a great aid to simplification, by generalizing and/or minimizing it. In our first email graph example, the graph is logically correct, but it might mislead a designer or programmer into turning each "bubble" into an individual template or script. So we might want to simplify the graph (in a way sensible to designers and programmers, not necessarily to the client, as it now serves a primarily technical purpose). That is, to minimize the number of bubbles, so to speak.

Each bubble loosely represents an action or a message. We notice that the messages fall into two categories, a report and a thank-you note, visualized as follows.

$g = GraphViz->new(node => {fontsize => 10}, edge => {fontsize => 9}, rankdir => 'LR');
$g->add_node('email', label => "Email App:\nperiodic DB query\nto send emails", shape => 'box');
$g->add_node('report', label => "periodic financial\nstatement", cluster => 'report');
$g->add_node('cond', label => "amount\nowed", shape => 'Mdiamond');
$g->add_node('msgS', label => "very frightening\nmessage", cluster => 'report');
$g->add_node('msgM', label => "extremely frightening\nmessage with\nrandom death threat", cluster => 'report');
$g->add_node('msgL', label => "very frightening\nmessage without\ndeath threat", cluster => 'report');
$g->add_node('thankC', label => "thank you\nfor your business", cluster => 'thank');
$g->add_node('thankD', label => "thank you\nfor your payment", cluster => 'thank');
$g->add_edge('email' => 'report', label => 'creditor/collector');
$g->add_edge('email' => 'cond', label => 'debtor');
$g->add_edge('cond' => 'msgS', label => 'small');
$g->add_edge('cond' => 'msgM', label => 'medium');
$g->add_edge('cond' => 'msgL', label => 'large');
$g->add_edge('email' => 'thankC', label => 'creditor/collector');
$g->add_edge('email' => 'thankD', label => 'debtor');
$g->as_jpeg("email02.jpg");

The code above generates this second email diagram. The "cluster" attribute was added to group the bubbles.

Turning the diagram into a "design," we generate the third email diagram with the following code.

my @color = (color => 'lightgray', fontcolor => 'lightgray');
$g = GraphViz->new(node => {shape => 'box', fontsize => 10}, edge => {fontsize => 9}, rankdir => 'LR');
$g->add_node('email', label => "Email App:\nperiodic DB query\nto send emails");
$g->add_node('report', shape => 'ellipse');
$g->add_node('thank', label => "thank you\nnote", shape => 'ellipse');
$g->add_node('report XSL', @color);
$g->add_node('thank XSL', @color);
$g->add_node('XML data', @color);
$g->add_edge('email' => 'report', label => "all");
$g->add_edge('email' => 'thank', label => "all");
$g->add_edge('report' => 'report XSL', label => 'use', @color);
$g->add_edge('thank' => 'thank XSL', label => 'use', @color);
$g->add_edge('report' => 'XML data', label => 'use', @color);
$g->add_edge('thank' => 'XML data', label => 'use', @color);
$g->as_jpeg("email03.jpg");

The XSL boxes in the graph signify that we'll embed our business logic into XSL modules, whereas the XML box represents the data logic and module. Combining things into a couple of (XSL) modules may simplify the design of the application if we're allowed to compromise a little on the flexibility of the layout and content of each type of email message.

For architectural discussion, it's often more effective and efficient to look at a graphical DB schema instead of a textual one or a bunch of create table scripts. Suppose we have created the following tables in MySQL.

DROP TABLE IF EXISTS org;
CREATE TABLE org (
  id     int NOT NULL,
  name     varchar(255) NOT NULL,
  PRIMARY KEY  (id),
  UNIQUE KEY id (id)
) ENGINE=InnoDB;

DROP TABLE IF EXISTS employee;
CREATE TABLE employee (
  id     int NOT NULL,
  name     varchar(255) NOT NULL,
  PRIMARY KEY  (id),
  UNIQUE KEY id (id)
) ENGINE=InnoDB;

DROP TABLE IF EXISTS orgstruct;
CREATE TABLE orgstruct (
  org_id        int NOT NULL,
  employee_id     int NOT NULL,
  subord_id     int NOT NULL,
  PRIMARY KEY    (org_id, employee_id, subord_id),
  INDEX (org_id),
  INDEX (employee_id),
  INDEX (subord_id),
  FOREIGN KEY (org_id) REFERENCES org (id),
  FOREIGN KEY (employee_id) REFERENCES employee (id),
  FOREIGN KEY (subord_id) REFERENCES employee (id)
) ENGINE=InnoDB;

We can reverse engineer the tables simply like this:

use strict;
use warnings;
use DBI;
use GraphViz::DBI;
my $dbh = DBI->connect("DBI:mysql:test", "user", "password");
GraphViz::DBI->new($dbh)->graph_tables->as_jpeg("dbi.jpg");
$dbh->disconnect;

Here's the result.

Of course, there are plenty of powerful database tools out there that do reverse and even round-trip engineering, which you can (and probably should) use.

Coding/Testing

Many people use a script to generate an HTML directory tree that helps developers browse through scripts and modules and retrieve them from CVS or whatever.

If you want something fancier, you could use GraphViz to generate an image map instead (which could be useful at times).

use strict;
use GraphViz;
use XML::Twig;

# write $file.jpg and $file.html to files

# mkg($xml, $output_file_name)
my $file = "modules";
my $map = mkg("@{[<DATA>]}", $file);
my $html = <<HTML;
<HTML>
    <BODY>
        <MAP NAME=mymap>
            $map
        </MAP>
        <IMG SRC="$file.jpg" USEMAP="#mymap">
    </BODY>
</HTML>
HTML
open my $out, '>', "$file.html" or die "Cannot write $file.html: $!";
print {$out} $html;
close $out;


sub mkg {
    my ($xml, $file) = @_;
    my $root = XML::Twig->new()->parse($xml)->root;
    my $g = GraphViz->new();
    render($g, $root);
    $g->as_jpeg("$file.jpg");
    return $g->as_cmap;
}

sub render {
    my ($g, $root) = @_;
    $g->add_node($root->att('name'), URL => $root->att('src'), shape => 'record');
    foreach my $child ($root->children) {
        $g->add_edge($root->att('name') => $child->att('name'));
        render($g, $child);
    }    
}


__DATA__
<script name="report" src="file://usr/perl/pl/report.pl">
    <module name="MyApp::DataXML" src="file://user/perl/pm/MyApp/DataXML.pm">
        <module name="DBI" src="http://search.cpan.org/author/TIMB/DBI-1.37/DBI.pm"/>
        <module name="XML::LibXML" src="http://search.cpan.org/author/PHISH/XML-LibXML-1.54/LibXML.pm"/>
    </module>
    <module name="MyApp::RptXSL" src="file://user/perl/pm/MyApp/RptXSL.pm">
        <module name="XML::LibXSLT" src="http://search.cpan.org/author/MSERGEANT/XML-LibXSLT-1.53/LibXSLT.pm"/>
    </module>
    <module name="Mail::Sender" src="http://search.cpan.org/author/JENDA/Mail-Sender-0.8.06/Sender.pm"/>
</script>

Here is the image map generated by the code above. The CPAN modules' links are real; the others are dummy.

GraphViz also comes with Devel::GraphVizProf, which is a graphical version of Devel::SmallProf.

If you wish, you may also color highlight your subroutines in your profile graph based on their relative execution times, as in the following example.

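use strict;
use warnings;
use GraphViz;
use List::Util qw/ max /;

# Each DATA line is "<milliseconds> <sub name>"
my @profile;
for (<DATA>) {
    chomp;
    push @profile, [split /\s+/];
}
# Take the maximum before appending the 'end' sentinel, which has no timing
my $max = max( map {$_->[0]} @profile );
push @profile, [undef, 'end'];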

my $g = GraphViz->new();
for my $i (0..($#profile-1)) {
    my $w1 = ($profile[$i][0])/$max ;
    my $w2 = 1-$w1/2;
    my $color = "$w1,$w2,$w2";
    $g->add_node($profile[$i][1], fontcolor => $color, color => $color);
    $g->add_node($profile[$i+1][1]);
    $g->add_edge($profile[$i][1] => $profile[$i+1][1],
                 label => $profile[$i][0], color => $color, fontcolor => $color);
}

$g->as_jpeg("profile.jpg");

# millisec sub
__DATA__
1 fetchXML
2 preprocessXML
5 generateReport
1 randomThreat
6 generateReport
2 sendemail

Here is what the graph looks like.
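To see what the color strings in the profile script work out to, here is the same arithmetic pulled out into a small stand-alone sketch. The name heat_color is invented, and the rounding to three decimals via sprintf is my addition for readability. Graphviz reads a triple of numbers between 0 and 1 as hue, saturation, and value.

```perl
use strict;
use warnings;

# Map a timing to a Graphviz "H,S,V" color string:
# hue grows with the share of the maximum time, while saturation
# and value both drop from 1 toward 0.5.
sub heat_color {
    my ($time, $max) = @_;
    my $w1 = $time / $max;
    my $w2 = 1 - $w1 / 2;
    return sprintf '%.3f,%.3f,%.3f', $w1, $w2, $w2;
}

# Timings taken from the profile data above (milliseconds).
my @times = (1, 2, 5, 6);
my $max   = 6;
printf "%d ms => %s\n", $_, heat_color($_, $max) for @times;
```

Note that hue wraps around, so a share close to 1 lands back on red, the same hue as a share close to 0; the slowest sub is then told apart mainly by its lower saturation and value. If that is confusing in practice, scaling the hue to a narrower range might be worth trying.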

__________________________
Update: Couldn't help but incorporate Kageneko's just-posted Prettified Perl Inheritance as a GraphViz example.

The code is a dumbed-down version of Kageneko's.

# Original code: "Prettified Perl Inheritance" by Kageneko
use strict;
use warnings;
no strict 'refs';
use GraphViz;

my $module = 'Net::FTP';
my $output_jpg_file_name = 'isa';

my %already_loaded   = ();

# write $output_jpg_file_name.jpg to file
my $g = GraphViz->new;
ScanModule($g, undef, $module, 0);
$g->as_jpeg("$output_jpg_file_name.jpg");


sub ScanModule {
    my $g         = shift;
    my $parent  = shift;
    my $module  = shift;
    my $depth   = shift;
    my @total   = @_;
    my $loaded  = 0;
    my $label;

    $loaded = 1 if (exists $already_loaded{$module});    
    eval "use $module" if (!defined $parent);
    
    $g->add_edge($parent => $module) if $depth > 0;
    $label = $module;
    
    unless ($loaded)  {
        my $version = $module->VERSION();
        $label .= " (v$version)" if $version;

        $g->add_node($module, label => $label);

        my $isa   = "${module}::ISA";
        my $count = 1;
        my $total = @$isa;
        
        foreach (@$isa) {
            ScanModule($g, $module, $_, $depth + 1, @total, $count, $total);
            $count++;
        }
        $already_loaded{$module} = $parent;
    }
}

Here's the output.

(It's worth mentioning that Aristotle wrote a short neat script a while back to robustly list any Perl code's module dependencies.)

	PerlMonks