Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

Beginnings

I'm in charge of document and reverse-engineering a large system composed mainly of unorganized, undocumented, roten and ragged Perl Code, probably written for Perl 5_0005 in a AIX machine.

The History

It was being maintained for the last seven years by folks that know nothing about "good pratices" on software engineering and the code needs refactoring a.s.a.p.

The Mission

My mission, right now, is rise as much as I can from the existing code, and I see a big ammount of information running away to the backyards door if I don't document all the existing subs and its dependencies.

Gory Details

What I already got: I have a nice hack using File::Find that was able to put me on my way by using simple regular expressions to recognize and capture subrotine declarations and qualify them by packages. Code follows:

#!c:\perl\bin\perl.exe use strict; use warnings; use constant CODE => '/path/to/code/rootdir/'; use File::Find; my %source; find( \&wanted, CODE ); ## ## pkg( $filename ) ## Determina o pacote (package) que o arquivo implementa. ## sub pkg{ my $file = shift; my $pack; open FILE, $file or die $1; while( <FILE> ){ if( m{^package\s+([\w:]+)\s*} ){ $pack = $1; last; } } close FILE or die $1; return $pack; } ## ## uses( $filename ) ## Determina as bibliotecas que este módulo|script usa|requer. ## sub uses{ my $file = shift; my %libs; open FILE, $file or die $1; while( <FILE> ){ $libs{ $1 }++ if m/^\s*(?:use|require)\s+(\S+).*?;/; } close FILE or die $!; return wantarray ? ( keys %libs ) : [ keys %libs ]; } ## ## loc( $filename ) ## Conta as linhas de código existentes em um determinado arquivo. ## sub loc{ my( $file, $counter ) = ( shift ); open FILE, $file or die $!; $counter++ while( <FILE> ); close FILE or die $!; return $counter; } ## ## subs( $filename ) ## Determina o nome das subrotinas declaradas em $filename. ## sub subs{ my $file = shift; my @subs; my $package; open FILE, $file or die $1; while( <FILE> ){ $package = $1 if m{^package\s+([\w:]+)\s*}; push @subs, $1 if /^sub\s+([\w:]+)/; } close FILE or die $1; @subs = map { s/$package\:\://o; $_ } @subs if $package; return wantarray? @subs : \@subs; } ## ## version( $filename ) ## Tenta determinar a versão do módulo ou script ## inspecionando a variável $VERSION definida por ele. ## sub version{ my $file = shift; my $version; open FILE, $file or die $1; while( <FILE> ){ if( m/VERSION\s*=\s*/ ){ $version = $_; last; } } close FILE or die $1; { no strict; if( $version ){ eval $version; $version = $VERSION; $version = '0.1' if $version =~ /Revision/; }else{ $version = ''; } } return $@? $@ : $version; } ## ## wanted( $filename ) ## Determina se desejamos ou não contar com este arquivo na nossa aval +iação. ## Chamada como call-back pelo File::Find::find. ## sub wanted{ my $f; my $file = $_; return unless ( -f && /\.(p[lm])$/ ); my $type = ( $1 eq 'pl' ? 'script' : 'module' ); ( $f = $File::Find::name ) =~ s{CODE}{}o; $source{ $f } = { type => $type, name => $file, filename => $f, LoC => loc( $File::Find::name ), 'package' => pkg( $File::Find::name ) || 'main', libs => uses( $File::Find::name ) || [], subs => subs( $File::Find::name ) || [], version => version( $File::Find::name ), }; } my $packages; for my $file ( keys %source ){ push @{$packages->{ $source{$file}->{package}||'main' }}, $source{ +$file}; } # use Data::Dumper; # print Dumper( $packages ); # print Dumper( \%source ); print q{ <html> <head> <style> body { margin: 5% 10% 5% 10%; background-color: white; font-size: 130%; } div.system { background-color: #EEE; padding: 10 20 10 20; } p.title{ font-size: 180%; font-weight: bold; text-align: center; } div.package{ margin: 5 5 0 5; padding: 0 5 5 5; border: solid thin black; } div.file{ margin: 5 10 5 10; padding: 5 5 5 5; background-color: #DDD; } </style> <title>Relat&oacute;rio de An&aacute;lise do C&oacute;digo Fonte</ +title> </head> <body> <div class="system"> <p class="title">Relat&oacute;rio de An&aacute;lise do C&oacut +e;digo Fonte</p> }; my $total_loc = 0; my $total_subs = 0; my $total_packages = scalar keys %$packages; my $total_files = scalar keys %source; map { map { $total_loc += $_->{LoC} } @$_ } values %$packages; map { map { $total_subs += scalar @{$_->{subs}} } @$_ } values %$packa +ges; print qq{<p>Total pacotes: $total_packages.<br>Total Arquivos: $total_ +files.<br>Total linhas: $total_loc.<br>Subrotinas: $total_subs.</p>\n +\n}; foreach my $pack( sort keys %$packages ){ my @files = sort { $a->{name} cmp $b->{name} } @{$packages->{$pack +}}; my $package = $files[0]->{package} || 'main'; print qq{<div class="package"> <a name="#package_${package}">\n<p> +Pacote <code&gt;${package}</code&gt; </p>\n\n}; foreach my $f ( @files ){ $f->{filename} =~ s/\'\"//og; $f->{name} =~ s/\'\"//og; print qq{<div class="file"> <a name="#file_$f->{name}">\n<p>Arquiv +o <code&gt;$f->{filename}</code&gt; </p>\n\n<ul>}; print qq{<li>Tipo: <code&gt;$f->{type}</code&gt;</li>\n}; print qq{<li>Linhas de C&oacute;digo: $f->{LoC}</li>\n}; print qq{<li>Vers&atilde;o: $f->{version}</li>\n}; if( scalar @{$f->{libs}} ){ print qq{<li>Depend&ecirc;ncias:\n<ul>\n}; foreach my $lib( sort @{$f->{libs}} ){ $lib =~ s/\'\"//og; if( exists $$packages{$lib} || grep /\L$lib\E/, map lc, keys % +$packages ){ print qq{<li> <a href="#package_$lib">$lib</a></li>\n}; }else{ print qq{<li>$lib</li>\n}; } } print qq{</ul>\n}; } if( scalar @{$f->{subs}} ){ print qq{<li>Subrotinas Implementadas:\n<ul>\n}; foreach my $sub( sort @{$f->{subs}} ){ print qq{<li><code&gt;$sub()</code&gt;</li>\n}; } print qq{</li>\n}; } print qq{</ul></div> <!-- class file -->\n}; } print qq{</div> <!-- class package -->\n}; }

Considerations

I know this is not the Best Way To Do It, but I'm sure its working. The output is kind of nicely formatted HTML linking modules dependencies inside my source code.

Next Steps

My next step would be mapping every subrotine call so I can follow calls along the modules and determine real modules dependencies between modules. It seems that the old developer think its easy just cut'n'paste all possible use Module; from the system than decide what modules are really needed for a given module or program.

Looking For

In sintesys, I'm looking for suggestions about what kind of strategy I should use to parse all this Perl code (about 1.85Mb of ascii text) and build the information I need about it in a few hours (this is all I have now, sorry).

Limitations

My limitations: I have no access to other modules than not the default in ActivePerl 5.6.1 (this is internal politics: I can't use anything not previously checked by security folks, including open-source tools). This is kind of a problem, given that the ActivePerl 5.6.1 lacks the IniFile module, needed to parse the original system config files.

A Possible Start Point

Talking about a start point, maybe I should scan a module file looking for subs, and inside each sub, look for barewords (this will probably lead me to a better chance of having a subrotine call). After that, I need to scan the module for the candidate subrotine name and if I can find it, store this information. If there is no subrotine declaration for the given bareword, I need to scan modules listed on @ISA, in order, for the same thing, recursivelly until I find the sub declaration or run out of modules to scan.

Suggestions?

Of course, it seems to me that this could take a big amount of time. Maybe there are better approaches for this problem...

Updates

Fixed a little issue about a misplaced <code> tag in the article; Many thanks to jasonk for pointing this.

wazoox just told me that the '<' signs at my code was errouneously translated to '&lt;'. Thank you for that.


In reply to Reverse Engineering Perl Using... Perl. by monsieur_champs

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others wandering the Monastery: (5)
As of 2024-03-29 13:40 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found