Baldly globaling were no-one globaled before.

by gumpu (Friar)
on Aug 26, 2000 at 22:46 UTC

gumpu has asked for the wisdom of the Perl Monks concerning the following question:

Namaste

I am pretty new to Perl (2 months).

This is a question about globals and how to build elegant data structures.

Say I have a Perl program that analyses a bunch of C source files. It figures out which functions, variables, constants, macros etc there are in the program and in which file they are defined. It also figures out which functions are called by which function and more useful information like that. After this is done it generates a number of HTML pages that show the structure of the program based on all this information.

All the information has to be stored somewhere. One possibility is to use a bunch of globals, say:

    my @functions;   # list of all function names
    my @typedefs;    # list of all typedef names
    my @variables;   # list of all variable names
    my @constants;   # list of all constant names
    my %defined_in;  # maps a name to a source file name
    my %pattern;     # maps a name to a pattern that can be used to
                     # look up the name in a source file
    my %calls;       # maps a function name to a list of function names
                     # that are called by that function

Given this, the temptation is great to write a couple of subroutines that fill these globals while scanning through the source code, with the subroutines taking just the names of the source files as parameters.

Now we all know the mantra, "globals are evil and modifying globals in subs is even more evil".

In a slightly more elegant method the subroutines take references to the various hashes and arrays that they modify. This has two disadvantages it seems:

  1. The subroutines will be filled with dereferences.
  2. Long lists of parameters
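
A minimal sketch of this reference-passing style may make both drawbacks concrete. The subroutine name record_function and the sample data are hypothetical and only serve as illustration:

```perl
use strict;
use warnings;

my (@functions, %defined_in, %calls);

# Hypothetical helper: records one function definition.
# Every structure it modifies has to be passed in as a reference.
sub record_function {
    my ($functions, $defined_in, $calls, $name, $file, @callees) = @_;
    push @$functions, $name;          # dereference the array ref
    $defined_in->{$name} = $file;     # dereference the hash ref
    $calls->{$name} = [@callees];     # one fresh array ref per function
    return;
}

record_function(\@functions, \%defined_in, \%calls, 'main', 'main.c', 'init', 'run');
record_function(\@functions, \%defined_in, \%calls, 'init', 'setup.c');

print "$_ is defined in $defined_in{$_}\n" for @functions;
print "main calls: @{ $calls{main} }\n";
```

Three of the six parameters are there only to hand over the data structures, and the list grows with every collection the subroutine touches.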

Yet another option is to make another hash

my %program;

Where for instance

$program{"functions"}

returns a reference to the array with all function names, and $program{"defined_in"} returns a reference to the hash with file_names.

All the subroutines can take a reference to this hash as a parameter. This solves the problem of long parameter lists. However, the subroutines now have to cope with two levels of dereferencing.
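
As a rough sketch of what the two levels look like in practice (record_function and the sample data are again hypothetical, and only three of the collections are shown):

```perl
use strict;
use warnings;

# One hash that owns every collection; the keys follow the names above.
my %program = (
    functions  => [],
    defined_in => {},
    calls      => {},
);

# Hypothetical helper: only one reference to pass around now.
sub record_function {
    my ($program, $name, $file, @callees) = @_;
    push @{ $program->{functions} }, $name;   # hash ref -> array ref -> push
    $program->{defined_in}{$name} = $file;    # arrow between subscripts is optional
    $program->{calls}{$name}      = [@callees];
    return;
}

record_function(\%program, 'main', 'main.c', 'init');

print "functions: @{ $program{functions} }\n";
print "main is defined in $program{defined_in}{main}\n";
```

The hash lookups chain without much noise; it is mainly the array case, with its @{ ... } wrapper, that looks heavier than the plain global version.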

Is there a more elegant way to do this in Perl?

Have fun

Replies are listed 'Best First'.
Re: Baldly globaling were no-one globaled before.
by athomason (Curate) on Aug 27, 2000 at 01:17 UTC
    I'd go about the problem in a different manner. Blindly doing pattern matches for interesting stuff (e.g. /int\s+(\w+);/) will get you into trouble, will require a lot of special-case attention, and might well take longer, with debugging and all, than using a real grammatical parser. Even that snippet right there is wrong in at least three ways, despite looking casually right.

    But parsing is a really big wheel to reinvent. I've done it, and it can be enlightening and entertaining, but if you just want functionality, go for a c(p)anned solution. Perl already has some parser modules, like Parse::Yapp. All you need to use yapp (note: I haven't used it) is a yacc-compatible grammar, which you should be able to download for any C variant out there.

    Of course, a parser won't solve your problem for you. Once you have a parse tree, you'll need to dig through it to find all the functions, variables, and other such goodness, but the elements will be much more accessible.

    If you're willing to learn about lexers and parsers (which you really should!), this is the way to do it, especially since you've got multiple source files. I just have an itching feeling that once you start trying to do everything by hand you'll quickly run into a bunch of tall, spiky walls. But if you still want to do it the naive way, I don't really see a problem with modifying globals, though be aware that a variable declared with my isn't necessarily global: it's only accessible to subs declared after the variable. Go back and read the scoping docs. Passing a hashref like you say would only lengthen your parameter list by one item, and getting comfortable with dereferences wouldn't hurt. It's also a good practice, generally.
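
    The scoping point can be seen in a small demo. The names here are made up for illustration, and strict 'vars' is switched off on purpose, because with it the first sub would simply fail to compile:

```perl
use strict;
use warnings;
no strict 'vars';     # demo only: allow the undeclared $count below
no warnings 'once';   # demo only: silence the "used only once" hint

sub declared_before {
    # No lexical $count is in scope yet, so this is the package
    # variable $main::count, which nobody ever sets.
    return defined $count ? $count : 'undef';
}

my $count = 3;   # file-scoped lexical, visible from here downward

sub declared_after {
    return $count;   # this one sees the lexical
}

print 'declared_before sees: ', declared_before(), "\n";
print 'declared_after sees:  ', declared_after(), "\n";
```

    The first sub reports undef and the second reports 3, even though both are called after the assignment.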

Re: Baldly globaling were no-one globaled before.
by knight (Friar) on Aug 27, 2000 at 04:42 UTC
    As an alternative to a full-blown parser, the C::Scan module does an extremely effective job of extracting information from C source code without relying on simple pattern matching. It's based on the Data::Flow module, and does some really fast, accurate scanning by (e.g.) replacing C strings and comments with white space (to avoid false matches within those constructs) and matching braces and parentheses to zero in on what's a function definition/declaration by syntactic position, not by regex matching.
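
    To give a feel for the blanking idea only, here is a rough sketch. It is not C::Scan's actual code; the helper name and the regex are assumptions made for illustration, and real C has corner cases (preprocessor lines, line continuations) that this ignores:

```perl
use strict;
use warnings;

# Hypothetical helper, not C::Scan's actual code: overwrite comments and
# string/character literals with spaces, keeping newlines so that offsets
# and line numbers still match the original source.
sub blank_strings_and_comments {
    my ($src) = @_;
    $src =~ s{
        (
            /\* .*? \*/                 # block comment
          | // [^\n]*                   # line comment
          | " (?: \\. | [^"\\] )* "     # string literal
          | ' (?: \\. | [^'\\] )* '     # character literal
        )
    }{
        (my $blank = $1) =~ s/[^\n]/ /g;
        $blank;
    }gsex;
    return $src;
}

my $c_source = qq{int x; /* int y; */ char *s = "int z;";\n};
my $cleaned  = blank_strings_and_comments($c_source);

print $cleaned;
print 'names found after blanking: ',
      join(', ', $cleaned =~ /int\s+(\w+);/g), "\n";
```

    On the raw sample line the naive int pattern would also pick up y and z from inside the comment and the string; after blanking, only x is left.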

    It may not be as flexible as a full LALR parser, but it already exists, so you wouldn't have to create your own C grammar or retrofit an existing one to a parser. I don't think it does everything you're talking about, but it does enough that it would probably be an effective starting point.

    (There's some really slick, mind-expanding Perl in both C::Scan and Data::Flow, which isn't surprising seeing as how they were originally written by Ilya Zakharevich. They're both worth looking at for the learning experience alone...)
Re: Baldly globaling were no-one globaled before.
by ZZamboni (Curate) on Aug 27, 2000 at 05:28 UTC
    Both athomason and knight have mentioned some good suggestions as to how to do the parsing. To address your question of how to store the data: I would go the single-global way. But as pointed out by athomason, a variable declared with "my" is not necessarily a global. The proper way would be to do something like this:
        use strict;
        use vars qw($program);
        $program={};
    and then you can use $program anywhere in your code. Notice that I am making $program into a reference directly, because that's how it is going to be used all over the place. It will save you quite a few \%'s when calling subroutines.
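
    A small runnable sketch of that single-global style, assuming a made-up helper called record_function and two of the collections from the question:

```perl
use strict;
use warnings;
use vars qw($program);   # declares the package variable $main::program

$program = { functions => [], defined_in => {} };

# Hypothetical helper: no data-structure parameter needed,
# it reaches straight for the global.
sub record_function {
    my ($name, $file) = @_;
    push @{ $program->{functions} }, $name;
    $program->{defined_in}{$name} = $file;
    return;
}

record_function('main', 'main.c');
print "known functions: @{ $program->{functions} }\n";
```

    Because use vars declares a package variable rather than a lexical, the sub sees $program no matter where in the file it is defined.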

    If you are of the C tradition, you could have a "main" subroutine, and instead of declaring $program as global, make it my within that subroutine, which would then pass its reference to everyone else:

        sub main {
            my $program;
            ...
            $program={};
            parse_program("source.c", $program);
            print_functions($program);
            do_other_stuff($program);
        }
    Within the subroutines themselves, you would have to do the two levels of indexing, but I don't think that is a big deal. You can store the necessary elements in variables to use within each subroutine:
        sub print_functions {
            my $program=shift;   # this is not needed if $program is
                                 # global, obviously
            my $functions=$program->{functions};
            # and now you can just use $functions to access the data
        }
    Cheers,

    --ZZamboni

RE: Baldly globaling were no-one globaled before.
by ZZamboni (Curate) on Aug 27, 2000 at 22:19 UTC
    After looking again at my last comment, it occurred to me that another way of doing it would be to convert your program into an object, and store the data as attributes of the object. Something like this:
        package CParsingThingy;

        sub new {
            my $class=shift;
            my $self={};
            # you could either do this...
            $self->{program}={};
            # or this:
            # $self->{functions}=[];
            # $self->{typedefs}=[];
            # $self->{defined_in}={};
            # etc.
            bless $self, $class;
        }

        ...

        sub parse_program {
            my $self=shift;
            my $source=shift;
            my $program=$self->{program};
            # and use $program as before.
            ..
        }

        # etc.
    This has the advantages of not using globals and of making it easier, if need be, to have several CParsingThingy objects simultaneously. More than once I have written a program thinking "nah, I don't need no stinkin' objects for this" and then found myself going back and giving it an object interface because it makes it so much easier to package, reuse and extend things.
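
    A runnable sketch of the "several at once" point. The methods add_function and functions are hypothetical additions for illustration, and the sketch uses the flat attribute layout rather than a nested program hash:

```perl
use strict;
use warnings;

package CParsingThingy;

sub new {
    my $class = shift;
    my $self  = { functions => [], defined_in => {} };
    return bless $self, $class;
}

# Hypothetical methods, added for illustration only.
sub add_function {
    my ($self, $name, $file) = @_;
    push @{ $self->{functions} }, $name;
    $self->{defined_in}{$name} = $file;
    return $self;
}

sub functions { my $self = shift; return @{ $self->{functions} } }

package main;

# Two independent instances, with no shared global state between them.
my $first  = CParsingThingy->new;
my $second = CParsingThingy->new;
$first->add_function('main', 'main.c');
$second->add_function('helper', 'util.c');

print 'first:  ', join(', ', $first->functions), "\n";
print 'second: ', join(', ', $second->functions), "\n";
```

    Each object carries its own data, so analysing two C programs side by side needs no extra bookkeeping.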

    --ZZamboni

Re: Baldly globaling were no-one globaled before.
by gumpu (Friar) on Aug 28, 2000 at 17:39 UTC

    Many thanks for the good suggestions and pointers! As was pointed out, trying to 'parse' C code with just regular expressions does not work. I found out that there are always some exceptions popping up. I'll try and use the parser module suggested and then use the OO method suggested by ZZamboni to solve the problems of the globals.

    Have Fun
