Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw

Seeking advice about learning another's code

by E-Bitch (Pilgrim)
on Aug 02, 2001 at 02:31 UTC ( #101508=perlquestion: print w/replies, xml ) Need Help??

E-Bitch has asked for the wisdom of the Perl Monks concerning the following question:

This is more of a conceptual question, so it probably needs to be moved (such is my luck), but here goes:

So, I'm at work right (all the good stories start out this way... stop me if you've heard this before) when my supervisor says, "Hey, <insert division name here> has this really cool database verification script that checks the lines of a pipe delimited file for accuracy / formatting, and if each line passes, puts it into the database, and if it fails, emails the owner of the particular line of the file. Here is the script, can you figure it out, and get one like it running by next week for our database?"

<insert foot into mouth>Uh... (fearing job security) Sure!</insert foot into mouth>

So, now I'm pouring through this 1200 line perl script with another 1200 lines of sql scripting standing behind it going, "Uh... damn... shouldnt have said that."

My question for the monks around here is this:
I have figured out other's code before, but never on this scale before. I have begun to write / draw the UML diagram for this (I think this is UML, at least, but if not, its the diagram that has conditional statements in the diamonds, with lines to show program flow...) and have gone through pages. (Yea, I can hear you already, use a program that does the UML diagram for you... but I want to draw it... :)

So, does anyone have any tips on figuring out large blocks of uncommented code? <shiver>
I see a lot of stuff that I know about, and many things that I can figure out (like perl's database access routines), and a few things that I have no clue on (like this: $pipecount = tr/|//;). I have only known perl for like 5 months, and havent had too much time to devote to it for pleasureful programming. I have the camel, but what are good ways to go about this (by next week?)


Replies are listed 'Best First'.
Re: Seeking advice about learning another's code
by footpad (Abbot) on Aug 02, 2001 at 07:16 UTC

    Maybe I'm completely whacked here, but I'm not sure you've got time to study your co-worker's code in depth, at least not in the detail you'd like. You've got a week; that's 40 hours of work time. The way I see it, you've got two choices:

    1) Hack together a simpler prototype using what you've learned already. It appears that the goal is:

    Is Data Valid? (Yes) --> Insert into Database | (No) ---> Send email

    So, try to find the three places in your co-worker's code where he does each single operation, splice that into subroutines, create a simple slurper for your input file that calls those subs, run the above process for each line, and you're out of there.

    You might see if MIME::Lite, DBD and other respected modules wouldn't help cut down the lines of code. But, I'd do that later rather than sooner.

    At the very least, you should be able to get a working prototype for some basic rules in place, if not all of them. That 1200 lines of SQL code makes me suspicious. Are you sure he's (assumption, sorry) not calling stored procs, defining triggers on the fly, or such? I have seen systems that needed that many lines of SQL--but those are generally run by people who a) know how large they are and b) consequently won't give you something like this with only a week. This is a very basic operation it seems to me. In fact, I'd hope that such a complex data model would already have stored procs and/or triggers in place to do the complicated bits of placing the data into the proper tables. Check for those and use them if at all possible.

    2) If you really feel like you need to adapt the mess you're given, then at least give yourself a little help from Perl. Since your coworker chose not to comment, it's either as obvious as Petruchio likes or a complete mess. In lieu of any details, I'm assuming the latter.

    Does the code use subroutines? If so, then you'll want to make a list of all the sub declarations and the line they're declared on; use Perl to document this. (You'll need to refer to this while you're reading the code.)

    Need an idea which variables are being used, where they're declared, and where they're used? Modify that little Perl script accordingly. Scan for sclars, arrays, and hashes, locate the lines they're used on and then dump the results to a text file. This will help you understand the flow of internal data a little more.

    IOW, use Perl's strengths to automatically document--within reason--the existing code. This may seem trivial, but speaking as one that has maintained far too much bad code in his time--including a monolith that used NO INDENTATION (as in none, nada, zip, buptkis. Lines and lines of left aligned code, as far as the eye can see. God, I hated that project)--you need to know as much as you can about that code in the shortest amount of time.

    Use Perl's Pathetically Eclectic Rubbish Listing abilities for that. Create cross-references as quickly as you can.

    Next, walk through the existing code at a high level and document it using an outline. Make it broad; one comment per 20, 50, or 100 lines or so. Use this as you adapt it for your needs. Give yourself no more than two days to finish the outline.

    The outline will tell you what you can keep (can, not should; you're on a deadline...move alone). What you're really after here is what you must minimally change. You don't really have time for expensive rewrites or elegance.

    Remember, this is version 1.0. If you hit this deadline, you'll be in a credible position to request permission to rewrite things for version 2.0. It may be cr@p, but functional cr@p is okay for a limited amount of time and as long as no one outside the company uses it.

    I wouldn't even call it alpha or beta code, I'd call it "proof of concept" or even prototype code. Then, I'd work toward getting permission at rewriting it to be better, faster, stronger, and so on.


    P.S. There is a third, Dilbertian alternative, which I only mention 'cause I've seen it happen. Specifically, this might be a rite of passage, a test, a no-win scenario like the Kobayashi Maru scenario from Star Trek II: The Wrath of Khan. Sometimes, the best career-saving answer is, "I don't know, sir, let me look at things and let you know. I may not be able to do it in a week. I'll do my best to give you an accurate and fair estimate."

Re: Seeking advice about learning another's code
by VSarkiss (Monsignor) on Aug 02, 2001 at 02:55 UTC

    Prayer comes to mind. (What else did you expect at the Monastery? ;-)

    Seriously, here's a few tips.

    • Try to break it up into pieces. Even if it's not in subs, make a copy of the original, and start ripping out lines. Once you have some manageable chunks, you'll be able to figure it out more easily.
    • Make a version that doesn't do any database calls at all. You don't mention whether it's using DBI; in any case, it has to be calling something to execute SQL. Substitute an "execute_sql" routine that just prints out the string that was passed in.
    • Learn to use the Perl debugger real well real fast.... Use it to test expressions you don't understand by typing them in and seeing what they do.
    • Personally, I wouldn't bother with the UML diagrams. They're great for design, but I've never found them useful for reverse engineering. But that's just me.
    Prayer may not be a bad idea, actually....


Re: Seeking advice about learning another's code
by DBX (Pilgrim) on Aug 02, 2001 at 02:46 UTC
    I can only offer you a couple of quick suggestions to get you started. Having been in this situation before, the first I start doing is commenting everything I DO understand. Then the parts that are not familiar or are very complex stand out and I can go back through on another pass and work it out.

    The other thing I do that helps me tremendously is to start encapsulating sections of code into functions (especially when the script has been written in a procedural fashion as opposed to a function-oriented program) to clean up the flow. Instead of many confusing lines, you have a few function calls, some of which you will get and some you won't. It's much easier to dive into the function you don't get than to try to extract its meaning from all the context around it when the script has been written procedurally.

    If you are on a deadline, what you may realize if you comment the code and put related code into functions is that you don't have to mess with some of the parts you don't understand. As long as they still work, you may only have to change or work with the rest of the script and you can go back when you have time and learn the rest.

    I hope this makes sense and helps you.

Re (tilly) 1: Seeking advice about learning another's code
by tilly (Archbishop) on Aug 02, 2001 at 09:06 UTC
    The key to understanding someone else's code is understanding how they thought when they wrote it. Unfortunately if they were not thinking in a very organized or structured way, then it could be hard.

    First of all is there any division of this stuff into functions or it it all straightline? If the latter than I would give up on understanding it. 1200 lines straightline is probably not understandable.

    Next, did they use strict? If the answer is no, then you have an obvious approach. Just insert a use strict and start repairing errors. As you go you will have every variable pointed out to you, and you can understand each one. You should be able to also notice mistakes, and thinking about which things are and are not mistakes will help you. And if you are asked what you are doing, you can just sigh and say, "You wouldn't believe what I have found in this code..." If they did use it, then your code is probably in better shape, but you don't have this obvious approach.

    If the thing is well-factored into functions, then you can just start commenting inputs, outputs, and what the functions do. That should give you the overall structure, but it might take some work.

    Finally if you need to write it, ask here for ideas. There are a number of ways to structure running a series of tests, and if you ask you can get some good ideas. (I would use a functional approach with the tests being anonymous functions that take a parsed line and return all necessary useful error messages for the email. Many of the tests would be closures...)

Re: Seeking advice about learning another's code
by ginseng (Pilgrim) on Aug 02, 2001 at 02:48 UTC
    You're going to hate this suggestion. I already hate it.

    Retype the program.

    When reading code, it's too easy to skip around, to glance over a questionable section without realizing it. When you type it all, you have to pay attention to every single line. No exceptions. This is actually the way alot of programmers learned to program, in the late 70's/ early 80's, myself included. In my case, almost all my programs came from Compute! Magazine, and I typed them all in manually. This method gets you close to expert on a program's code, if you can do it.

    Yeah, it's still a lousy suggestion.

    My next suggestion, for what it's worth, and the one upon which I usually rely, is to rewrite it. "If the code's uncommented, it obviously cannot be good code, right?" "It's probably pure spaghetti! Not like you or I would write, if we were to do it ourselves?" "And it's bloated, too, right? I mean, who needs 2400 lines to do something like this?" Your boss will love those comments, right?

    I'm not sure if either suggestion will help, but at least you have my sympathy, if not concrete assistance.

Re: Seeking advice about learning another's code
by Everlasting God (Beadle) on Aug 02, 2001 at 03:03 UTC
    a) Ack, what I the name of all that is holy takes 1200 lines of SQL? IANASG, but good god, that's a lot of stuff man!

    b) Nope, not uml, just a plain ole' flow chart. UML is more a class inheritance/interface kinda thing. Java guys like it, so you know...

    c) Doesn't the name kinda give it away? $pipecount get the scalar return value of tr, which is the number of things it translated, namely pipe chars, and since there is no translation rule, tr will just leave them alone (or maybe just blindly exchange them for the same thing, not really sure)

    which brings me to
    d) Camel is your friend, fellow monks are your other friends. Bog through, and if you get stuck, ask with some nice snipets.

    which brings me to
    e) This will probably a rather time consumeing endeavor, so if it really is your job on the line, it may be hack now, understand a little later time.

    Good luck, hope it isn't really your job on the line (Not that I think it's impossible or you can't do it, but I know it wouldn't help me work if it was)

    'The fickle fascination of and Everlasting God' - Billy Corgan, The Smashing Pumpkins
      ++all by the way!
      I havent dove into the sql yet.
      Java is kinda the only "offical" language around here if ya know what I mean.
      $pipecount does give it away, but I was kinda curious how it did it (and it was an example too... guess that should been a "thing I could figure out", huh?)
      da job isnt on the line, but possible reputation points are... as an intern, I get a buffer, but I dont want to use it yet.
      Thanks for All the help guys (and ladies...)!

      wrong place, please ignore and delete
        Nope, I'm in Broomfield, CO. The perl I have written is incredibly useful, but that whole, "we have to use java" thing kinda prevents any high level uses of anything else. All our database access that I have done have been JSPs/servlets.

Re: Seeking advice about learning another's code
by tachyon (Chancellor) on Aug 02, 2001 at 02:57 UTC

    Here is an explanation of tr. It returns the number of translations it made. Normally you use it to translate one char into another like tr/A-Z/a-z/ which just lowercases $_ If the second part is null....well run the code and see.

    $_ ="|Here|are|six|pipe|symbols|"; $pipecount = tr/|//; print $pipecount ,"\n", $_, "\n"; $pipecount = tr/|/~/; print $pipecount ,"\n", $_, "\n";

    Good luck.




Re: Seeking advice about learning another's code
by Cubes (Pilgrim) on Aug 02, 2001 at 02:48 UTC
    I'm afraid I can't be too much help here, but I do have one suggestion. When you come across the parts you don't understand (or even the ones you think you understand, but aren't sure about), look them up. Crack open the Camel and read what it has to say about what tr does, and what it returns. Do this for every bit you find confusing, and you'll not only have a much better understanding of the code than if you'd just guessed, you'll also have increased your general perl knowledge (sorta like improving your vocabulary by looking up every word you're not 100% sure about while reading a book).
Re: Seeking advice about learning another's code
by SageMusings (Beadle) on Aug 02, 2001 at 07:43 UTC
    Fellow Perl Disciple,

    As mentioned, my vote is definitely for writing the code from scratch. In the long run, it will be easier on your time line and your digestive tract. Seriously, a little bit o' Perl goes a long way. In fact, I'll bet a 1200 line mass of SQL and Perl can be elegantly reduced. You may end up with a thing of beauty.

    I have been in a similar situation before, although the deadline was more generous. I chose the re-write. If anything, at least the code you already have, dense as it is, gives you a rough framework to hang your work on.

    Good luck. If you get stuck with any particulars, ask.
Completely rewriting someone else's deprecated code...
by dragonchild (Archbishop) on Aug 02, 2001 at 21:19 UTC
    As someone who's doing this as a side-project (gotta love those!) right now, I can tell you these are the steps to take:
    1. Print out the whole thing and take it out of the office. Sit down in a Perkins/Denny's/etc. and, with a bottomless cup of coffee and a new pack of smokes, read it the whole way through. Make notes. Highlight everything. Cross out everything you think it useless.
    2. Go back to the office the next day (very important ... you need a day to digest all you just read) and run it once, without modification. This is to make sure you have a baseline set of output.
    3. Make a copy in another directory. DO NOT EDIT THE COPY YOU WERE GIVEN. You will make mistakes and need to start over.
    4. Add "use strict". This is so critical I'd repeat it if I was being pedantic.
    5. Add "use strict". (Ok, so I'm being pedantic.)
    6. Once "use strict" runs and the output is identical, you can now start editing.After every change, re-run the code and verify the output hasn't changed, except for what you deliberately changed. I cannot emphasize that enough. You're trying to clean up someone else's code. Don't add your own bugs. If you do, you'll go nuts, cause you won't know what part of the code is your bug and what part is his badly-written schlepp.
    7. Identify repeated blocks of code and encapsulate them in functions. These blocks just need to be functionally equivalent, not letter-for-letter equivalent. You can always pass in parameters or hashrefs or whatever.
    8. Start to make a stab at clarifying the data structures. For example, if someone is using $A, $B, and $C, make a hash with keys of ('A' .. 'C'). Most likely, you'll be able to reduce the number of lines and improve readability. You'll definitely save your sanity. (And, possibly, improve run-time.)
    9. Get rid of EVERY global variable. Then, if you find that you absolutely needed that one global, re-add it and document the hell out of it. Everywhere. Don't miss even one spot.
    10. Remove un-needed code. Do this very delicately. It's the place most likely to break your output.
      1. Comment out a section.
      2. Re-run the code.
      3. Delete the commented section.
      4. Rinse and repeat.
    11. Think about making what you're doing a module (or group of modules) and write a script that calls those. Maybe you should even make this OO (though that rarely is called for).
    Remember, you're trying to save the next poor schmuck the heartache you just went through. Document and comment thoroughly. Don't take a short-cut just to save 20min. Do it right.

    Another suggestion - in your specific case, I'd look at getting rid of the SQL script and using DBI. I'm a little suspicious of marrying Perl and SQL scripts. There may be a good reason for it, but that's what I'd look at. Also, if there are time issues in a run, most 1200 line SQL scripts tend to be very badly written. I remember re-writing a 5000 line PL/SQL module, primarily just changing the order of the SELECT calls, and changing the run-time of a user screen from over 1 minute to under 1 second. That made the screen (one of the most important in the tool) actually useable.

      Might I suggest using a revision control system instead of (or in addition to) copying the source to your own "work copy"? You can simply run subversion on your own machine, and you can even commit from within Komodo, for instance. You'll feel far, far more comfy making changes if you can always revert to whatever earlier version you ever committed.

      Oh yeah, and write some tests, to make sure you don't break anything while working through the code.

      And run the code through perltidy (possibly after deparsing).
        I wrote that four years ago. In that time, a few things have happened, like the creation of subversion and improved testing facilities within Perl. I fully agree with everything you're saying, and would add those items in as part of a refactoring process.

        My criteria for good software:
        1. Does it work?
        2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: Seeking advice about learning another's code
by scain (Curate) on Aug 02, 2001 at 17:31 UTC

    Obviously there are lots of good suggestions here, I just wanted to add one more: If the indentation is not good, try running it through perltidy. This program adds white space and fixes indentation and makes reading code much easier.

    Good luck,

Code Breakdown
by E-Bitch (Pilgrim) on Aug 02, 2001 at 19:52 UTC
    Okay, So after going through the perl, I am now beginnging to see the light. The sql, I am going to assume, (and verify later) is full of useful, albeit unnessacary functions that can be replicated without figuring out 1200 lines of sql. Which brings me to my next point. The code is broken into several subroutines, (actually, 8 to be precise), and the execution runs 4 of them in sequence, then emails the results (good or bad) to the owner of the file. If nobody minds, I will process my thoughts here, and subject them to criticism for robustness.

    First Subroutine: initVars
    Percieved Usage: To initialize variables (hey, I got one!!!)
    • printing debug statements if the variables are set. (I have search for where the variables are set, and even run with strict, and cant find them... any help? the exact code looks like this:
      print STDOUT "\nStarting the initVars routine...\n" if ($L1);
      And I have gathered that $L1 is a command line argument (as there is $L2, $L3, and $L4, and they are all used in the same manner.) The one problem I see is, the code never processes command line args into these variables. Is there a way to do this w/o processesing them?
    • Sets 2 environment variables, and puts them into the $ENV hash.
    • sets up the path name for 4 files (good lines, bad lines, log file, and a lockfile)
    • sets up two paths for in and out directories
    • calls the abortScript function if there are not 3 arguments
    • processes 4 command line arguments into 4 separate variables for d-base info ($SERVER, $LOGIN, $PASSWORD, $USER_OWNER).
      Sidenote: This is another bone of contention. the previous line says, call abortScript if $#ARGV != 3, and then this one shifts ARGV 4 times... am I being an idiot and forgetting that perl's $#ARGV gives the index of the last element in the array, and not the count?
    • places the servername into the $ENV hash
    • sets up the lock file name into a string.
    • sets up true and false variables
    • sets up a var called $anyDataFiles, and inits it to false
    • Logs into the database, using a function &ora_login (its an oracle dbase) || abortScript
      Here is the next problem. The code has NO use statements whatsoever. How can he (yes, it was a he that wrote this. I'd talk to him, but he has left the company since he wrote this) call functions such as &ora_login and others, without using modules or writing them into the code?
    subroutine 2: filterUnWanted
    Percieved Usage: to filter out unwanted files?
    • makes a system call to a shell script (which I dont have)

    subroutine 3: checkRecovery
    Percieved Usage: to see if script ran fully last time.
    • checks for lockfile's existence
      • If one exists, opens it, reads a time from it.
      • emails the user that the prog is running in recovery mode, and that the program wasnt completed last time.
      • deletes all information from what looks to be temporary database tables.
      • Commits the deletes,
      • unlinks the lockfile
      • unlinks the bad file
      • unlinks the log file
      • Otherwise, queries a table for a timestamp from a table
      • checks $anyVendorFileName ne ""
        • if it exists:
        • emails the user that there are unprocessed records
        • sets up errlogfilename
        • sets up the datafilename
        • sets up the baddatafilename
        • sets $anyDataFiles to $TRUE (i assume that this indicates that a file was being processed earlier)
        • calls process log table function
        • otherwise, queries an error log table. || aborts
        • emails the owner (are we seeing a theme here? put it in a sub!!)
        • sets up the errorlogfilename
        • sets up the datafilename
        • sets up the baddatafilename
        • sets $anyDataFiles to #TRUE (again, assuming that this indicates a file was being processed earlier)
        • calls &processLogTable

    subroutine 4: processFiles
    Percieved Usage: to process and verify all data in input files, and insert if good, email if bad.

    subroutine 5: processLogTable
    Percieved Usage: takes all information from the log table in the dbase, and writes them to the log file

    subroutine 6: removeCtrlChars
    Percieved Usage: to remove all control characters from the file by making a bunch of sed calls.

    subroutine 7: errorInLine
    Percieved Usage: called when a line is found to have an error in it.

    subroutine 8: abortScript
    Percieved Usage: called when an error is generated that prevents the script from continuing
    (I'll update more later, or if you all dont want to listen, I'll stop).

Re: Seeking advice about learning another's code
by itub (Priest) on Jul 30, 2005 at 01:43 UTC
    May I suggest reading Perl Medic? It's a book exactly about this topic.

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://101508]
Approved by root
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (5)
As of 2023-01-30 14:25 GMT
Find Nodes?
    Voting Booth?

    No recent polls found