PerlMonks
What are the core points of good procedural software design? (functions, code structuring)

by Anonymous Monk
on Jun 23, 2008 at 03:41 UTC ( #693430=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

So, here I am again, looking at a script that's grown larger than I expected. It's loaded with a page or two of file-scope lexicals at the top, followed by a bunch of functions that operate on them (many take no arguments and return no values), followed by a short "main" section at the end that uses those functions.

This happened because the script started out as just a short shell-scripty thing. "Do this, ask if I should do that, then do the other thing, and so on. Oh, but I need to save some state, so here's a global for that... and also a global for this... well, may as well rope this bunch of logic off in a function...". It all started so innocently.

The program doesn't (yet?) have many things in it that could be thought of as "objects", per se. Though, even if I did create a class and pulled a few strands of the bowl, there still would be plenty of spaghetti left over that needs straightening out.

So, what are the core points of good procedural design? I realize that functions modifying globals is not a terribly good idea, but am not sure how to escape it. I also realize that OOP isn't the cure for everything, and don't want to rush headlong into creating classes unnecessarily.


Replies are listed 'Best First'.
Re: What are the core points of good procedural software design? (functions, code structuring)
by GrandFather (Saint) on Jun 23, 2008 at 04:08 UTC

    I find really lightweight OO cleans up globals something wonderful. Consider:

    use strict;
    use warnings;

    my $obj = bless {global1 => 0, global2 => 3};

    $obj->setThree(6);
    $obj->doSomething();
    print $obj->{four};

    sub setThree {
        my ($self, $three) = @_;
        $self->{three} = $three;
    }

    sub doSomething {
        my ($self) = @_;
        $self->{four} = $self->{three} + $self->{global2};
    }

    Prints:

    9

    So the cost is a bless to create $obj, calling "methods" on $obj rather than calling the subs directly, and having to access state through $self inside the subs.

    The real question is, does it buy you anything? Well, for the first cut when there were only a couple of variables and the whole thing was only a few dozen lines long, nope, it buys you nothing at all.

    For the second cut where you are adding a few more globals, you don't have to scroll to the top of the file to add globals - not really a win yet. But at least it's likely the subs dealing with the new "globals" are close together and the documentation describing them can be close by too, so there is some real benefit.

    But with the third round where you could really do with refactoring the code and generate a real class, guess what - you've already done most of the work!

    For anything that is likely to grow I find using lightweight OO from the start makes it easier to evolve the code over time. The payback isn't on day one, but a modest way into the project the scales tip and the up-front work becomes worthwhile.
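To illustrate that third round, here is a sketch of how the blessed hash might be promoted to a real class. The Counter name, its property, and its methods are invented for illustration; the point is that the method bodies barely change, you just add a package and a constructor:

```perl
use strict;
use warnings;

package Counter;    # hypothetical class name

sub new {
    my ($class, %args) = @_;
    # Former file-scope lexicals become object properties
    my $self = { count => $args{count} // 0 };
    return bless $self, $class;
}

sub increment {
    my ($self, $by) = @_;
    $self->{count} += $by // 1;
    return $self->{count};
}

sub count { return $_[0]->{count} }

package main;

my $c = Counter->new(count => 3);
$c->increment(4);
print $c->count, "\n";    # prints 7
```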


    Perl is environmentally friendly - it saves trees

      GrandFather, I like it. And I'm glad to use classes where I can. However, it seems to me that compartmentalizing off a part of the program into a class doesn't actually help me understand how to better write the necessary functions -- the only difference now is they're "instance methods" and every time they access one of the object's variables they need a "$self->{}" to do it.

      That is to say, the following two things look quite similar to me:

      1. A class, with instance variables, and instance methods
      2. A file, with file-scope lexicals, and functions

      So, getting back to the original posted question, perhaps I wasn't specific enough. Here's two specific questions:

      1. Should my functions be returning something instead of setting values of globals? (er.. FSL: file-scope lexicals, that is)
      2. Should my functions be taking arguments (specifying necessary data) instead of just looking at FSLs for that data?

      (1) If you say "yes" to the first one, then what does that buy me? Instead of setting an FSL somewhere, I'm returning values, and so need to create an FSL to hold the value anyway! Ex:

      foobar();
      # vs.
      my $value = foobar();

      ...and now I'm back to my file containing a bunch of FSLs, only now they're scattered around the file (in front of function calls) instead of all listed at the top of the file.

      (2) If you say "yes" to the second one: my functions sometimes need to know lots of things. I might have to pass in a bunch of arguments to tell a function everything it needs to know, and it seems ugly for a function to take more than a few arguments. How do you deal with this situation?
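One common answer to the many-arguments problem is to pass a single hash of named parameters and merge it over a hash of defaults. A minimal sketch, where the make_report function and its parameters are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical report generator: rather than six positional
# arguments, take one hash of named parameters with defaults.
sub make_report {
    my (%args) = @_;
    my %defaults = (format => 'text', width => 80, header => 1);
    # Caller-supplied keys override the defaults
    %args = (%defaults, %args);
    my $head = $args{header} ? "REPORT ($args{format})\n" : "";
    return $head . "width=$args{width}";
}

# The call site documents itself, and argument order no longer matters:
print make_report(width => 72, format => 'csv'), "\n";
```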

        What the light weight OO buys you up front is a "legal" way of using FSLs. Or actually, rather than using a FSL, you are using an OSP (Object Scope Property). In other words, it doesn't really buy you anything at all on day one. However, it does make it a whole lot easier to refactor your code as it grows. There are a few minor things that may be considered advantages of the OO approach:

        You might gain a little advantage by using setters on the object rather than setting random FSLs in various places - at least you can centralize sanity checking of values.
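A minimal sketch of that centralized sanity checking in a setter; the Widget class and its size property are invented for illustration:

```perl
use strict;
use warnings;
use Carp;

package Widget;    # hypothetical class

sub new { return bless { size => 1 }, shift }

# Centralized sanity check: every write to 'size' goes through here,
# instead of assignments to a file-scope lexical scattered around.
sub set_size {
    my ($self, $size) = @_;
    croak "size must be a positive integer"
        unless defined $size && $size =~ /^\d+$/ && $size > 0;
    $self->{size} = $size;
    return $self;
}

sub size { return $_[0]->{size} }

package main;

my $w = Widget->new;
$w->set_size(42);
print $w->size, "\n";    # prints 42
```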

        You might gain a little advantage by grouping the manipulation code for particular related properties together and describing how the properties are related in a POD block above the manipulation members. Using OSPs rather than FSLs removes the need to declare all the FSLs at the top of your script, so grouping related stuff in a sensible way becomes easier.

        None of these is a big win. For small chunks of code, unless someone has already solved the problem for you, there are seldom any big wins anyway. The best you can do is prepare the ground for later grand development, and that is where light weight OO does have an advantage.

        In fact, if it is possible you may migrate to an OO solution, it's easier not to pass stuff around in your first iteration. When you OO things, just delete all the FSLs and change any $FSL to $self->{FSL}. So you could argue that you ought to avoid both 1. and 2. for the first cut.


        Perl is environmentally friendly - it saves trees
        1. Should my functions be returning something instead of setting values of globals? (er.. FSL: file-scope lexicals, that is)

        ...

        (1) If you say "yes" to the first one, then what does that buy me? Instead of setting an FSL somewhere, I'm returning values, and so need to create an FSL to hold the value anyway!

        At least you don't have the globals in the bodies of the function definitions. You can now analyze the functions in isolation from the rest of the code - this is a clear win. The next step is to move more code into the functions - and you'll end up with mostly global-free code.
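A sketch of the difference, using a made-up tally function: once the function is a pure mapping from arguments to a return value, it can be read and tested without knowing anything about the rest of the file.

```perl
use strict;
use warnings;

# Before (sketch): sub tally { $total += ... } mutated a file-scope
# lexical. After: arguments in, return value out.
sub tally {
    my (@amounts) = @_;
    my $total = 0;
    $total += $_ for @amounts;
    return $total;
}

my $total = tally(1, 2, 3);    # the caller owns the variable, narrowly scoped
print $total, "\n";            # prints 6
```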

      I was expecting someone to say that the core points of good procedural software design are to use object oriented (i.e. "not procedural") design, but I didn't expect that to be the first reply :)

      Stop saying 'script'. Stop saying 'line-noise'.
      We have nothing to lose but our metaphors.

        I'm glad that answer came.

        I've looked into some larger C libraries (GTK and Imlib2 lately), and I found that their interfaces look very object oriented.

        It seems to be a general trend to write OO code even in non-OO languages.

Re: What are the core points of good procedural software design? (functions, code structuring)
by starbolin (Hermit) on Jun 23, 2008 at 06:43 UTC

    Anonymous Monk writes:

    "...there still would be plenty of spaghetti left over that needs straightening out."
    That is a feeling I am familiar with; but realize that part of the complexity of the code is due to not having a guiding data structure. The first thing I try to do is to enforce localization of data. Push the data to the right side of the function calls. Whether that involves closures, ties, manifests, configuration hashes or whatever, get the data separated from the control flow, the if/then/else on the left side.

    Very often there is natural structure that is hidden behind the chaff. Often, to uncover it, you have to restructure the program to allow the data to lie together. For example, when dealing with multiple files, open the group of files inside one loop and then process them inside another loop. Even multiple loops, each doing one step of the processing. This is perhaps more resource intensive but it groups the like data together. Once you have the like data grouped together you can begin to look for natural groups that can become objects. When you have groups of data together with their accessor functions, then you have an API.
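The one-loop-per-step structure might be sketched like this; the record format and the summary are invented for illustration:

```perl
use strict;
use warnings;

# Pass 1: gather everything into one data structure.
sub load_records {
    my (@lines) = @_;
    my @records;
    for my $line (@lines) {
        my ($name, $value) = split /,/, $line;
        push @records, { name => $name, value => $value };
    }
    return \@records;
}

# Pass 2: process the grouped data, separately from the gathering.
sub summarize {
    my ($records) = @_;
    my $sum = 0;
    $sum += $_->{value} for @$records;
    return $sum;
}

my $records = load_records("a,1", "b,2", "c,3");
print summarize($records), "\n";    # prints 6
```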

    "I realize that functions modifying globals is not a terribly good idea,..."
    Only if the API for that data isn't well defined. This is what an object does, it associates the functions with the data and defines the scope of the interactions.

    "...and don't want to rush headlong into creating classes unnecessarily."
    Agreed, let the code tell you what its objects are. That will happen as you force localization of data.

    "So, what are the core points of good procedural design?"
    Define the data structure first, write the code to the needs of the data.


    s//----->\t/;$~="JAPH";s//\r<$~~/;{s|~$~-|-~$~|||s |-$~~|$~~-|||s,<$~~,<~$~,,s,~$~>,$~~>,, $|=1,select$,,$,,$,,1e-1;print;redo}
Re: What are the core points of good procedural software design? (functions, code structuring)
by kabeldag (Hermit) on Jun 23, 2008 at 04:29 UTC

    Thinking about what steps need to be taken before starting out is always good (planning/thinking/brainstorming). Identifying the input and output of each step, then modularizing that step into a sub, should eliminate a few globals. If the entire process of getting from A to B is a bit of a long one (A to Z), then move your modularized subs into a separate library (Perl module). Once you have modularized everything, the kick-off / entry point of the journey should become apparent, though it should have been apparent at the start.

    Somebody else will no doubt provide a good list of points in regard to the overall approach. All I can say is to modularize, modularize and modularize...(:-\)

Re: What are the core points of good procedural software design? (functions, code structuring)
by BrowserUk (Patriarch) on Jun 23, 2008 at 15:14 UTC
    I realize that functions modifying globals is not a terribly good idea, but am not sure how to escape it.

    It's gonna sound trite, but the easiest way to escape it, is to not do it! Because once you have started doing it (in any particular script), it becomes really hard to undo it.

    For new (standalone) code, there are a couple of simple steps that will help you avoid it:

    1. Put all your subroutines at the top of your script.

      In particular, define all your subroutines before you declare any variables.

      Placing your 'main' code at the bottom of the script prevents your subroutines from capturing non-locally scoped variables and forces you to pass any information they require via parameters.

    2. Never declare variables en masse.

      Only declare variables when, and where you need them.

      By declaring your variables where you first use them, you will naturally limit their scope. And limiting scope is the very essence of structured programming.
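The two rules above can be sketched as a script skeleton; parse_line and format_pair are made-up examples:

```perl
use strict;
use warnings;

# Rule 1: subroutines first, before any variable is declared, so they
# cannot accidentally capture a file-scope lexical.
sub parse_line {
    my ($line) = @_;
    my ($key, $value) = split /=/, $line, 2;
    return ($key, $value);
}

sub format_pair {
    my ($key, $value) = @_;
    return "$key -> $value";
}

# 'main' code at the bottom. Rule 2: declare each variable at first
# use, so its scope is as narrow as possible.
for my $line ("colour=red", "size=10") {
    my ($key, $value) = parse_line($line);
    print format_pair($key, $value), "\n";
}
```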

    When it comes to trying to sort out existing spaghetti code, there are two schools of thought:

    • Make minimal changes commensurate with achieving your goal.
    • Re-write from scratch.

    Which of these is applicable will depend very much upon your immediate goals. There is no one right answer.

    If your goal is to add a new smallish feature to an existing, working script in a hurry, the minimal changes route is often the best way to go. That includes, unfortunately, perpetuating existing bad practices against your better judgement. Trying to re-write or even just re-structure existing code to accommodate new features can lead to disastrous consequences, especially if time pressures are involved.

    It can often save time in the long run to go with the flow and add the new feature in the existing style in order to meet the immediate goal, and leave restructuring until things are stable. Restructuring should always be tackled from a known and stable base. That is, don't try to both re-structure and add new features at the same time. You'll never be sure whether bugs and deficiencies you encounter are a result of the restructuring, or the new features. And it will usually be some subtle, and untraceable, interaction of the two.

    If you restructure the existing, working code, regression testing becomes a relatively simple process: substitute the restructured script for the original in a realistic scenario, preferably using inputs and outputs derived from live operations, and check that they do the same thing. This is usually far quicker than trying to generate comprehensive test scripts to exercise all aspects of the existing script, which can be an extremely difficult and costly process. It is made even more costly because much of it will need to be discarded and/or re-written to suit the new script if it is extensively re-structured.

    It's also the case that trying to re-structure existing code using the existing unit test scripts for verification often leads you into making the same structural choices, and the same mistakes, as were made in the script you are re-structuring, defeating some or all of the purpose of the exercise. By limiting regression testing to matching the overall output for one or more sets of realistic inputs, you free yourself from the design constraints and mistakes of the original authors.

    Only once the re-structured script (or module) has been verified against what it is to replace, and preferably exercised in-situ for a while in the live system, should you consider adding new features to it. Of course, if the re-structuring is prompted by the need for a new feature and the difficulty (perceived or actual), of adding it to the existing code, then this implies deferring that addition until the re-structuring is done and dusted. And that can be politically difficult to propose. But in the long term, separating the two steps will nearly always prove less costly.

    However, there is nothing to stop you from acknowledging the requirements of the new feature(s) during your re-structuring exercise, and catering for them in advance.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: What are the core points of good procedural software design? (functions, code structuring)
by perreal (Monk) on Jun 23, 2008 at 10:44 UTC
    Well lately I'm enclosing possible global variables in a block with the functions that use them:
    {
        my $var;

        sub func {
            $var = 1;
            func2($var);
        }
    }
    In most cases this approach works for me.
      or in other words, you're turning global variables into local ones (or, in Perl terms, lexical variables), and subs into closures.

      That can work fine in many cases, but you should be aware of that distinction.

      Update: slight re-wording.

Re: What are the core points of good procedural software design? (functions, code structuring)
by jimX11 (Friar) on Jun 23, 2008 at 13:31 UTC

    Code maintenance and good design are different tasks.

    New maintenance programmers often want elegant code, leading to the "let's just redo it all" idea. That's often much harder than just getting the new feature in place.

    When I add a feature to existing code I often use the sprout object (or method or function) technique:

    1. read the legacy code and find where the changes will go
    2. add tests around that area of the legacy code, so that you'll more easily know if you broke something (with as little change to the legacy code as possible). This also allows you to know what variables are involved. This is the hardest part.
    3. develop the new code, using test driven methods. You feed your new code the values from the parameters found in the 2nd step. Your new code is of course elegant and you develop it independently of the legacy code.
    4. alter the legacy code so that it uses the new code. Cross your fingers and run the tests from the second step.

    Then you have some tests for the legacy code and you have new elegant code with tests too.
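The sprout steps above might look like this in miniature; legacy_price and the discount rule are invented for illustration:

```perl
use strict;
use warnings;

# Hypothetical legacy sub (step 1): its discount logic needs to change.
sub legacy_price {
    my ($base, $qty) = @_;
    my $total = $base * $qty;
    # Step 4: the only change to the legacy code is this one call.
    $total -= sprout_discount($total);
    return $total;
}

# Step 3: the sprouted code is small, pure, and developed with its own
# tests, independently of the legacy code.
sub sprout_discount {
    my ($total) = @_;
    return $total >= 100 ? $total * 0.10 : 0;
}

print legacy_price(20, 5), "\n";    # 100 with a 10% discount: prints 90
```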
Re: What are the core points of good procedural software design? (functions, code structuring)
by educated_foo (Vicar) on Jun 23, 2008 at 15:04 UTC
    IMHO they are the same as for any other software design fad: On a small scale, keep functions to about a page (80x24), and make them do one thing. On a large scale, organize things into modules corresponding to tasks. In your case, I would look at breaking a chunk of the script off into a separate module, which doesn't have to be in "I can has objects" style. If you later need more than one simultaneous instance of that module, you can always objectify it then.

    Above all, don't get hung up on dogmas like "I need objects," "global variables are evil," etc. These are just vague guidelines, and are useless unless you understand why they are sometimes true.

      ++

      But I'd prefer s/sometimes/often/

      Careful with that hash Eugene.

Re: What are the core points of good procedural software design? (functions, code structuring)
by dwm042 (Priest) on Jun 23, 2008 at 18:08 UTC
    One of the simplest things you can do in a situation where you have a ton of related file-scope variables is assemble them into a hash, then rewrite the functions to accept a hash reference.

    I'm maintaining a lot of code whose authors never figured hashes out, and there are a number of instances where "insolvable" problems go away when you group the data into a hash, and then use the hash as a way to abstract out the various bits and pieces of data and logic.

    If you can get this far, then you can decide if it's worth it to take one more step, and turn the hash into an object, and the functions into object methods.
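A sketch of that first step, with invented field names: gather the file-scope lexicals into one hash and pass a reference to it.

```perl
use strict;
use warnings;

# Step 1: the former pile of file-scope lexicals, gathered into one hash.
my %state = (
    input  => 'data.txt',    # hypothetical fields
    lines  => 0,
    errors => 0,
);

# Step 2: functions take a reference to that hash instead of reaching
# for globals. Later, the hash can be blessed and these become methods.
sub record_line {
    my ($state, $ok) = @_;
    $state->{lines}++;
    $state->{errors}++ unless $ok;
}

record_line(\%state, 1);
record_line(\%state, 0);
print "$state{lines} lines, $state{errors} errors\n";    # 2 lines, 1 errors
```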

    Update: fixed typos.

Approved by GrandFather
Front-paged by Arunbear