PerlMonks  

Tracking processing by returning objects?

by BazB (Priest)
on May 16, 2003 at 22:19 UTC

BazB has asked for the wisdom of the Perl Monks concerning the following question:

Howdy, Monks.

I'm in the process of putting together a couple of OO-based modules that wrap binaries and perform various sorts, subsets and munging operations on files.

The main problem I have is tracking all the files produced by each operation, and using that information to cut back on the amount of information required by each method call.

In my quest to reduce (at least for the caller) the number of parameters required, I've tried the following:

  • Track the last file or files produced, and various bits of data relating to those files.
    Assume that the last file produced is the one to operate on for the next operation, unless told otherwise.
    This breaks very quickly - the user might do things in a valid but weird order, or the code will pick the wrong file.

  • Return the filenames of the output and ask for input filenames.
    Also not great - then we have to repeatedly recalculate supplementary data about each file, and it could get really messy if lots of files are produced for some reason.

Neither idea handles the case where the underlying data is held in a number of files, for example because the code decided to split them up for parallel processing.
The user only needs to believe they have one logical file, not all the details underneath.

Anyway, after that rather lengthy introduction, how should I deal with this?

I thought of returning some sort of Foo::IO object that contains all of the filenames that had been processed by a method, and all the information we might need. If the user wants to process the output from an early stage of the code, they just use the old Foo::IO object.

General example:

    my $munger = Munger->new( input => "data" );

    # assume the original "data"
    # since there's no Foo::IO argument
    my $sort_result = $munger->sort( by => "date" );

    # Again, just use the input filename stored in
    # the main object - "data"
    my $mangle_result = $munger->mangle();

    # Use sorted file, and know that it's sorted by date
    # Give it the sort result Foo::IO state object
    my $subset_result = $munger->subset( by => "date", $sort_result );

    # etc.

When I mentioned my problem in the CB, jwest suggested using an object that contains several file objects for this sort of problem.

So, my questions are:

  • Should I return some sort of Foo::IO/Foo::File object to encapsulate the files?
  • How should I handle the IO object in relation to the main class (Foo) and the IO/File class?
  • Is there some other way of handling state without real pain?
  • Am I approaching this in the right way?
jwest's idea seems to make a lot of sense, but I've not yet been able to think out how the main class that does all the munging would use these state objects, and how it would all fit together.
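
One way the pieces might fit together is sketched below. This is only an illustration under assumptions: the accessors `files` and `meta`, the `from` parameter, and the stubbed-out operations are all made up for the example and are not part of any existing module. The state object holds the physical files behind one logical file plus whatever is already known about them, and each Munger method returns a fresh state object.

```perl
use strict;
use warnings;

# Hypothetical state object: one logical file backed by one or more real files.
package Foo::IO;

sub new {
    my ($class, %args) = @_;
    my $self = {
        files => [ @{ $args{files} || [] } ],   # physical files behind the logical one
        meta  => { %{ $args{meta}  || {} } },   # cached facts, e.g. sorted_by => 'date'
    };
    return bless $self, $class;
}

sub files { my ($self) = @_; return @{ $self->{files} } }
sub meta  { my ($self, $key) = @_; return $self->{meta}{$key} }

# Hypothetical main class. The operations are stubs that only record
# the file names they would produce.
package Munger;

sub new {
    my ($class, %args) = @_;
    my $self = { input => Foo::IO->new(files => [ $args{input} ]) };
    return bless $self, $class;
}

sub sort {
    my ($self, %args) = @_;
    my $in = $args{from} || $self->{input};     # default to the original input
    # A real implementation would run the wrapped binary here.
    my @out = map { $_ . '.sorted' } $in->files;
    return Foo::IO->new(files => \@out, meta => { sorted_by => $args{by} });
}

sub subset {
    my ($self, %args) = @_;
    my $in = $args{from} || $self->{input};
    my @out = map { $_ . '.subset' } $in->files;
    # Carry over what is already known about the input, so it is not recomputed.
    return Foo::IO->new(files => \@out, meta => { %{ $in->{meta} } });
}

package main;

my $munger      = Munger->new(input => 'data');
my $sort_result = $munger->sort(by => 'date');
my $subset      = $munger->subset(by => 'date', from => $sort_result);

print join(',', $subset->files), "\n";      # data.sorted.subset
print $subset->meta('sorted_by'), "\n";     # date
```

Passing the state object as a named `from` argument, rather than as a trailing positional one, keeps the argument list an even-sized hash; that is a design choice of this sketch, not something the question requires.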

Hopefully I've explained my problem with sufficient detail.
Thanks in advance.

BazB


If the information in this post is inaccurate, or just plain wrong, don't just downvote - please post explaining what's wrong.
That way everyone learns.

Replies are listed 'Best First'.
Re: Tracking processing by returning objects?
by BrowserUk (Patriarch) on May 17, 2003 at 06:31 UTC

    It seems that class Munger represents a set of data in a file (or files) in some state. Once you have applied a Munger method to an instance of Munger, you have created a new file (or set of files) with a new set of data. This means that you have a new instance of class Munger.

    So why not create a new instance each time you munge anything?

        my $mungable = Munger->new( file => 'data' );

        my $sorted  = $mungable->sort( by => 'size' );
        my $mangled = $mungable->mangle();

        my $subset_of_mangled = $mangled->subset( some => 'criteria' );
        my $subset_of_sorted  = $sorted->subset( some => 'criteria' );
        my $subset_of_orig    = $mungable->subset( some => 'criteria' );

    I can't help thinking that if you return an object representing the results of one set of manipulations performed on one set of data, and the user passes this back to you via a call to manipulate a completely different set of data, you have an unresolvable problem?


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
      So why not create a new instance each time you munge anything?

      That's the approach I'd take. Each instance becomes a "result cache", with DESTROY performing cleanup where necessary.

      This approach has a lot in common with functional programming: you're composing a result out of functions. The complication is that a side-effect of each function is a file-based cache that needs to be managed.
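
      A minimal sketch of the "result cache with DESTROY" idea follows. It rests on assumptions: the class name `Munger`, the `temp` flag, and the pure-Perl sort standing in for a wrapped binary are invented for illustration. Each derived instance owns a temporary file and removes it when the last reference to the instance goes away; files supplied by the caller are never deleted.

```perl
use strict;
use warnings;

# Hypothetical class: each instance owns one file of data.
package Munger;

use File::Temp ();

sub new {
    my ($class, %args) = @_;
    # 'temp' marks files this class created and may therefore delete.
    return bless { file => $args{file}, temp => $args{temp} || 0 }, $class;
}

sub file { my ($self) = @_; return $self->{file} }

# Pure-Perl stand-in for a wrapped sort binary. Returns a NEW instance
# whose file holds the sorted lines; the original is left untouched.
sub sort {
    my ($self) = @_;
    open my $in, '<', $self->{file} or die "Cannot read $self->{file}: $!";
    my @lines = <$in>;
    close $in;
    @lines = CORE::sort @lines;      # the builtin, not this method

    my ($out, $name) = File::Temp::tempfile(UNLINK => 0);
    print {$out} @lines;
    close $out or die "Cannot write $name: $!";
    return ref($self)->new(file => $name, temp => 1);
}

# Clean up the cached result when the last reference goes away.
sub DESTROY {
    my ($self) = @_;
    unlink $self->{file} if $self->{temp};
}

package main;

# Build a small input file for the demonstration.
my ($fh, $input) = File::Temp::tempfile(UNLINK => 1);
print {$fh} "pear\napple\nfig\n";
close $fh;

my $data   = Munger->new(file => $input);
my $sorted = $data->sort;
my $cache  = $sorted->file;

print "cache exists\n"  if -e $cache;
undef $sorted;                       # last reference gone, DESTROY runs
print "cache removed\n" unless -e $cache;
```

      Because Perl uses reference counting, the cleanup happens as soon as the caller drops the result, so intermediate files in a chain of calls do not pile up.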

        I used the same technique for an image editor which had a similar need for caching.

        Each operation that was applied to an image, produced a new image that was the results of the previous manipulation. The buffers for the images were allocated from a built-in virtual memory manager that used an LRU caching mechanism.

        The really nice thing about the technique is that it allows you to try two or three different enhancement techniques on the base image and compare them side by side before choosing which one to commit to. It also has the added bonus of making Undo a simple matter of freeing the current version and backing up to the previous one.

        That whole thing was written in a macro assembler, but it was object oriented to the point where the syntax for adding a number from memory to a register was

            esi.load Object;
            eax.add  Object.Number;

        Thankfully, it was someone else's job to write the virtual memory manager :)


      I can't help thinking that if you return an object representing the results of one set of manipulations performed on one set of data, and the user passes this back to you via a call to manipulate a completely different set of data, you have an unresolvable problem?
      I don't really see this as being a problem. This is no different from the user passing the wrong file name to a call.

      Some manipulations might require other stages to be performed first and that can be tested for.
      If the user passes the wrong object that passes all tests, there isn't much I can do about it - I'd hope the user knows what order they want to do things in.

      I can't think of any neat way of taking this away from the user, and honestly don't see any need to do so.
      Simplifying the representation of the file(s) to process is probably all I need.
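
      The "that can be tested for" part could look something like the sketch below. Everything here is an assumption made for illustration: the `Foo::State` class, the `has_done` and `after` methods, the `apply` dispatcher, and the rule that `subset` needs a prior `sort`. The state object records which operations produced it, and each operation checks its prerequisites before running.

```perl
use strict;
use warnings;

# Hypothetical state object that remembers which operations produced it.
package Foo::State;

sub new {
    my ($class, %args) = @_;
    return bless {
        file    => $args{file},
        history => [ @{ $args{history} || [] } ],
    }, $class;
}

# Number of times the named operation has been applied (0 is false).
sub has_done {
    my ($self, $op) = @_;
    return scalar grep { $_ eq $op } @{ $self->{history} };
}

# Derive a new state that records one more operation.
sub after {
    my ($self, $op) = @_;
    return ref($self)->new(
        file    => "$self->{file}.$op",
        history => [ @{ $self->{history} }, $op ],
    );
}

package Foo;

# Hypothetical rule table: operation => stages that must come first.
my %requires = ( subset => ['sort'] );

sub apply {
    my ($class, $op, $state) = @_;
    for my $needed (@{ $requires{$op} || [] }) {
        die "'$op' requires '$needed' first\n" unless $state->has_done($needed);
    }
    # A real implementation would run the wrapped binary here.
    return $state->after($op);
}

package main;

my $raw    = Foo::State->new(file => 'data');
my $sorted = Foo->apply('sort',   $raw);
my $subset = Foo->apply('subset', $sorted);
print "$subset->{file}\n";                    # data.sort.subset

# Passing a state that has not been sorted is rejected.
my $ok = eval { Foo->apply('subset', $raw); 1 };
print "rejected: $@" unless $ok;
```

      A check like this catches ordering mistakes, but as noted above it cannot catch a wrong object that happens to satisfy every prerequisite.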


        I guess my explanation left something to be desired, so here it is in slightly expanded pseudo-code.

        The basic idea is that rather than each operation on an instance of Foo returning an instance of some State class, it returns a new instance of Foo itself. To perform any further processing on that state of the data, the user invokes the operation directly on the returned object.

        That way there are no separate objects to be passed back to you wrongly.

            package Foo;
            use strict;
            use warnings;

            sub new {
                my ($class, $filespec) = @_;
                # Allow new() to be called on an instance as well as on the class.
                $class = ref($class) || $class;
                # check for existence, other init stuff
                return bless { file => $filespec }, $class;
            }

            sub sort {
                my $self = shift;
                my $out  = "$self->{file}.sorted";
                # Single-string form of system() so that the shell handles the redirection.
                system("sort '$self->{file}' > '$out'") == 0 or die "sort failed: $?";
                return $self->new($out);
            }

            sub munge {
                my $self = shift;
                my $out  = "$self->{file}.munged";
                # 'munge' stands in for whatever binary is being wrapped.
                system("munge '$self->{file}' > '$out'") == 0 or die "munge failed: $?";
                return $self->new($out);
            }

            1;

            package main;

            # Start with an object created by the user.
            my $data = Foo->new('datafile');

            # Munge it two different ways and get two separate objects back,
            # representing the two new states.
            my $sorted = $data->sort();
            my $munged = $data->munge();

            # Further processing on the new state is invoked directly on the new object.
            my $sorted_n_munged = $sorted->munge();
            my $munged_n_sorted = $munged->sort();

            # Or even pipeline them.
            my $sorted_n_munged_n_sorted_again = $data->sort()->munge()->sort();

        Anyways, if I understood your question correctly, that's how I would do it.

        HTH. Sorry, if it doesn't.



      Thinking out loud which may or may not have to do w/ OP...

      I can't help but think of (C++-like) copy constructor and assignment op overload methods, given one instance of the class, to be used to return a spanking new instance of the same class. (That's what BrowserUk was trying to show, which is illustrated in the last "pipeline" statement.)

      Had i been unfamiliar w/ C++, i would myself have been checking for the reference type in the function that creates a class instance. Now, writing two separate functions (in Perl) to create a class instance does not seem bothersome, despite the Perl code that i have seen so far. (BTW, i have yet to write an OO Perl module.)

      Getting back to the problem... If using jwest's idea, states could be stored in an array reference, in a hash, w/ file names (or file/IO objects, as appropriate) as the keys. Missing states could be omitted altogether or undef'd. (Needless to say this array ref. value could just as well be a hash ref.)

      Personally, i would try BrowserUk's idea first (complete w/ an appropriate class cloning/copying method separate from the class constructor) as it is simpler in approach. If really necessary, use a hash like the one above to keep a list of the processes applied.
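
      A rough sketch of a copy method kept separate from the constructor, with a hash recording the processes applied, might look like this. The names `clone`, `applied`, and `mangle` are assumptions for the example only, and the operation is a stub rather than a call to a real binary.

```perl
use strict;
use warnings;

package Munger;

# Constructor: called on the class, with a user-supplied file.
sub new {
    my ($class, %args) = @_;
    return bless { file => $args{file}, applied => {} }, $class;
}

# Copy method: called on an instance, returns an independent copy.
# Overrides (e.g. a new file name) affect the copy only.
sub clone {
    my ($self, %override) = @_;
    my $copy = { %$self, %override };
    $copy->{applied} = { %{ $self->{applied} } };   # do not share the hash
    return bless $copy, ref $self;
}

# Hypothetical operation; a real one would run the wrapped binary.
sub mangle {
    my ($self) = @_;
    my $new = $self->clone(file => "$self->{file}.mangled");
    $new->{applied}{mangle} = 1;
    return $new;
}

package main;

my $data    = Munger->new(file => 'data');
my $mangled = $data->mangle;

print "$mangled->{file}\n";                                             # data.mangled
print exists $data->{applied}{mangle} ? "shared\n" : "independent\n";   # independent
```

      Keeping `new` and `clone` apart means neither has to inspect whether it was called on a class name or on an object.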

      If that works, then i would try the one-big-encapsulating-object idea. ("If the former works, why..." you say? Well, that's up to you, BazB.) Or, if skipping the simpler idea, i would search CPAN and the Internet for prior art before embarking upon the latter approach, for it seems crude (based on the description alone).
