PerlMonks  

Index or iterate - your choice

by GrandFather (Saint)
on Jan 27, 2021 at 02:57 UTC ( [id://11127505]=perlquestion )

GrandFather has asked for the wisdom of the Perl Monks concerning the following question:

I'm working on a module for parsing and extracting data out of ELF (Executable and Linkable Format) files, which I intend to put on CPAN shortly. An ELF file contains a couple of important tables whose entries describe parts of the file (segments and sections). I have an object which holds on to objects for each of the tables. The tables aren't very interesting, but the table entries are, so I want to provide access to the individual entries. Current options look like this:

    use warnings;
    use strict;
    use ELF::Reader;

    my $elfPath = $ARGV[1];
    my $elfFile = ELF::Reader->new(filePath => $elfPath);

    # Using index on the segments object
    my $segments = $elfFile->GetSegments();

    for my $segIndex (0 .. $elfFile->SegmentCount() - 1) {
        next if !${$segments}[$segIndex]->FileSize();
        print ${$segments}[$segIndex]->Describe(head => 16, tail => 16, width => 32)
    };

    print "\n";

    # Using an iterator
    my $nextSeg = $elfFile->GetSegmentIter();

    while (my $segment = $nextSeg->()) {
        next if !$segment->FileSize();
        print $segment->Describe(head => 16, tail => 16, width => 32)
    }

Prints:

    Type: PT_LOAD
    Virtual load address: 0x00000000
    Memory image size: 0x000129f8
    Segment alignment: 0x00000008
    00000000: 20040000 00010019 00010069 00010081
    000129e8: 000129c8 60200000 000129d0 20000000

    Type: PT_LOAD
    Virtual load address: 0x00000000
    Memory image size: 0x000129f8
    Segment alignment: 0x00000008
    00000000: 20040000 00010019 00010069 00010081
    000129e8: 000129c8 60200000 000129d0 20000000

The question is: should I provide both access techniques, or just one (which?), or something else? There will be a number of different classes that provide similar accessors.

Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond

Replies are listed 'Best First'.
Re: Index or iterate - your choice
by Discipulus (Canon) on Jan 27, 2021 at 07:54 UTC
    Hello GrandFather,

    My 2 cents, given that I don't know ELF files at all, so take this as a rubber duck service from me.

    Why index? GetSegments returns an arrayref, but then you access it by index using another call to SegmentCount. I expected something like foreach my $segment ( @{$elfFile->GetSegments()} ) ..

    If the data is already stored inside the object, and so is not huge, I personally find returning it all via GetSegments simpler to understand and use.

    If, on the other hand, the data is bigger and you parse it live, the iterator makes much more sense.

    So for me, if the data will always fit inside the object, provide it as a whole via GetSegments and stop. Only if the data can be bigger and you don't precompute it in advance does the iterator make sense as an alternative.

    Basically the problem can be reduced to @lines = <$handle> as opposed to while (<$handle>), with the second more idiomatic and memory safe; but if you already have @lines filled, the iterator makes little sense to me.
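
The slurp-versus-loop comparison can be sketched as follows. This is only an illustration: it reads from an in-memory string rather than a real file, and the variable names are made up for the example.

```perl
use strict;
use warnings;

# In-memory "file" so the sketch runs without touching disk.
my $text = join '', map { "line $_\n" } 1 .. 5;

# Slurp: every line is held in memory at once.
open my $slurp_fh, '<', \$text or die "Cannot open in-memory file: $!";
my @lines = <$slurp_fh>;
close $slurp_fh;
print "slurped ", scalar(@lines), " lines\n";

# Iterate: only one line is held at a time.
open my $iter_fh, '<', \$text or die "Cannot open in-memory file: $!";
my $count = 0;
while (my $line = <$iter_fh>) {
    $count++;
}
close $iter_fh;
print "iterated over $count lines\n";
```

Both forms see the same lines; the difference is only in how much is resident at once, which matters little when the whole table is already in the object.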

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

      That helps. Thanks teddy bear (or rubber duck, as the case may be). I was overthinking the plumbing so the "index" variant at least can be much simpler by returning a list as implied by your comments and stated explicitly by tobyink. In a typical ELF file the number of entries is small and the size of the entries is small and fixed so there is no issue returning a list.

Re: Index or iterate - your choice
by tobyink (Canon) on Jan 27, 2021 at 19:46 UTC

    Generally speaking, if you want to provide a list of things, just return a list. People can assign it to an array and loop through it, access a particular element by index, etc.

    The exceptions would be where there are so many list items or the items are so big, that they would use too much memory to store in an array, so accessing them one by one is better; or if generating each item is relatively expensive (in terms of time, CPU, network activity, etc) so if you can avoid fetching the entire list, that is preferable. In these cases use an iterator instead.
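
A minimal sketch of that trade-off, assuming a hypothetical build_entry() that stands in for whatever makes an item expensive. None of these names come from ELF::Reader.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an expensive-to-build table entry.
my $built = 0;
sub build_entry {
    my ($index) = @_;
    $built++;
    return { index => $index, size => $index * 16 };
}

# List style: builds every entry up front.
sub get_entries {
    my ($count) = @_;
    return map { build_entry($_) } 0 .. $count - 1;
}

# Iterator style: builds one entry per call; returns undef when done.
sub get_entry_iter {
    my ($count) = @_;
    my $next = 0;
    return sub {
        return if $next >= $count;
        return build_entry($next++);
    };
}

my @all = get_entries(4);
my $built_by_list = $built;

$built = 0;
my $iter  = get_entry_iter(4);
my $first = $iter->();          # only one entry has been built at this point
my $built_by_iter = $built;

print "list built $built_by_list entries, iterator built $built_by_iter\n";
```

When every entry is cheap and the count is small, the list version is the simpler interface; the iterator only pays off when the caller may stop early or the entries are costly.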

      where there are so many list items or the items are so big, that they would use too much memory to store in an array
      Perhaps tying is another good option for this. Then you get all the semantics of an array for free (from the user's perspective at least)!

      -Thomas
      "Excuse me for butting in, but I'm interrupt-driven..."

        I did consider including that, but by providing a tied array, you're kind of encouraging end users to treat it as any old array.

        So they might not consider that doing something like:

        foreach my $item ( reverse @array ) { ... }

        is going to impact performance way more than they might expect.

        If it's exposed as an iterator, then it encourages them to access items in a one-at-a-time sequential fashion. They still can slurp it all into an array, but they can't blame you when that eats up all their memory.

        Not saying it's never a good idea, but situations where it is aren't going to be that common.
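
To make the hidden cost visible, here is a rough sketch using a hypothetical LazySquares class (not part of any real module) that counts calls to FETCH. It only shows that copying a reversed tied array touches every element; how a bare foreach over reverse behaves may vary between Perl versions, so it is not asserted here.

```perl
use strict;
use warnings;

# Hypothetical tied array whose elements are computed on demand.
package LazySquares;
use Tie::Array;
our @ISA = ('Tie::Array');

our $FetchCount = 0;    # how many times an element was computed

sub TIEARRAY {
    my ($class, $size) = @_;
    return bless { size => $size }, $class;
}
sub FETCHSIZE { return $_[0]{size} }
sub FETCH {
    my ($self, $index) = @_;
    $FetchCount++;
    return $index * $index;
}

package main;

tie my @squares, 'LazySquares', 1000;

my $first       = $squares[0];                 # one element requested
my $after_index = $LazySquares::FetchCount;

my @reversed      = reverse @squares;          # innocent-looking, touches everything
my $after_reverse = $LazySquares::FetchCount - $after_index;

print "indexed read: $after_index fetch(es); reversed copy: $after_reverse fetch(es)\n";
```

From the caller's side both lines look like ordinary array code, which is exactly why the cost is easy to overlook.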

      After I posted it I wondered if my question was too trivial to post. But it turned out to be a great sanity check! I'm now returning a list as suggested. Thank you, Discipulus and jdporter, for shining a spotlight on this!

Re: Index or iterate - your choice
by bliako (Monsignor) on Jan 27, 2021 at 10:18 UTC

    I am not acquainted with the ELF and what functionality a potential user may desire from your module. So I will risk some general points.

    Filtering (the ELF segments) can be important and could even expand to checks other than size>0. In both get() methods you do this filtering by hand, inside the loop. If indeed there is scope for filtering useless segments and selecting "interesting" (hmmmm!!) segments, then I would re-pose the problem. It looks to me that such functionality would be very appealing. But is the array or the iterator best suited for this? Usually when filtering, one expects the result collection to be of the same type as the input, e.g. an array or iterator (Edit: and even offer in-place editing). Neither looks suitable for filtering, though! Splicing an array? A linked list?

    There is also the question of passing the segments, array or iterator, to another sub/module for further processing, filtering, profiling etc. I would think an arrayref is the least common denominator here. Unless that other module is yours and you control the API.

    But there is also the problem of creating new iterators to return back after filtering and selecting. Possibly pipelined. Would (not-)garbage-collecting the iterators be a problem?

    As I said, I am ignorant about ELF, but perhaps it makes sense to store the segments in a linked list/graph/tree? Perhaps for adding or removing a segment and then writing out the ELF?
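
One possible shape for "filter in, same type out" is sketched below. The segment records are plain hashes invented for the example, and make_iter and filter_iter are hypothetical helpers, not part of any module mentioned in this thread.

```perl
use strict;
use warnings;

# Hypothetical segment records; real entries would be objects.
my @segments = (
    { type => 'PT_LOAD', file_size => 0x129f8 },
    { type => 'PT_NOTE', file_size => 0 },
    { type => 'PT_LOAD', file_size => 0x40 },
);

# Filtering a list gives back a list.
my @non_empty = grep { $_->{file_size} } @segments;

# Turn a list into a closure iterator that returns undef when exhausted.
sub make_iter {
    my @items = @_;
    return sub { return shift @items };
}

# Wrap an iterator in a filter; the result is another iterator.
sub filter_iter {
    my ($keep, $iter) = @_;
    return sub {
        while (defined(my $item = $iter->())) {
            return $item if $keep->($item);
        }
        return;
    };
}

# Filters can be pipelined by nesting.
my $loads = filter_iter(sub { $_[0]{type} eq 'PT_LOAD' },
            filter_iter(sub { $_[0]{file_size} }, make_iter(@segments)));

my $seen = 0;
while (defined(my $segment = $loads->())) {
    $seen++;
    printf "%s 0x%x\n", $segment->{type}, $segment->{file_size};
}
```

On the garbage-collection question: Perl closures are reference counted, so each wrapped iterator is freed once the last reference to the outermost closure goes away.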

      Nice! I hadn't thought of filtering returned results. To me the iterator feels like a pretty natural way to do that. As it stands the module is a reader, so the semantics for the collection are read-only. However, changing the module to an ELF manager is an attractive thought. It impinges on ELF::Writer's solution space, but the APIs for my module and NERDVANA's module are quite different, so maybe creating a "one size fits all" module makes sense?

Re: Index or iterate - your choice
by jdporter (Paladin) on Jan 27, 2021 at 14:29 UTC

    It looks like GetSegments returns an array ref. If so, then SegmentCount is superfluous, no?

        my $segments = $elfFile->GetSegments();

        for my $segIndex (0 .. $#$segments) {
            $segments->[$segIndex]->FileSize() or next;
            print $segments->[$segIndex]->Describe(head => 16, tail => 16, width => 32)
        };

    But even the index is unnecessary (unless you actually need the number for some reason):
    print $_->Describe(head => 16, tail => 16, width => 32) for grep $_->FileSize, @$segments;

    If GetSegments doesn't in fact return an array ref, then make a method that does. :-) It could even be a tied array for more perlishness.

      Actually GetSegments() was returning the internal ProgramHeader object which had an overload for @{}. But that was overthinking the plumbing so I've changed it to return a list.
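
For readers unfamiliar with the @{} overload, a minimal sketch follows. DemoTable is a hypothetical class made up for illustration; it is not the ProgramHeader class from the module, whose internals are not shown in this thread.

```perl
use strict;
use warnings;

# Hypothetical table class; the name and fields are illustrative only.
package DemoTable;

# Let a hash-based object be dereferenced as if it were an array ref.
use overload '@{}' => sub { $_[0]->{entries} }, fallback => 1;

sub new {
    my ($class, @entries) = @_;
    return bless { entries => [@entries] }, $class;
}

# The simpler alternative: just return the list.
sub entries { return @{ $_[0]->{entries} } }

package main;

my $table = DemoTable->new(qw(text data bss));

my $count_via_overload = scalar @$table;   # object used like an array ref
my $second             = $table->[1];
my @list               = $table->entries;  # plain list accessor

print "$count_via_overload entries, second is $second, list is @list\n";
```

The overload works, but the plain accessor gives callers the same data with no hidden machinery to explain.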

Re: Index or iterate - your choice
by jcb (Parson) on Jan 28, 2021 at 02:19 UTC

    ELF structural tables tend to be relatively small, so simply returning a list (as other monks have suggested and you seem to have chosen) is probably the best solution. Further, the records themselves (excluding the actual contents) are of very limited and finite size.

    I would suggest "splitting the difference" and returning a list of objects that describe the table entries fully and carry internal file references for the actual data. Producing a tied filehandle "on demand" that reads the extent for the actual data is not difficult; you need only remember the position and length and enforce the artificial EOF.
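
A rough sketch of such a handle, assuming a hypothetical ExtentHandle class that supports only read() and eof(). It runs against an in-memory string instead of a real ELF file, and a fuller version would also want READLINE, GETC and friends.

```perl
use strict;
use warnings;

# Hypothetical class: a read-only window onto part of another filehandle.
package ExtentHandle;

sub TIEHANDLE {
    my ($class, $fh, $start, $length) = @_;
    return bless { fh => $fh, pos => $start, end => $start + $length }, $class;
}

# Called for read(); $_[1] aliases the caller's buffer.
sub READ {
    my ($self, undef, $len, $offset) = @_;
    my $remaining = $self->{end} - $self->{pos};
    $len = $remaining if $len > $remaining;      # enforce the artificial EOF
    return 0 if $len <= 0;
    seek($self->{fh}, $self->{pos}, 0) or return;
    my $got = read($self->{fh}, $_[1], $len, $offset // 0);
    $self->{pos} += $got if $got;
    return $got;
}

sub EOF { return $_[0]{pos} >= $_[0]{end} }

package main;
use Symbol qw(gensym);

my $image = "HEADER..PAYLOADBYTES..TRAILER";
open my $raw, '<', \$image or die "Cannot open in-memory file: $!";

my $extent = gensym();
tie *$extent, 'ExtentHandle', $raw, 8, 12;       # 12 bytes starting at offset 8

my $buffer = '';
my $got    = read($extent, $buffer, 100);        # asks for more than the extent holds
my $at_eof = eof($extent) ? 1 : 0;

print "read $got bytes: $buffer (eof: $at_eof)\n";
```

The object only remembers a position and an end offset, which is the "position and length" bookkeeping described above.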

      As mentioned above, the various replies have stimulated me into returning lists for segments and sections which, as you point out, are small in number and small in size. However, I'm going to provide an iterator for returning the contents of segments and sections. Parameters to the iterator generator will let me set the maximum size of blobs returned and allow specifying selected portions of the blocks to be returned. That fits nicely with common use patterns where fixed size blocks are consumed by processes such as flash programmers and debuggers. That also helps with code that might write files for consumption by flash programmers and debuggers etc.
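
As an illustration only, an iterator generator along those lines might look like the sketch below. The function name make_content_iter and its parameters (data, max_size, start, length) are assumptions for the example, not the module's actual interface.

```perl
use strict;
use warnings;

# Hypothetical generator: returns an iterator over part of a byte string,
# handing back blocks of at most max_size bytes.
sub make_content_iter {
    my (%args) = @_;
    my $data     = $args{data};
    my $max_size = $args{max_size} // 16;
    my $pos      = $args{start}    // 0;
    my $end      = defined($args{length}) ? $pos + $args{length} : length($data);
    $end = length($data) if $end > length($data);

    return sub {
        return if $pos >= $end;                      # exhausted
        my $left  = $end - $pos;
        my $size  = $left < $max_size ? $left : $max_size;
        my $block = { offset => $pos, bytes => substr($data, $pos, $size) };
        $pos += $size;
        return $block;
    };
}

my $contents = pack 'C*', 0 .. 39;                   # 40 bytes of sample data
my $next = make_content_iter(
    data => $contents, max_size => 16, start => 4, length => 30,
);

my @sizes;
while (my $block = $next->()) {
    push @sizes, length $block->{bytes};
    printf "%08x: %s\n", $block->{offset}, unpack('H*', $block->{bytes});
}
```

Returning the offset with each block lets a consumer such as a flash programmer know where the bytes belong without tracking position itself.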

      Methods that return lists or iterators can both benefit from providing filtering so I'll roll that in as an option too. Any other kitchen sinks I should add?

        I would still advocate using a tied filehandle for reading contents, since that is effectively a built-in iterator interface for byte streams.

Node Type: perlquestion [id://11127505]
Approved by Athanasius
Front-paged by haukex