PerlMonks  

Adding items to arrays: best approach?

by geertvc (Sexton)
on May 28, 2020 at 07:28 UTC ( [id://11117383]=perlquestion )

geertvc has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

Being a beginning "Perl-learner", I want to do things the (most?) right/best way immediately.

I have a question about the right/best way to append items to an already existing array. This is my initial array:
    $inputPathPrefix = "C:/Temp/xml";
    @inputfiles = (
        $inputPathPrefix . "/test1.xml",
        $inputPathPrefix . "/test2.xml",
        $inputPathPrefix . "/test3.xml"
    );
I would like to extend the array. AFAIK, there are 2 ways of doing this:

First method:
    @inputfiles = (
        @inputfiles,
        $inputPathPrefix . "/test4.xml",
        $inputPathPrefix . "/test5.xml",
        $inputPathPrefix . "/test6.xml",
        $inputPathPrefix . "/test7.xml"
    );
Second method:
    push(@inputfiles, $inputPathPrefix . "/test4.xml");
    push(@inputfiles, $inputPathPrefix . "/test5.xml");
    push(@inputfiles, $inputPathPrefix . "/test6.xml");
    push(@inputfiles, $inputPathPrefix . "/test7.xml");

Three questions:

1. What's the best/preferred/Perl-ish method?
2. Has the first method advantages over the second method or vice versa? What are the advantages, if any?
3. What's the habit for the initial array declaration and the first method: put the commas at the back or at the front?

Best,
--Geert

Replies are listed 'Best First'.
Re: Adding items to arrays: best approach? (updated)
by haukex (Archbishop) on May 28, 2020 at 07:51 UTC

    The Perlish way is TIMTOWTDI - both push and @array = (@array, ...) are fine.*

    Note that instead of prepending $inputPathPrefix to each value individually, it's probably nicer to use map.

    Also, I strongly recommend you don't manipulate filenames with plain string operations, and use modules like Path::Class or the core module File::Spec instead. (Update: I meant to make this same comment on your previous post, but didn't get around to it, sorry)

        use warnings;
        use strict;
        use Data::Dump;
        use File::Spec::Functions qw/catfile/;

        my $inputPathPrefix = "C:/Temp/xml";
        my @inputfiles = map { catfile($inputPathPrefix, $_) }
            'test1.xml', 'test2.xml', 'test3.xml';

        push @inputfiles, map { catfile($inputPathPrefix, $_) }
            'test4.xml', 'test5.xml', 'test6.xml', 'test7.xml';

        dd @inputfiles;

        __END__

        (
          "C:/Temp/xml/test1.xml",
          "C:/Temp/xml/test2.xml",
          "C:/Temp/xml/test3.xml",
          "C:/Temp/xml/test4.xml",
          "C:/Temp/xml/test5.xml",
          "C:/Temp/xml/test6.xml",
          "C:/Temp/xml/test7.xml",
        )

    Update: Clarified first sentence.

    Update 2:

    What's the habit for the initial array declaration and the first method: put the commas at the back or at the front?

    Personally, at the back, that's also the most common that I see.

    Has the first method advantages over the second method or vice versa? What are the advantages, if any?

    * push is faster. But also, IMHO, TIMTOWTDI still applies as long as performance isn't an issue. In other words, if you're only doing this on small arrays once or twice in your code, it's more likely the performance hotspots will be in other places in your code and this amounts to a micro-optimization.
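To see how the two approaches compare on a given machine, one option is the core Benchmark module. This is only a sketch: the sub names, the 100-round loop, and the file names are made up for illustration, and the numbers will vary by Perl version and hardware.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Illustrative data: four more file names to append repeatedly.
my @extra = map { "C:/Temp/xml/test$_.xml" } 4 .. 7;

# Append by rebuilding the whole array from a list each time.
sub append_by_assignment {
    my @files = map { "C:/Temp/xml/test$_.xml" } 1 .. 3;
    @files = (@files, @extra) for 1 .. 100;
    return scalar @files;
}

# Append in place with push.
sub append_by_push {
    my @files = map { "C:/Temp/xml/test$_.xml" } 1 .. 3;
    push @files, @extra for 1 .. 100;
    return scalar @files;
}

# Both approaches build an array of the same size; only the cost differs.
print "assignment: ", append_by_assignment(), " elements\n";
print "push:       ", append_by_push(),       " elements\n";

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, {
    assignment => \&append_by_assignment,
    push       => \&append_by_push,
});
```

For a handful of appends the difference is unlikely to matter; the table becomes interesting only when the array or the number of appends is large.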

Re: Adding items to arrays: best approach?
by davido (Cardinal) on May 28, 2020 at 14:38 UTC

    Always let your commas trail. And I prefer having a comma even on the last element because then tomorrow when I add another element it's just a new line; I don't have to add a comma to the previous last line; my git diff is cleaner, and I'm less likely to forget.
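A small sketch of the trailing-comma style, reusing the file names from the question. The comma after the last element is allowed and does not create an extra, empty element.

```perl
use strict;
use warnings;

my $prefix = "C:/Temp/xml";

my @inputfiles = (
    "$prefix/test1.xml",
    "$prefix/test2.xml",
    "$prefix/test3.xml",   # trailing comma: no empty element is added
);

print scalar(@inputfiles), "\n";   # prints 3
```

Adding a fourth file later is then a one-line change in the diff.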

    If you are concerned about efficiency please understand that push behaves similarly to the C++ vector push_back method, which has an amortized time complexity of O(1). This is because as more space is allocated, the man behind the curtains asks for more than is needed immediately, predicting that more will be needed eventually. Here's a made up example:

        @a = qw(a b c d e);       # Array has 5 elements, with room for 10.
        push(@a, qw(f g h i j));  # Array has 10 elements, with room for 10.
        push(@a, qw(k l m n o));  # Array has 15 elements, with room for 30.
                                  # New memory had to be allocated, and the array
                                  # had to be moved into the new, larger memory space.
        push(@a, qw(p q r s t));  # Array has 20 elements, with room for 30.
        push(@a, qw(u v w x y));  # Array has 25 elements, with room for 30.
        push(@a, qw(z 1 2 3 4));  # Array has 30 elements, with room for 30.
        push(@a, qw(5 6 7 8 9));  # Array has to be moved to new memory.
                                  # Array has 35 elements, with room for 70.

    The formula for calculating how much extra space to allocate is not the same as what I demonstrated here. I just wanted to demonstrate that Perl allocates memory in chunks that allow for expansion, and only moves the array to a new, larger block when the spare room in the current block has been used up.

    Because the need to copy the entire array over to a larger memory block as the array expands only happens infrequently, despite the copy-over process being linear, the amortized time is constant; the copy-over fades into background noise.
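The amortized argument can be checked with a toy simulation. This is not Perl's real allocation formula (as noted above); it assumes a simple "double the capacity when full" rule and counts how many element copies the reallocations cause.

```perl
use strict;
use warnings;

# Toy model only: capacity doubles whenever the array is full.
# Perl's real growth formula is different.
my $pushes   = 1000;
my $size     = 0;
my $capacity = 1;
my $copied   = 0;   # total elements moved during reallocations

for (1 .. $pushes) {
    if ($size == $capacity) {
        $copied   += $size;   # moving to a bigger block copies every element
        $capacity *= 2;
    }
    $size++;
}

printf "%d pushes, %d element copies, %.2f copies per push\n",
    $pushes, $copied, $copied / $pushes;
```

Under this model the total number of copies stays below twice the number of pushes, which is why the average cost per push is constant even though a single reallocation is linear.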


    Dave

Re: Adding items to arrays: best approach?
by hippo (Bishop) on May 28, 2020 at 08:22 UTC
    What's the best/preferred/Perl-ish method?

    That will be subjective. However, I'd have to go with push for several reasons, perhaps the strongest of which is that this is precisely what it is designed to do: append entries to arrays. If it were no better than the first approach it would not be in the language.

    Has the first method advantages over the second method or vice versa? What are the advantages, if any?

    I see some advantages to the second method:

    1. There is no need to specify the name of the array twice therefore the programmer is less likely to make a mistake such as extracting data from the wrong array in the RHS.
    2. The first method does a lot more stuff: extracting data from the array into a list, appending other values and then overwriting the array with the resulting list. This sounds much less efficient than a simple push which (as a guess) I would expect literally to append to the array in situ. For large arrays or many ops this might become an important consideration.
    3. push will also save you from mistakes like attempting to push to a scalar.

    That said, I would definitely combine your 4 pushes into one and probably use a glob or map to avoid all the repetition.
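A sketch of what a single combined push might look like, once with map and once with glob. The split into test4/test5 and test6/test7 is only there to show both forms side by side.

```perl
use strict;
use warnings;

my $inputPathPrefix = "C:/Temp/xml";
my @inputfiles = map { "$inputPathPrefix/test$_.xml" } 1 .. 3;

# One push with map: the prefix is written only once.
push @inputfiles, map { "$inputPathPrefix/$_" } qw(test4.xml test5.xml);

# One push with glob: when braces are the only wildcard, glob returns the
# expanded strings whether or not the files exist.
# Caveat: glob splits its pattern on whitespace, so paths with spaces break.
push @inputfiles, glob("$inputPathPrefix/test{6,7}.xml");

print "$_\n" for @inputfiles;
```

The map form is the safer default, since it makes no assumptions about the characters in the path.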

    What's the habit for the initial array declaration and the first method: put the commas at the back or at the front?

    Not sure what you mean here, sorry.

      Hello hippo,

      Not sure what you mean here, sorry.

      Pls. allow me to clarify:

      Choose between
      $inputPathPrefix . "/test4.xml",
      or
      , $inputPathPrefix . "/test4.xml"
      when adding items to an array (or something similar).

      But given the majority of the people who answered (really big thank you to all!), I think it's not that relevant anymore, since most - if not all - are in favor of the push way of working.

      Nevertheless, it would be interesting to hear from seasoned Perl programmers what their preferred way is, although I realize this might be a matter of personal taste and as such, be quite subjective...

      Best,
      --Geert
        Perl accepts commas before the closing parenthesis/bracket/brace without inserting a (dummy) item, therefore I prefer the comma "after", so that all entries are equal (neither the first nor the last differ).

        This is indeed a purely stylistic concern. Personally, when a line needs to be split around a binary operator I would split after the operator as I think this reinforces the point that the statement isn't finished once the end of the line has been reached. I do this in any language, not just Perl. It was probably taught to me decades ago as a minor technique in aiding clarity and has stayed with me ever since.

        With postfix control operators (if, unless, for etc.) it's the other way round as they are more tightly bound to the subsequent clauses (again, subjective). So I might write:

            print 'Very looooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong non-interpolated string costing $many ',
                "and another clause from prog $0\n";

            # but

            die 'Looooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong exception message'
                if $OMG_ZOMBIES > $max_zombies_allowable;
Re: Adding items to arrays: best approach?
by soonix (Canon) on May 28, 2020 at 08:38 UTC
    In the final program or script, you probably won't list all those file names, but you'll read them e.g. from a config file or even calculate them, which more or less will lead to a loop, like this:

        use 5.011;
        use warnings;
        use Data::Dumper;

        my $inputPathPrefix = "C:/Temp/xml";
        my @inputfiles;
        for my $i (1..7) {
            push(@inputfiles, $inputPathPrefix . "/test${i}.xml");
        }
        say Dumper(\@inputfiles);

    gives

        $VAR1 = [
                  'C:/Temp/xml/test1.xml',
                  'C:/Temp/xml/test2.xml',
                  'C:/Temp/xml/test3.xml',
                  'C:/Temp/xml/test4.xml',
                  'C:/Temp/xml/test5.xml',
                  'C:/Temp/xml/test6.xml',
                  'C:/Temp/xml/test7.xml'
                ];

    In a loop, push is more or less the only method.

    BTW, you can also push multiple values in one go, see push
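As a short illustration of pushing several values at once: push flattens its argument list, so scalars and whole arrays can be mixed, and it returns the new number of elements. The variable names here are made up for the example.

```perl
use strict;
use warnings;

my @inputfiles = ("C:/Temp/xml/test1.xml");
my @more       = ("C:/Temp/xml/test2.xml", "C:/Temp/xml/test3.xml");

# push flattens its arguments, so arrays and scalars can be mixed.
my $count = push @inputfiles, @more, "C:/Temp/xml/test4.xml";

print "$count\n";   # prints 4: push returns the new number of elements
```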

Re: Adding items to arrays: best approach?
by leszekdubiel (Scribe) on May 28, 2020 at 08:15 UTC

    You can use the "push" function to push a whole list onto the end of an array. See this example:

        # perl -e 'use Data::Dumper; my @a = qw(1 2 3); push @a, map { $_ * 2 } qw(4 5 6); print Dumper(\@a); '
        $VAR1 = [
                  '1',
                  '2',
                  '3',
                  8,
                  10,
                  12
                ];

    Now compare that with your problem:

        # perl -e '
            my @array1 = qw(one two three);
            my @array2 = qw(four seven nine);
            my $prefix = "C:/Temp/somedir";
            my @fulllist = map { "$prefix/$_" } @array1;
            push @fulllist, map { "$prefix/$_" } @array2;
            use Data::Dumper;
            print Dumper(\@fulllist);
        '
        $VAR1 = [
                  'C:/Temp/somedir/one',
                  'C:/Temp/somedir/two',
                  'C:/Temp/somedir/three',
                  'C:/Temp/somedir/four',
                  'C:/Temp/somedir/seven',
                  'C:/Temp/somedir/nine'
                ];

      Or, getting back to the OP's original request (but using interpolation),

          push @inputfiles,
              "$inputPathPrefix/test4.xml",
              "$inputPathPrefix/test5.xml",
              "$inputPathPrefix/test6.xml",
              "$inputPathPrefix/test7.xml",
              ;

      This is my personal favorite way to append to an array.

      I do not normally build file names this way, preferring something like the old File::Path. Though, as others have pointed out, there are sexier ways to do things these days.

Re: Adding items to arrays: best approach?
by perlfan (Vicar) on May 28, 2020 at 07:46 UTC
    meta quote ops will help a lot, especially qw.

    I'd definitely prefer push (inverse of pop) for appending to arrays and its friend for prepending to arrays, unshift (inverse of shift).
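The four functions mentioned above fit together as follows; this is a minimal sketch with a made-up array.

```perl
use strict;
use warnings;

my @queue = qw(b c);

push    @queue, 'd';        # append:  b c d
unshift @queue, 'a';        # prepend: a b c d

my $last  = pop   @queue;   # removes and returns 'd'
my $first = shift @queue;   # removes and returns 'a'

print "@queue\n";           # prints "b c"
```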

    Slight improvement to your last example:

        push(@inputfiles, qq{$inputPathPrefix/test4.xml});
        push(@inputfiles, qq{$inputPathPrefix/test5.xml});
        push(@inputfiles, qq{$inputPathPrefix/test6.xml});
        push(@inputfiles, qq{$inputPathPrefix/test7.xml});
    Better (IMO):
        my @files = (qw/test4.xml test5.xml test6.xml test7.xml/);
        # assuming you know $inputPathPrefix will be the same for all of them
        push @inputfiles, map { "$inputPathPrefix/$_" } @files;
    The rub with qw// is that there's no variable interpolation. This means the following is treated literally,
        my @files = (qw{
            $inputPathPrefix/test4.xml
            $inputPathPrefix/test5.xml
            $inputPathPrefix/test6.xml
            $inputPathPrefix/test7.xml
        });
    is equivalent to:
        my @files = (
            q{$inputPathPrefix/test4.xml},
            q{$inputPathPrefix/test5.xml},
            q{$inputPathPrefix/test6.xml},
            q{$inputPathPrefix/test7.xml},
        );
    There are probably many ways to do this idiomatically; the above is just an example of where I'd tend to go with it.
Re: Adding items to arrays: best approach?
by AnomalousMonk (Archbishop) on May 28, 2020 at 16:23 UTC

    Further to davido's discussion of the Perl interpreter's efforts to manage memory for your arrays (and hashes and everything else):
    geertvc: Note that if you have an array that starts small and that will grow to a very large size that you know (or can estimate) in advance, it's possible and occasionally advantageous to "pre-grow" the array by assignment to the array's max index via the $# array dereference operator (if that's the correct terminology).

        c:\@Work\Perl\monks>perl -wMstrict -le
        "my @ra = qw(a b c);
         print 'A: array max index: ', $#ra;
         ;;
         $#ra += 1_000_000;
         print 'B: array maximum index: ', $#ra;
         print 'B: number of array elements: ', scalar @ra;
         "
        A: array max index: 2
        B: array maximum index: 1000002
        B: number of array elements: 1000003
    See perldata regarding $#. In most cases, however, it's best to let the interpreter manage these things on its own.
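One thing to keep in mind when pre-growing: assigning to $#array changes the element count, so a later push lands after the undefined padding rather than in it. A small sketch, using a much smaller size than the example above:

```perl
use strict;
use warnings;

my @ra = qw(a b c);
$#ra = 9;                    # array now has 10 slots; indexes 3..9 are undef

print scalar(@ra), "\n";     # prints 10

push @ra, 'x';               # lands at index 10, after the undef slots
print $#ra, "\n";            # prints 10

$ra[3] = 'd';                # fill a pre-grown slot by index instead
```

So a pre-grown array is usually filled by index, not with push.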


    Give a man a fish:  <%-{-{-{-<

Re: Adding items to arrays: best approach?
by kcott (Archbishop) on May 29, 2020 at 09:03 UTC

    G'day Geert,

    ++ You're asking good questions. :-)

    You've received a lot of good answers, so I won't delve into those again. However, I did want to say something about abstraction.

    Whenever you have code with many lines that are almost identical, consider abstracting that code into one or more subroutines. Here's an example:

        use feature 'state';   # 'state' variables need Perl 5.10 or later

        my @inputpaths = @{test_paths([1..3])};
        push @inputpaths, @{test_paths([4..7])};

        sub test_paths {
            my ($ids) = @_;
            state $prefix = 'C:/Temp/xml/';
            return [ map "${prefix}test${_}.xml", @$ids ];
        }

    That code is intended to give you an idea of abstraction based on what you originally posted.

    In other scenarios, you might want to pass $prefix as an argument to give you more flexibility; and, unless all your filenames look like testN.xml, you'd probably want to do something different than passing a list of numbers (to replace N). I'd also recommend you heed the advice of ++haukex regarding the use of modules such as File::Spec.
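One way that more flexible variant could look, as a sketch only: the helper name prefixed_paths and the file names are hypothetical, and catfile comes from the core File::Spec::Functions module recommended earlier in the thread. Note that catfile uses the path separator of the platform it runs on.

```perl
use strict;
use warnings;
use File::Spec::Functions qw(catfile);

# Hypothetical helper: join one directory prefix onto a list of file names.
sub prefixed_paths {
    my ($prefix, @names) = @_;
    return map { catfile($prefix, $_) } @names;
}

my @inputpaths = prefixed_paths('C:/Temp/xml', qw(alpha.xml beta.xml));
push @inputpaths, prefixed_paths('C:/Temp/other', qw(gamma.xml));

print "$_\n" for @inputpaths;
```

Returning a plain list rather than an array reference lets the result go straight into push or an array assignment.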

    — Ken

      Hello Ken,

      Thanks for your appreciation. I'm trying to be as clear as possible when asking questions. And in the meantime, I'm also learning the syntax to be used in messages, like showing the real name and also the username (or what do you call it on PerlMonks?): see the start of this reply... :-)

      I'm definitely taking all the great advice given by many people here into account. I've already changed my script in many ways. And indeed, the testx was just an example I gave. The real files are not incremental by numbers but totally different names in totally different (sub)directories. I just wanted to point out the difference between how to extend arrays.

      I've also seen Perl has a zillion modules/libraries that do a lot of the hard work behind the scenes. One of them is indeed File::Spec but there are more, lots more...

      I'm still learning (and eager to learn) the language of Perl and I'm realizing I still have a long way to go. But by practicing it step by step I'm convinced I'll be able to do whatever I want it to do (this might be an "overstatement"... :-)).

      Maybe I should say a little about why I'm using Perl at the moment: automatic Java code generation. I'm using XML files as input and using the amazing Text::Xslate and XML::LibXML::Simple Perl modules together with appropriate code templates to generate that code. It's a blessing!!!

      Best,
      --Geert

        The real files are not incremental by numbers but totally different names in totally different (sub)directories.

        That's the problem with providing example data which has an inherent pattern that doesn't match your production data. It suggests to those wishing to help that the pattern matters and therefore they are likely to take this into account. If you don't want to give an excerpt from the production data for some reason (security, embarrassment, NDA, ...) it would be better to put something purely random (but still representative) in its place. For a short dataset your favourite metasyntactic variables would suffice.

        "... just an example I gave. The real files are not ..."

        The last two paragraphs that I wrote were intended to convey the idea that I understood this.

        "I'm still learning (and eager to learn) the language of Perl ..."

        Well, you've certainly come to the right place. :-)

        "... and I'm realizing I still have a long way to go."

        It shouldn't be too hard to get a handle on the basics. Take a look at perlintro: as well as having those "basics" on that page, it is also peppered with links to additional information and advanced topics (I'd suggest following these as the need arises rather than trying to learn it all at once).

        "... automatic [whatever] code generation ..."

        Perl is an excellent language for this type of task. You can also generate Perl code; here's a rough example of one way you could do this:

            my $generated_function = generate_function(...);
            $generated_function->();

            sub generate_function {
                ...
                return sub {
                    ...
                };
            }
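A concrete, deliberately tiny instance of that pattern, as a sketch: the name generate_prefixer and its behaviour are invented for illustration and are not what the elided code above contains. The generator returns an anonymous sub that remembers the prefix it was created with.

```perl
use strict;
use warnings;

# Hypothetical generator: returns a function that remembers $prefix.
sub generate_prefixer {
    my ($prefix) = @_;
    return sub {
        my ($name) = @_;
        return "$prefix/$name";
    };
}

my $xml_path = generate_prefixer('C:/Temp/xml');
print $xml_path->('test1.xml'), "\n";   # prints C:/Temp/xml/test1.xml
```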

        You might also find "Template Toolkit" is worth a look.

        — Ken

Node Type: perlquestion [id://11117383]
Approved by marto
Front-paged by Corion