http://qs321.pair.com?node_id=594413


(RFC) Arrays: A Tutorial/Reference

An array is a type of Perl variable. An array variable is an ordered collection of any number (zero or more) of elements. Each element in an array has an index, which is a non-negative integer. Perl arrays are (like nearly everything else in the language) dynamic:

  1. they grow as necessary, without any need for explicit memory management;
  2. they are heterogeneous, or generic, which is to say, an array doesn't know or enforce the type of its elements.

The values of Perl array elements can only be scalars. This may sound like a limitation, if you think of scalars only as comprising numbers and strings; but since scalars can be references to the compound variable types (array and hash), arbitrarily complex data structures are possible. Other scalar types, such as filehandles and the special undef value, are also naturally allowed.
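As a small sketch of the point above (the variable names and data are illustrative, not part of the original text), here is an array whose elements are all scalars, some of which happen to be references to other arrays and hashes:

```perl
use strict;
use warnings;

# Illustrative data only: every element is a scalar, but some of those
# scalars are references to an anonymous array or hash.
my @mixed = (
    42,                     # a number
    'hello',                # a string
    undef,                  # the undefined value
    [ 1, 2, 3 ],            # reference to an anonymous array
    { name => 'perl' },     # reference to an anonymous hash
);

my $inner = $mixed[3][1];       # second element of the nested array
my $name  = $mixed[4]{name};    # value stored in the nested hash

print "inner=$inner name=$name count=", scalar(@mixed), "\n";
```

The array itself neither knows nor cares that elements 3 and 4 lead to further structures; it just holds five scalars.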

So, given a data structure like that, what kinds of things would you want to do with it? That is, what operations should be able to act on it? You might conceive of different sets of operations, or interfaces, depending on how you expect to use an array in your program:

  1. as a monolithic whole;
  2. as a stack or queue — that is, only working with its ends;
  3. as a random access table of scalars — that is, working with all of its elemental parts.
Perl arrays can be used in all those ways, and more.

Here are the fundamental Perl array operations:

  • Initialize
  • Clear
  • Get count of elements
  • Get the highest index
  • Get list of element values
  • Add new elements at the end
  • Remove an element from the end
  • Add new elements at the beginning
  • Remove an element from the beginning
  • Access one element at an arbitrary index
  • Access multiple elements at arbitrary indices
  • Insert/Delete/Replace items in the middle of an array

This tutorial focuses specifically on the array variable type. There are many things you can do in Perl with lists which will also work on arrays; for example, you can iterate over their contents using foreach. Those things are not discussed here. See also the Perl FAQ entry "What is the difference between a list and an array?"

Initialize an array

Simple assignment does the job:

@array = ( 1, 2, 3 );
@array = function_generating_a_list();
@array = @another_array;
The key points are that
  1. the assignment to an array gives list context to the right hand side;
  2. the right side can be any expression which results in a list of zero or more scalar values.
The values are inserted in the array in the same order as they occur in the list, beginning with array index zero. For example, after executing
@array = ( 'a', 'b', 'c' );
element 0 will contain 'a', element 1 will contain 'b', and so on.

Whenever an array is assigned to en masse like this, any contents it may have had before the assignment are removed!
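A minimal sketch of that last warning, using made-up values: the second assignment does not merge with or append to the first, it replaces it.

```perl
use strict;
use warnings;

my @array = ( 'a', 'b', 'c', 'd' );
@array = ( 'x', 'y' );      # old contents are discarded, not merged

my $count = @array;         # number of elements after the second assignment
print "count=$count contents=@array\n";
```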

Clear an array

Simply assign a zero-length list:

@array = ();
Assigning a value such as undef, 0, or '' will not work! Rather, it will leave the array containing one element, with that one value. That is,
@array = 0;
# and
@array = ( 0 );
are functionally identical.
Note that omitting the parentheses is bad style, if your goal is actually to assign the one-element list (0) to the array.
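The difference is easy to check for yourself. This sketch (array names are illustrative) compares a true clear with two assignments that only look like one:

```perl
use strict;
use warnings;

my @cleared = ( 1, 2, 3 );
@cleared = ();              # really empty now

my @zero = ( 1, 2, 3 );
@zero = 0;                  # one element, whose value is 0

my @blank = ( 1, 2, 3 );
@blank = '';                # one element, whose value is the empty string

my $n_cleared = @cleared;
my $n_zero    = @zero;
my $n_blank   = @blank;

print "cleared=$n_cleared zero=$n_zero blank=$n_blank\n";
```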

Get count of elements

To get the "length" or "size" of an array, simply use it in a scalar context. For example, you can "assign" the array to a scalar variable:

$count = @array;
and the scalar variable will afterwards contain the count of elements in the array. Other scalar contexts work as well:
print "# Elements: " . @array . "\n";
(Yes, print gives its arguments list context, but the dot (string concatenation) operator is applied first, and it imposes scalar context on its operands.)

You can always force scalar context on an array by using the function named scalar:

print "# Elements: ", scalar(@array), "\n";
Note that this is a get-only property; you cannot change the length of the array by assigning a scalar to the array variable. For example, @array=0 does not empty the array (as stated in the previous section, Clear an array).
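Here is a short sketch pulling those forms together. The last line is an extra illustration not mentioned above: putting parentheses around the left-hand variable makes it a list assignment, which copies the first element rather than the count.

```perl
use strict;
use warnings;

my @array = ( 'a', 'b', 'c' );

my $count   = @array;                       # scalar assignment: the count
my $message = "# Elements: " . @array;      # concatenation is scalar context too
my $forced  = scalar(@array);               # explicit scalar context

my ($first) = @array;                       # list assignment: first element, NOT the count

print "$message\n";
print "count=$count forced=$forced first=$first\n";
```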

Get the highest index

Often, you want to know what is the highest index in an array — that is, the index of its last element. Perl provides a special syntax for obtaining this value:

$highest_index = $#array;
This is useful, for example, when you want to create a list of all the indices in an array:
foreach ( 0 .. $#array ) {
    # $_ is set to each index number, in turn, from first (0) to last ($#array)
}

Unlike scalar(@array), $#array is a settable property. When you assign to an array's $#array form, you cause its length (number of elements) to grow or shrink accordingly. If the length increases, the new elements will be uninitialized (that is, they'll be undef). If the length decreases, elements will be dropped from the end.
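A minimal sketch of growing and shrinking an array this way (the values are illustrative):

```perl
use strict;
use warnings;

my @array = ( 'a', 'b', 'c' );

$#array = 4;                            # grow: highest index is now 4
my $grown      = @array;                # five elements
my $new_is_set = defined $array[4];     # false: the new slots are undef

$#array = 1;                            # shrink: drop everything past index 1
my $shrunk = @array;                    # two elements

print "grown=$grown shrunk=$shrunk contents=@array\n";
```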

Clear an array - Round 2

Given that $#array is assignable, you can clear an array by assigning -1 to its $#array form. (Why -1? Well, that's what you see in $#array if @array is empty.) Generally, this is not considered good style, but it's acceptable.

Another way to clear an array is undef @array. This technique should be used with caution, because it frees up some memory used internally to hold the elements. In most cases, this isn't worth the processing time. About the only situation in which you'd want to do this is if @array has a huge number of elements, and @array will be re-used after being cleared but will not hold a huge number of elements again.

Beware: As mentioned above in Clear an array, assigning @array = undef does not clear an array. Unlike the case with scalars, @a=undef and undef(@a) are not equivalent!
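The three forms side by side, as a sketch with illustrative array names. The first two really do empty the array; the third leaves one undef element behind.

```perl
use strict;
use warnings;

my @by_index = ( 1, 2, 3 );
$#by_index = -1;            # highest index -1 means no elements

my @by_undef = ( 1, 2, 3 );
undef @by_undef;            # empties the array and releases its storage

my @not_cleared = ( 1, 2, 3 );
@not_cleared = undef;       # a one-element list containing undef

my $n_index = @by_index;
my $n_undef = @by_undef;
my $n_not   = @not_cleared;

print "by_index=$n_index by_undef=$n_undef not_cleared=$n_not\n";
```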

Get list of element values

To get the entire list of values stored in an array at any given time, simply use it in a list context:

print "Here are your things: ", @array, "\n";
This is useful for iterating over the list of values stored in an array, one at a time:
foreach ( @array ) {
    # $_ is set to each element value, in turn
}
This works because in the foreach control construct, the stuff inside the parentheses is expected to be a list — or, more precisely, an expression which will be evaluated in list context and is expected to result in a list of (zero or more) scalar values.

Quiz: What's the difference between these two lines of code:

$x = @array;
@x = @array;

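Answer: The first line evaluates @array in scalar context, so $x receives the count of elements. The second evaluates it in list context, so @x receives a copy of all the element values.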

Remove an element from the end

The function to remove a single element from the end of an array is pop. Given the code:

@array = ( 'a', 'b', 'c' );
$x = pop @array;
$x will contain 'c' and @array will be left with two elements, 'a' and 'b'.

Note: By "end", we mean the end of the array with the highest index.

Add new elements at the end

Use the push function to add a number of (scalar) values to the end of an array:

push @array, 8, 10 .. 15;
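Together, push and pop give you a stack. The following sketch (the starting contents are made up) shows what that push call appends and what a subsequent pop returns:

```perl
use strict;
use warnings;

my @stack = ( 1, 2 );

push @stack, 8, 10 .. 15;       # appends 8, then 10 through 15 (seven values)
my $count_after_push = @stack;

my $top = pop @stack;           # removes and returns the value pushed last
my $count_after_pop = @stack;

print "after push: $count_after_push, popped: $top, after pop: $count_after_pop\n";
```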

Remove an element from the beginning

The shift function removes one value from the beginning of the array. That is, it removes (and returns) the value in element zero, and shifts all the rest of the elements down one, with the effect that the number of elements is decreased by one. Given the code:

@array = ( 'a', 'b', 'c' );
$x = shift @array;
$x will contain 'a' and @array will be left with two elements, 'b' and 'c'. (You can see that shift is just like pop, but acts on the other end of the array.)

Add new elements at the beginning

In an analogous way, unshift acts on the beginning of the array as push acts on the end. Given:

@array = ( 1, 2 );
unshift @array, 'y', 'z';
@array will contain ( 'y', 'z', 1, 2 ).
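Pairing push with shift gives first-in, first-out (queue) behaviour. This sketch uses illustrative job names; it also shows that unshift keeps its arguments in the order given.

```perl
use strict;
use warnings;

my @queue;
push @queue, 'first', 'second';     # enqueue at the end
push @queue, 'third';

my $next = shift @queue;            # dequeue from the beginning

unshift @queue, 'y', 'z';           # 'y' ends up at index 0, 'z' at index 1

print "next=$next queue=@queue\n";
```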

Access one element at an arbitrary index

The first element of an array is accessed at index 0:

$first_elem = $array[0];
Why the $ sigil? Remember that the elements of an array can only be scalar values. The $ makes sense here because we are accessing a single, scalar element out of the array. The thing inside the square brackets does not have to be an integer literal; it can be any expression which results in a number. (If the resulting number is not an integer, it will be truncated to an integer, that is, rounded toward zero.)

Change the value of the last element:

$array[ $#array ] += 5;
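A short sketch of those indexing rules, with made-up values:

```perl
use strict;
use warnings;

my @array = ( 10, 20, 30, 40 );

my $i         = 1;
my $by_expr   = $array[ $i + 1 ];   # any numeric expression works as an index
my $truncated = $array[ 1.9 ];      # 1.9 is truncated to 1, not rounded to 2

$array[ $#array ] += 5;             # modify the last element in place

print "by_expr=$by_expr truncated=$truncated last=$array[-1]\n";
```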

Access multiple elements at arbitrary indices

By analogy, if you want to access multiple elements at once, you would use the @ sigil instead of the $. In addition, you would provide a list of index values within the square brackets, rather than just one.

( $first, $third, $fifth ) = @array[0,2,4];
Jargon alert: this syntax for accessing multiple elements of an array at once is called an array slice.

Never forget that with an array slice the index expression is a list: it will be evaluated in list context, and can return any number (including zero) of index numbers. However many numbers are in the list of indices, that's how many elements will be included in the slice.

Beware, though: an array slice may look like an array, due to the @ sigil, but it is not. For example,

$n = @array[0..$#array];
will not yield the number of items in the slice! In scalar context, a slice returns its last element instead.

Set the second, third, and fourth elements in an array:

@array[1..3] = ( 'x', 'y', 'z' );
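The slice operations above can be tried out with a sketch like this one. The data is illustrative, and the `() = ...` counting idiom is an addition not covered in the text above.

```perl
use strict;
use warnings;

my @array = ( 'a' .. 'f' );

# Read several elements at once.
my ( $first, $third, $fifth ) = @array[ 0, 2, 4 ];

# The index list can come from any list expression, such as another array.
my @wanted = ( 5, 0 );
my @picked = @array[ @wanted ];

# A slice in scalar context gives its last element, not a count.
my $last_of_slice = @array[ 0 .. $#array ];

# To count the items in a slice, force list context and count the result.
my $slice_count = () = @array[ 0 .. $#array ];

# Assign to several elements at once.
@array[ 1 .. 3 ] = ( 'x', 'y', 'z' );

print "picked=@picked last=$last_of_slice count=$slice_count array=@array\n";
```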

Sidebar: More about indices

We said earlier that array indices are non-negative integers. While this is strictly true at some level, perl conveniently lets you index elements from the end of the array using negative indices. -1 refers to the last element, -2 to the next-to-last element, and so on. To oversimplify a bit, -1 acts like an alias for $#array... but only in the context of indexing @array!

So the following are equivalent:

$array[ -1 ]
$array[ $#array ]
But beware:
@array[ 0 .. $#array ]
cannot be written as:
@array[ 0 .. -1 ]
because in this situation the -1 is an argument of the .. range operator, which has no idea what "highest index number" is actually wanted. The range 0 .. -1 is simply an empty list, so the slice is empty too.
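A sketch of negative indexing and of the range pitfall, using illustrative data:

```perl
use strict;
use warnings;

my @array = ( 'a' .. 'e' );

my $last        = $array[-1];           # same element as $array[$#array]
my $second_last = $array[-2];
my @last_two    = @array[ -2, -1 ];     # negative indices work in slices too

my @all   = @array[ 0 .. $#array ];     # every element
my @empty = @array[ 0 .. -1 ];          # 0 .. -1 is an empty range: no elements

print "last=$last second_last=$second_last last_two=@last_two\n";
print "all=", scalar(@all), " empty=", scalar(@empty), "\n";
```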

Insert/Delete/Replace items in the middle of an array

It is possible to insert items into the middle of an array and remove items from the middle of an array. The function which enables this is called splice. It can insert items anywhere in an array (including the ends), and it can remove (and return) any sub-sequence of items from an array. In fact, it can do both of these at once: remove some sub-sequence of items and put another list of values in their place. splice always returns the list of removed values, if any.

The second argument of splice is an array index, and as such, everything we've said about indices applies to it.

The queue-like array functions could have been implemented in terms of splice, as follows:

unshift @a, @b;   # could be written as   splice @a, 0, 0, @b;
push @a, @b;      # could be written as   splice @a, $#a+1, 0, @b;
                  # (we have to index to a position PAST the end of the array!)
$b = shift @a;    # could be written as   $b = splice @a, 0, 1;
$b = pop @a;      # could be written as   $b = splice @a, -1, 1;
(Beware that in scalar context splice returns the last of the list of values removed; shift and pop always return the one value removed.)
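That context difference only shows up when more than one element is removed. A minimal sketch, with illustrative names:

```perl
use strict;
use warnings;

my @a = ( 1 .. 5 );
my $last_removed = splice @a, 0, 2;     # scalar context: only the LAST removed value

my @b = ( 1 .. 5 );
my @removed = splice @b, 0, 2;          # list context: all removed values

print "scalar got $last_removed, list got @removed, left with @a\n";
```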

Remove 3 items, beginning with the 3rd:

@b = splice @a, 2, 3;
Insert some new values after the 3rd, without deleting any:
splice @a, 3, 0, @b;   # offset 3 is the position just after the 3rd item
Replace the 4th and 5th items with three other values:
splice @a,              # array to modify
       3,               # starting with 4th item
       2,               # remove (replace) two items
       'x', 'y', 'z';   # arbitrary list of new values to insert
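To see the effect of that replacement on concrete data (the starting contents here are made up), a sketch:

```perl
use strict;
use warnings;

my @a = ( 'a' .. 'g' );

# Replace the 4th and 5th items ('d' and 'e') with three new values.
my @removed = splice @a, 3, 2, 'x', 'y', 'z';

print "removed=@removed now=@a\n";
```

Note that the replacement list does not have to be the same length as the removed span; the array grows or shrinks to fit.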
And while we're at it: Clear an array - Round 3:
@a = (); # could be written as splice @a, 0;

Any Questions?

The Perl FAQ has a section on Arrays.

Related Resources


What about wantarray?

Despite its name, wantarray has nothing to do with arrays. It is misnamed. It should have been named something like is_list_context. It is used inside subroutines to detect whether the sub is being called in list, scalar, or void context. It returns true, false, and undef in those cases, respectively.
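A minimal sketch of a sub that reports its calling context. The sub name report_context and the variable $last_context are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

my $last_context;   # records what the most recent call saw

# Hypothetical helper: classifies the context it was called in.
sub report_context {
    my $want = wantarray;
    $last_context = $want         ? 'list'
                  : defined $want ? 'scalar'
                  :                 'void';
    return $last_context;
}

my @in_list   = report_context();   # called in list context
my $in_scalar = report_context();   # called in scalar context
report_context();                   # called in void context
my $in_void   = $last_context;

print "list: $in_list[0], scalar: $in_scalar, void: $in_void\n";
```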


Other possible topics:

  • tying arrays; the Tie::Array module
  • delete and how it doesn't work on arrays
  • exists and how it DOES work on arrays
  • Various related Perl FAQ entries
  • Array-related modules, such as those in the Array:: family
  • Traps/gotchas, such as deleting from an array while iterating over it
  • multidimensional arrays

PS: This RFC has been converted into an actual tutorial, Arrays: A Tutorial/Reference.