
To glob or not to glob

by haukex (Archbishop)
on Jan 07, 2018 at 12:54 UTC ( [id://1206848] : perlmeditation )

I am often torn when it comes to recommending Perl's glob (aka the <...> operator, when it isn't readline). On the one hand, it's built in and often shortens code; on the other, it has several caveats one should be aware of.

  1. glob does not list filenames beginning with a dot by default. For someone coming from a unixish shell, this might make perfect sense, but for someone coming from, for example, a readdir implementation, this might be surprising, and so it should at least be mentioned.

  2. Probably the biggest problem I see with glob is variables interpolated into the pattern. The default glob splits its argument on whitespace, which means that, for example, glob("$dir/*.log") is a problem when $dir is 'c:/program files/foo'. This can be avoided by doing use File::Glob ':bsd_glob'; (Update: except on Perls before v5.16, please see Tux's reply and below for alternatives), but that doesn't help with the next problem:

  3. If a variable interpolated into the pattern contains glob metacharacters (\[]{}*?~), this will cause unexpected results for anyone not aware of this list and expecting the characters to be taken literally.

  4. Lastly, File::Glob can override glob globally. If, for example, you use it in a module, and someone else overrides the default glob, then suddenly your code might not behave the way you expected.

  5. <update> glob in scalar context with a variable pattern also suffers from surprising behavior, as choroba pointed out in his reply - thank you! (additional info) </update>
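The first two caveats can be reproduced with a short sketch. It assumes a unixish system and a temporary directory path that itself contains no whitespace or glob metacharacters; the directory and file names are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw/tempdir/;
use File::Glob qw/bsd_glob/;   # imports the function only, does not override glob

# Throwaway directory whose name contains a space (names are made up).
my $base = tempdir(CLEANUP => 1);
my $dir  = "$base/program files";
mkdir $dir or die "mkdir $dir: $!";
for my $name ('a.log', '.hidden.log') {
    open my $fh, '>', "$dir/$name" or die "open $dir/$name: $!";
    close $fh;
}

# Caveat 2: the default glob splits its argument on whitespace, so this
# is treated as two patterns: "$base/program" and "files/*.log".
my @split = glob("$dir/*.log");

# bsd_glob, called as a function, takes the whole string as one pattern.
my @plain = bsd_glob("$dir/*.log");

# Caveat 1: "*" skips names starting with a dot; ask for them explicitly.
my @with_dot = bsd_glob("$dir/{*,.*}.log");

print "default glob: <$_>\n" for @split;
print "bsd_glob:     <$_>\n" for @plain;
print "with dot:     <$_>\n" for @with_dot;
```

Calling bsd_glob by name also means the code does not depend on whatever glob happens to be overridden with, although it does nothing about metacharacters in $dir (caveat 3).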

That's why I think advising the use of glob without mentioning the caveats is potentially problematic. Perhaps one wants to create a backup of a folder and doesn't want to miss any files, say for example, .htaccess? And I also often see things like glob("$dir/*") going without comment.

Personally I find readdir, in combination with some of the functions from File::Spec, to be a decent, if slightly complicated, tool (one example). One better alternative among several is children from Path::Class::Dir, or methods from one of the other modules like Path::Tiny. (Modules like File::Find::Rule often get mentioned as alternatives, except that those of course recurse into subdirectories by default.)

    use Path::Class qw/dir/;
    my @files = dir('foo', 'bar quz', 'baz')->children;
    # @files includes .dot files, but not . and ..
    # and its elements are Path::Class objects
    print "<$_>\n" for @files;
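For comparison, the readdir plus File::Spec route mentioned above could look roughly like the following. This is only a sketch using core modules; list_dir is a made-up helper name, and the directory name 'logs [1]' was chosen because it contains both a space and glob metacharacters.

```perl
use strict;
use warnings;
use File::Temp qw/tempdir/;
use File::Spec;

# Hypothetical helper: list the entries of $dir, including dot files but
# not "." and "..". The directory name is never parsed as a pattern.
sub list_dir {
    my ($dir) = @_;
    opendir my $dh, $dir or die "opendir $dir: $!";
    my @names = File::Spec->no_upwards(readdir $dh);
    closedir $dh;
    return map { File::Spec->catfile($dir, $_) } sort @names;
}

my $base = tempdir(CLEANUP => 1);
my $dir  = File::Spec->catdir($base, 'logs [1]');
mkdir $dir or die "mkdir $dir: $!";
for my $name ('.htaccess', 'a.log', 'b.txt') {
    my $file = File::Spec->catfile($dir, $name);
    open my $fh, '>', $file or die "open $file: $!";
    close $fh;
}

my @all  = list_dir($dir);
my @logs = grep { /\.log\z/ } @all;   # filtering is an ordinary regex

print "<$_>\n" for @all;
```

It is longer than a one-line glob, but the whitespace and metacharacter caveats do not apply, and dot files such as .htaccess are included.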

Now, of course this isn't to say glob is all bad, I've certainly used and recommended it plenty of times. If one has read all of its documentation, including File::Glob, and is aware of all the caveats, and especially if one is using fixed strings for the patterns, it can be perfectly fine. But I still think it should not be blindly used or recommended.

Replies are listed 'Best First'.
Re: To glob or not to glob
by Laurent_R (Canon) on Jan 07, 2018 at 22:21 UTC
    Hi haukex,

    you're making a very valid point++. I sort of feel guilty here, since I must admit that I do recommend glob every now and then (though quite rarely), because it makes most things shorter and easier (especially, compared to opendir/readdir, because it returns filenames with their relative paths, that's really nice), it does what I want in probably 95% of the cases. And I must further confess that, when I happen to recommend glob, I usually do not list the caveats (especially not the fourth one, which I know but would not even think of mentioning); if you write one short line to recommend to perhaps use glob for its ease in a given situation, you just can't really add four paragraphs to warn about possible issues in other situations.

    To tell the truth, I usually hesitate a bit before recommending it, because of the issues that you listed, but I still think it is a very practical operator and I don't think it should be banned (at least not on *nix systems). I am happy that it removes directory entries starting with a dot (it saves me a grep with a regex), because that's exactly what I need most of the time. I also like the way it does pattern expansion to pick up the files that I'm looking for (saves another grep). Even if it's not used so often, the multi pattern expansion when there is a white space can really be handy. And I am especially happy that it returns filenames with paths, as this is very often what I want to have. These are quite significant advantages IMHO.

    Yet, I also agree with the caveats you're mentioning. Should we then throw the baby out with the bath water? I'm not quite sure, but I would tend to think we shouldn't. But, clearly, it should be used only in well defined situations. Should we recommend it? Or stop recommending it? I don't know.

Re: To glob or not to glob
by choroba (Cardinal) on Jan 08, 2018 at 10:43 UTC
    Another potential danger of glob is its usage in scalar context. It should return the matches one by one, but the iterator is assigned to the place where glob is used and isn't reset when a new parameter is supplied to it:
        #! /usr/bin/perl
        use warnings;
        use strict;
        use feature qw{ say };
        open my $fh, '>', $_ for my @files = qw( a1 a2 b1 b2 );
        for my $mask ('a*', 'b*') {
            while (my $f = glob $mask) {
                say $f;
                last
            }
        }
        unlink @files;
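One possible way around the iterator, sketched here under the assumption that only the first match per mask is wanted, is to call glob in list context so that every match is returned at once. The file names are the same as in the example above; the temporary directory is an addition so the sketch cleans up after itself.

```perl
use strict;
use warnings;
use Cwd qw/getcwd/;
use File::Temp qw/tempdir/;

# Work inside a throwaway directory so the bare masks below match only
# the files created here.
my $old = getcwd();
my $tmp = tempdir(CLEANUP => 1);
chdir $tmp or die "chdir $tmp: $!";
for my $name (qw( a1 a2 b1 b2 )) {
    open my $fh, '>', $name or die "open $name: $!";
    close $fh;
}

my @first;
for my $mask ('a*', 'b*') {
    # List context: glob returns every match at once, so no iterator
    # state is carried over to the next value of $mask.
    my ($f) = glob $mask;
    push @first, $f if defined $f;
}
print "$_\n" for @first;

chdir $old or die "chdir $old: $!";
```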

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: To glob or not to glob
by karlgoethebier (Abbot) on Jan 08, 2018 at 12:19 UTC
    "... those of course recurse into subdirectories by default"

    Yes, sure. But the iterator from Path::Tiny takes recurse => 0 as argument. And File::Find::Rule takes maxdepth( $level ). Same behavior as find with -maxdepth n, more or less. Hence no need to glob?

    Minor edit: Changed wording because of bad English...

    Best regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

    perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help

Re: To glob or not to glob
by Tux (Canon) on Jan 11, 2018 at 07:52 UTC

    Using it the way you describe will cause porting issues for older perl versions:

        $ perl -MFile::Glob::bsd_glob -wE'say for bsd_glob ("*.foo")'
        Can't locate File/Glob/ in @INC (@INC contains: /pro/lib/perl5/site_perl/5.14.2/IA64.ARCHREV_0-LP64-ld /pro/lib/perl5/site_perl/5.14.2 /pro/lib/perl5/5.14.2/IA64.ARCHREV_0-LP64-ld /pro/lib/perl5/5.14.2 .).
        BEGIN failed--compilation aborted.

    edit: correct above example of failure. The correct syntax is just as bad though:

        $ perl -MFile::Glob=:bsd_glob -wE'say for bsd_glob ("*.foo")'
        "bsd_glob" is not defined in %File::Glob::EXPORT_TAGS at /pro/lib/perl5/5.14.2/IA64.ARCHREV_0-LP64-ld/File/ line 45.
                File::Glob::import("File::Glob", ":bsd_glob") called at -e line 0
                main::BEGIN() called at -e line 0
                eval {...} called at -e line 0
        Can't continue after import errors at -e line 0.
        BEGIN failed--compilation aborted.

    And additional shit hits the fan when you would try to install it

        $ cpan File::Glob
        :
        The most recent version "1.28" of the module "File::Glob"
        is part of the perl-5.26.1 distribution. ...

    Does this in your opinion imply that File::Glob should be dual-lived?

    The current (today) absolute advised minimum version to be supported by the toolchain is perl-5.8.1. My example with 5.14.2 could arguably be considered recent enough.

    Digging in the deltas:

        perl561delta.pod:  File::Glob::glob() has been renamed to File::Glob::bsd_glob()
        perl58delta.pod:   File::Glob::glob() has been renamed to File::Glob::bsd_glob()
        perl5160delta.pod: It has a new C<:bsd_glob> export tag, intended to replace C<:glob>.
        perl5260delta.pod: L<C<File::Glob::glob()> will disappear in perl 5.30. Use C<File::Glob::bsd_glob()> instead.

    Does *not* make me happy. I see the trouble that is addressed, but this makes portable programming much much harder. Looks like the only portable way to write it from 5.6.1 to blead is:

        use File::Glob;
        my @files = File::Glob::bsd_glob ("*.txt");

    Safe, but not really a nice replacement of the simple <*.txt> or glob ("*.txt").

    Enjoy, Have FUN! H.Merijn

      Thank you for the reply!

      "Using it the way you describe will cause porting issues for older perl versions"

      You are correct that the :bsd_glob export tag wasn't added until Perl v5.16, in File::Glob 1.15. (Note: you had a typo in your example, -MFile::Glob::bsd_glob, which has since been fixed.) But from the File::Glob docs:

      The :glob tag, now discouraged, is the old version of :bsd_glob. It exports the same constants and functions, but its glob() override does not support iteration; it returns the last file name in scalar context.

      So one backwards-compatible way to go is use File::Glob ':glob';, unless you want to use it in scalar context, which might not be a good idea anyway due to the issue choroba described.

          $ touch 'foo bar.txt'
          $ perl -MFile::Glob=:glob -e 'print for <*foo bar*>'

      This works in Perl 5.6.2 thru 5.24, and warns about the deprecation of :glob in 5.26. What you can do in Perl v5.6 thru v5.26 (and hopefully beyond) is either

          use File::Glob 'bsd_glob';
          print for bsd_glob('*foo bar*');
          # -- or --
          use File::Glob $] lt '5.016' ? ':glob' : ':bsd_glob';
          print for <*foo bar*>;

      With the limitation of not being able to use the latter in scalar context until Perl v5.16 and up (Update: and the former not at all, unfortunately).

      "Does this in your opinion imply that File::Glob should be dual-lived?"

      No, I don't yet have a well-formed opinion either way.

Re: To glob or not to glob
by Eily (Monsignor) on Jan 08, 2018 at 11:20 UTC

    That's one of those posts where I wish I could upvote more than once, or give a golden ++ :).

    Like Laurent_R I think I may be guilty of recommending glob without commenting on the caveats. Partly because I rarely use the tool myself and don't know it well: I didn't know the wildcard chars beyond * and {} before that post, and the fact that the input is split on spaces had slipped my mind until I read from the same thread. I'll try to keep your meditation in mind next time I'm tempted to present glob as a magical solution to a problem.

Re: To glob or not to glob
by haukex (Archbishop) on Jan 09, 2018 at 21:08 UTC

    Thank you all very much for the replies so far! (I've /msg'ed everyone to make you aware of this reply-to-all.)

    Laurent_R, Eily, and morgon, thank you for your thoughts on recommending glob. I'm not sure if there is one true answer as to whether glob can generally be recommended or not - when used properly, it provides nice and short code, while on the other hand, the various caveats can be easily forgotten, and later refactoring of the code could introduce problems, e.g. when a formerly static pattern gets a variable interpolated into it. I guess my emphasis should have been a bit stronger: "it should not be blindly used or recommended."

    I would probably say that glob with fixed strings is generally fine, but interpolating variables (especially user-supplied paths at the beginning of the pattern) can get very tricky, and I personally often reach for Path::Class in these cases. Another situation in which glob is probably fine is when the user supplies the entire pattern, e.g. as part of a configuration file ("files = ~/foo/*.log"), and one then does @files = glob($config{files}). One can simply point to the documentation of glob and File::Glob when documenting that configuration option. (The problem of someone overriding glob globally remains, although I wonder how often that might happen...)
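A minimal sketch of the configuration-file case might look like this. The "key = value" format and the parse_config helper are assumptions made up for illustration; only the files key and its value come from the example above.

```perl
use strict;
use warnings;

# Hypothetical "key = value" configuration parser.
sub parse_config {
    my ($text) = @_;
    my %config;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*#/ or $line !~ /\S/;   # comments, blank lines
        my ($key, $value) = $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/
            or die "bad config line: $line";
        $config{$key} = $value;
    }
    return %config;
}

my %config = parse_config("# log files to process\nfiles = ~/foo/*.log\n");

# The user owns the entire pattern, so glob's rules (tilde expansion,
# whitespace-separated patterns, metacharacters) are the documented
# behavior of the option rather than a surprise.
my @files = glob($config{files});
print "<$_>\n" for @files;
```

Since the pattern is handed to the default glob unchanged, a value with several whitespace-separated patterns expands each of them, which is worth stating where the option is documented.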

    choroba, thank you for the additional caveat, which I've added to the root node. Now that you mention it, I dimly remember this being discussed before, and a bit of super searching found this thread with the reference to RT#123404 - I should have remembered this!

    karlgoethebier, you are quite correct that those modules can also be limited to just one directory, thank you. It feels like a little bit of overkill to use a module like those for a single directory, but then again, it's good to use the tools one knows. My comment was hinting at the fact that I've seen modules like File::Find etc. recommended as an alternative to readdir without mentioning their recursive behavior.
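To illustrate the recursion point with the core File::Find module: it descends into subdirectories unless told otherwise, and one way to stop that is to set $File::Find::prune. The following is a sketch only; children_only is a made-up helper name and the file layout is invented for the demonstration.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw/tempdir/;

# Hypothetical helper: return only the direct children of $dir
# (dot files included), without descending into subdirectories.
sub children_only {
    my ($dir) = @_;
    my @entries;
    find({
        no_chdir => 1,                      # $_ holds the full path
        wanted   => sub {
            return if $_ eq $dir;           # skip the starting directory
            push @entries, $_;
            $File::Find::prune = 1 if -d;   # do not descend any further
        },
    }, $dir);
    @entries = sort @entries;
    return @entries;
}

my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/sub" or die "mkdir $dir/sub: $!";
for my $name ('a.txt', '.hidden', 'sub/inner.txt') {
    open my $fh, '>', "$dir/$name" or die "open $dir/$name: $!";
    close $fh;
}

my @kids = children_only($dir);
print "<$_>\n" for @kids;
```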

Re: To glob or not to glob
by morgon (Priest) on Jan 07, 2018 at 19:13 UTC
    "But I still think it should not be blindly used or recommended."
    Using it in one-off hacks is fine of course but it should not be recommended.