PerlMonks  

Re: Re: Re^4 Useful addition to Perl?

by etcshadow (Priest)
on Mar 07, 2004 at 21:47 UTC


in reply to Re: Re^4 Useful addition to Perl?
in thread Useful addition to Perl?

Well, the problem, as I see it, with writing this as a wrapper around File::Find is that it would be suboptimal for the most important use case: perl one-liners (-pe and -ne). For that matter, what this does and what File::Find does only partially overlap, in that they both traverse directories... but that's about the end of it.

The ultimate intent of this is to DWIM when I say perl -mr -ne 'print if /foo/' *, and not to do anything silly in the process, like creating a list of every file on the file-system. Maybe I'm wrong here, but I think this is an important enough goal (both to do and to do well) that it outweighs the importance of reusing File::Find. Granted, I'm not saying that reuse shouldn't be involved at all... I sure as heck wouldn't want to reimplement File::Spec.

Really, what it comes down to is that File::Find implements a "push" interface from the file-system: File::Find pushes file names into your code (because you give it a code-ref as the entry point for your code). perl -ne and perl -pe, on the other hand, need a "pull" interface, because they translate to while (<>) { ... }, which is itself essentially:

    while (@ARGV) {
        $ARGV = shift @ARGV;
        open ARGV, $ARGV or do { warn "Couldn't open $ARGV: $!\n"; next };
        while (<ARGV>) {
            ...
        }
    }
Looking at that code, you can see that it is definitely pulling filenames out of @ARGV... so the easiest way to build an interface on top of it is to tie a behavior to reading from @ARGV, which is exactly what I've done.
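To make the tie idea concrete, here is a minimal sketch. It is not the module from this thread: the class name Tie::ArgvRecurse is hypothetical. Its SHIFT and FETCHSIZE expand a directory only when that directory reaches the front of the queue, so only one directory listing is read at a time. It assumes, as the approach above relies on, that the ARGV machinery behind <> checks the length of a tied @ARGV through FETCHSIZE and takes names off it through SHIFT. The "dev:ino" bookkeeping is one possible guard against symlink loops, not the only one.

```perl
# Minimal sketch, NOT the module from this thread: the class name
# Tie::ArgvRecurse is hypothetical. Directories are expanded lazily, one
# listing at a time, as names are shifted off the front of the tied array.
package Tie::ArgvRecurse;
use strict;
use warnings;
use Tie::Array;
use File::Spec;
our @ISA = ('Tie::Array');

sub TIEARRAY {
    my ($class, @paths) = @_;
    # queue: pending paths; seen: "dev:ino" of directories already expanded
    return bless { queue => [@paths], seen => {} }, $class;
}

# Replace directories at the front of the queue with their contents until
# the front entry is not a directory (or the queue is empty).
sub _prime {
    my ($self) = @_;
    my $q = $self->{queue};
    while (@$q && -d $q->[0]) {
        my $dir = shift @$q;
        my ($dev, $ino) = stat $dir;
        next if $self->{seen}{"$dev:$ino"}++;    # assumed symlink-loop guard
        opendir(my $dh, $dir) or do { warn "Can't opendir $dir: $!\n"; next };
        my @kids = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
        closedir $dh;
        # depth-first: a directory's children go before its later siblings
        unshift @$q, map { File::Spec->catfile($dir, $_) } sort @kids;
    }
    return;
}

sub FETCHSIZE { my $s = shift; $s->_prime; return scalar @{ $s->{queue} } }
sub FETCH     { my ($s, $i) = @_; $s->_prime; return $s->{queue}[$i] }
sub SHIFT     { my $s = shift; $s->_prime; return shift @{ $s->{queue} } }
sub STORE     { my ($s, $i, $v) = @_; $s->{queue}[$i] = $v; return }
sub STORESIZE { my ($s, $n) = @_; $#{ $s->{queue} } = $n - 1; return }
sub PUSH      { my $s = shift; push @{ $s->{queue} }, @_; return }
sub UNSHIFT   { my $s = shift; unshift @{ $s->{queue} }, @_; return }
sub CLEAR     { my $s = shift; @{ $s->{queue} } = (); return }
sub EXTEND    { return }

package main;
# Usage (assumed): my @args = @ARGV; tie @ARGV, 'Tie::ArgvRecurse', @args;
# while (<>) { ... }
```

Copying @ARGV before tying keeps the original arguments safe. For perl -mr, the tie would have to happen while the module is being loaded, because -m does not call import.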

Now, it's true that I could make this pulling from @ARGV use File::Find as the behavior underlying the read event... but if I did that, I'd end up reading in the whole file-system tree (or the whole sub-tree being accessed), and if there's no good reason to do it that way, I'd rather not. Granted, if File::Find offered a way to say "depth => 1" (that is, give me all the contents of this directory, but don't traverse subdirectories), that might be worthwhile, as it would save the effort of opendir; readdir; closedir; grep; fix-file-names... but that's just not what File::Find does. Moreover, I've never been happy with the fact that File::Find actually chdirs into each directory as it goes... that's just ugly. It should use File::Spec to prepend the leading path... but I digress.
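The opendir; readdir; closedir; grep; fix-file-names sequence is short when written out. Here is a minimal sketch. The helper name list_dir is hypothetical. It lists a single level of a directory and builds full paths with File::Spec instead of chdir'ing.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: one level of $dir only (no recursion, no chdir).
# Returns full paths joined with File::Spec, sorted, minus '.' and '..'.
sub list_dir {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't opendir $dir: $!\n";
    my @names = grep { $_ ne File::Spec->curdir() && $_ ne File::Spec->updir() }
                readdir $dh;
    closedir $dh;
    return map { File::Spec->catfile($dir, $_) } sort @names;
}
```

Subdirectories come back as plain entries, so the caller decides whether and when to descend into them.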

Anyway, I hope that explains why I didn't want to use File::Find for this. I did give it serious consideration... but ultimately, I think the method I arrived at is the best one I considered. It is simple, elegant, efficient, and useful, and doing it with File::Find just couldn't make it all of those at once.

------------ :Wq Not an editor command: Wq

Re: Re: Re: Re^4 Useful addition to Perl?
by demerphq (Chancellor) on Mar 07, 2004 at 22:24 UTC

    Trouble is that the code as you posted it is liable to go into an infinite loop if the directory listed contains a symlink to itself or to one of its parents. There are also, IMO, other similar problems with your code... The reason I advocate using File::Find is that it already handles these issues, as well as the others lurking in your code. Your idea is great. But IMO, you should avoid reinventing File::Find and just use it.


    ---
    demerphq

      First they ignore you, then they laugh at you, then they fight you, then you win.
      -- Gandhi


      Fair criticisms. The particular point that the posted code doesn't handle symlinks is valid, and will be fixed before I submit this. The meta-point that there will be other issues such as this that will inevitably come up, and that this will create wasted effort as similar parallel issues are fixed over time in File::Find is also very true.

      The issue, though, from my standpoint, is that File::Find doesn't offer any means (at least as far as I can think of... please tell me if I'm wrong) to turn its use inside-out... that is, if you will, to ask File::Find for a file, rather than be told by File::Find that there is a file.

      Ideally, I'd be able to use File::Find::Iterator, but looking at it, it doesn't reuse File::Find either... it's just yet another implementation of directory-tree traversal, so using it would bring in all the same issues of duplicating the work of File::Find (not me duplicating the work, but the maintainers of File::Find::Iterator). Also, File::Find::Iterator appears not to be very complete (version .3), and it is not file-system-portable (it assumes a directory separator, and it doesn't handle symlinks either).

      In a truly ideal world, File::Find would offer some kind of interface that allowed it to operate in this manner, but I just don't see it / can't think of how to do it. Sadly, it would be easy to build File::Find's interface as a thin wrapper around File::Find::Iterator's, but not vice versa, and File::Find is the one that (currently) works :-(
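      Here is a sketch of that asymmetry. An iterator whose state is an explicit stack of pending paths can pause between calls at no cost, and a File::Find-style push interface then falls out as a short loop over it. Both make_file_iterator and find_push are hypothetical names. This is neither File::Find::Iterator's API nor File::Find's.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical pull-style walker: returns a closure; each call yields the
# next non-directory path, or undef when the walk is finished. State is an
# explicit list of pending paths, so no call stack has to be suspended.
sub make_file_iterator {
    my @pending = @_;
    my %seen;                      # "dev:ino" of directories already entered
    return sub {
        while (@pending) {
            my $path = shift @pending;
            return $path unless -d $path;
            my ($dev, $ino) = stat $path;
            next if $seen{"$dev:$ino"}++;          # guard against symlink loops
            opendir(my $dh, $path) or do { warn "Can't opendir $path: $!\n"; next };
            my @kids = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
            closedir $dh;
            unshift @pending, map { File::Spec->catfile($path, $_) } sort @kids;
        }
        return;                    # exhausted
    };
}

# The "easy direction": a File::Find-style push interface as a thin wrapper
# over the iterator. $wanted sees each path in $_ (no chdir involved).
sub find_push {
    my ($wanted, @dirs) = @_;
    my $next = make_file_iterator(@dirs);
    while (defined(my $path = $next->())) {
        local $_ = $path;
        $wanted->();
    }
    return;
}
```

      The reverse direction is the hard one. While wanted runs, File::Find's position in the walk lives inside its own running loop, so wanted cannot hand a file back to an outside caller and resume the walk later.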

      So, I suppose I can reframe the question as: is there a way to build an iterator-like interface out of an event-generator interface? Even doing ugly stuff with gotos, I can't think of how to get past the fact that I'd have to be leapfrogging backward and forward over a couple of stack frames (or, more precisely, saving a couple of stack frames off to the side, deleting them, and then restoring them later).

      What would make me really want to use File::Find for this is if the maintainers of File::Find decided to flip things around a little bit so that File::Find was just a thin wrapper around an underlying iterator class that did the real work... and then I could piggy-back off of that same underlying iterator.

      I don't mean to sound closed-minded... I'm not. I'm just trying to solve a problem with certain constraints, and as far as I've been able to figure out, File::Find won't work within those constraints. I would actually love to find a way to use File::Find for this, but to the best of my knowledge, there isn't one. I'd love to hear suggestions about how to fit File::Find into this problem without violating the two primary constraints that:

      • @ARGV not be blown up to include the entire file tree
      • perl -ne '...' (and any similar looping over <>) works in a completely DWIM fashion.
      Thanks for any ideas.
      ------------ :Wq Not an editor command: Wq

        The issue, though, from my standpoint, is that File::Find doesn't offer any means (at least as far as I can think of... please tell me if I'm wrong) to turn its use inside-out... that is, if you will, to ask File::Find for a file, rather than be told by File::Find that there is a file.

        Well, to me what you are doing is transforming directories into a list of files, right? So the code would be:

            use File::Find;

            # Every non-directory under $dir (full paths, thanks to no_chdir).
            sub recurse_dir {
                my $dir = shift;
                my @files;
                find({ wanted => sub { push @files, $_ unless -d $_ }, no_chdir => 1 }, $dir);
                return @files;
            }

        Which then makes your code become:

        my @argv=map { -d $_ ? recurse_dir($_) : $_ } @ARGV;

        Note that, as far as I can tell, this code replaces your entire directory-walking code: it doesn't repeatedly stat files it already has, it is robust and portable, it could obviously be inlined further, and it provides a whole host of filtering options and hooks with little effort. All you have to do is wrap your tie logic around it and presto...

        I have also sidestepped the point about not putting the entire tree into the array. I suspect you will find that, to keep circular directory structures from blowing you out of the water, you are going to have to store all the visited directories, which essentially means holding the whole lot in memory. So I don't see this as a particularly good idea.


        ---
        demerphq

          First they ignore you, then they laugh at you, then they fight you, then you win.
          -- Gandhi

