PerlMonks
Pipe dream
by tlm (Prior) on Sep 09, 2005 at 01:30 UTC (id://490397, perlmeditation)
How come Unix's piping paradigm didn't make it into Perl? Or maybe it did and I didn't notice? Yes, I know that one can open pipes like this: ... but I have in mind something more integrated into Perl than that. Especially since the introduction of lexical handles, I would like to be able to take a read handle and transform it somehow to modify its output.

For example, suppose the file foo.tsv consists of newline-separated records of tab-delimited fields, and I want to generate a "view" consisting of those records whose first field has the value 42. Furthermore, I only want fields 1, 3, and 8, and I want the resulting records sorted lexicographically. Finally, I want to put everything in foo_view.tsv. Easy:
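Here is a minimal sketch of that conventional, all-in-memory version. It is not necessarily the code originally posted. The sub name `make_view` is hypothetical, "has the value 42" is read as a string comparison, and the field numbers 1, 3, 8 are treated as 1-based. Note that every matching record ends up in `@recs` before sorting, which is exactly the memory cost discussed below.

```perl
use strict;
use warnings;

# Conventional approach: collect the matching, trimmed records in an array
# and sort them in memory. Fields 1, 3, 8 (1-based) are the 0-based
# slice @f[0, 2, 7]. make_view is a hypothetical name.
sub make_view {
    my ($in, $out) = @_;
    my @recs;
    while (my $line = <$in>) {
        chomp $line;
        my @f = split /\t/, $line, -1;       # -1 keeps trailing empty fields
        next unless defined $f[0] && $f[0] eq '42';
        push @recs, join "\t", map { defined($_) ? $_ : '' } @f[0, 2, 7];
    }
    print {$out} "$_\n" for sort @recs;      # all matching records held in memory
}

# Usage with the file names from the text:
#   open my $in,  '<', 'foo.tsv'      or die "foo.tsv: $!";
#   open my $out, '>', 'foo_view.tsv' or die "foo_view.tsv: $!";
#   make_view($in, $out);
```

Records with fewer than eight fields produce empty columns rather than warnings.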
But here's a different way to think about this: The function Filter::grepit takes an open read handle and a regex and returns a read handle that outputs only those records from the original handle that match the regex. The function Filter::cols takes an open read handle, a field delimiter, and a list of field numbers, and returns a read handle whose records consist of only those fields. Finally, Filter::sortit takes a read handle and returns one that outputs the same records in lexicographic order. Admittedly, this code is no more succinct and not much clearer than the first version, though subjectively I find it somehow easier on the eye.

But the potential big win is that, in principle, sorting the records no longer requires reading them all into a Perl array, which could take up a lot of memory. That problem is relegated to the implementation of sortit. Of course, sortit could end up doing precisely that behind the scenes, but it could do something else. For example, sortit could fork the job off to sort(1):
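One possible shape for such a sortit, offered as a hypothetical sketch for Unix systems rather than the code from the post: it uses Perl's implicit-fork `open(my $fh, '-|')`, so the returned handle reads the child's STDOUT. The child pipes every record into sort(1), whose output goes straight to that same STDOUT. Setting `LC_ALL=C` is an assumption added here so that sort(1) compares bytes, as Perl's default `sort` does.

```perl
use strict;
use warnings;
use POSIX ();

package Filter;

# Hypothetical sketch of Filter::sortit: takes an open read handle and returns
# a read handle that yields the same records in lexicographic order. Sorting
# is delegated to sort(1), so no Perl array ever holds all the records.
sub sortit {
    my ($in) = @_;
    my $pid = open(my $out, '-|');       # implicit fork; child's STDOUT feeds $out
    die "Cannot fork: $!" unless defined $pid;
    return $out if $pid;                 # parent: hand back the sorted stream

    # Child: feed every record to sort(1), which inherits our STDOUT.
    $ENV{LC_ALL} = 'C';                  # byte-wise order, like Perl's default sort
    open(my $sort, '|-', 'sort') or do { warn "Cannot run sort: $!"; POSIX::_exit(1) };
    print {$sort} $_ while <$in>;
    my $ok = close $sort;                # waits for sort(1) to finish
    POSIX::_exit($ok ? 0 : 1);           # skip the parent's END blocks and destructors
}

package main;
```

Because sort(1) spills to temporary files when its input is large, memory use in the Perl process stays flat regardless of file size.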
Now, even for huge files, we can let sort(1) handle the problem of creating intermediate sorted fragments, merging them, etc. I'm sure there are better ways to implement this kind of thing, but you get the idea. Does anything like this already exist on CPAN? (The closest I've found is PerlIO layers, which I find pretty hard to use.)

PS: FWIW, here are implementations of grepit and cols:
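The following is a hypothetical sketch of grepit and cols in the same fork-a-filter style, not necessarily what the author wrote. The shared helper `_pipe_through` is an invented name. Field numbers are taken as 1-based, and the delimiter is matched literally via `\Q...\E`.

```perl
use strict;
use warnings;
use POSIX ();

package Filter;

# Hypothetical helper: fork a child that reads records from $in, applies
# $code to each one, and prints whatever $code returns (undef drops the
# record). The parent gets a read handle on the child's output.
sub _pipe_through {
    my ($in, $code) = @_;
    my $pid = open(my $out, '-|');       # implicit fork, Unix only
    die "Cannot fork: $!" unless defined $pid;
    return $out if $pid;                 # parent
    while (my $rec = <$in>) {            # child
        chomp $rec;
        my $new = $code->($rec);
        print "$new\n" if defined $new;
    }
    close STDOUT;                        # flush before the unbuffered exit
    POSIX::_exit(0);
}

# Keep only the records that match $re.
sub grepit {
    my ($in, $re) = @_;
    return _pipe_through($in, sub { $_[0] =~ $re ? $_[0] : undef });
}

# Keep only the given 1-based fields, rejoined with the same delimiter.
sub cols {
    my ($in, $delim, @nums) = @_;
    return _pipe_through($in, sub {
        my @f = split /\Q$delim\E/, $_[0], -1;
        return join $delim, map { defined($_) ? $_ : '' } @f[map { $_ - 1 } @nums];
    });
}

package main;
```

When chaining stages, e.g. `my $g = Filter::grepit($in, qr/^42\t/); my $c = Filter::cols($g, "\t", 1, 3, 8);`, it is safest to keep each intermediate handle in its own variable until the final one has been read. Closing a piped handle waits for its child to exit, and an upstream child can block on a full pipe until the downstream reader drains it.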
the lowliest monk