PerlMonks  

Without going into the golfish ways of doing this, but sticking to script form, there's a whole lot of things you can improve.
my %skip = ( 'gif' => 1, 'jpg' => 1, 'jpeg' => 1, 'png' => 1 );
I would prefer to write this like so:
my %skip_for; @skip_for{qw( gif jpg jpeg png )} = ();

and later test using exists $skip_for{$ext}
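A minimal sketch of that idiom, assuming a hypothetical helper named is_skipped (not part of the original script). The slice assignment creates the keys with undef values, which is why the test has to be exists rather than plain truth:

```perl
use strict;
use warnings;

# Populate the keys via a hash slice; every value stays undef.
my %skip_for;
@skip_for{qw( gif jpg jpeg png )} = ();

# Hypothetical helper: 1 if the extension is in the skip set, else 0.
sub is_skipped {
    my ($ext) = @_;
    return exists $skip_for{$ext} ? 1 : 0;
}

print "$_: ", is_skipped($_), "\n" for qw( png txt );
```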

Next note: you can just use $File::Find::name rather than "$File::Find::dir/$_"

Then we have a case of redundant syntax: in \&{ sub { ... } } the sub{ ... } already gives you a reference. Then your &{} goes and dereferences it, only to feed it back to the \ which makes a reference from the result again. You can drop the surrounding \&{} and simply write sub { ... } here.
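To make the redundancy concrete, here is a small sketch (the variable names are illustrative only) showing that the wrapped and unwrapped forms both yield an ordinary code reference:

```perl
use strict;
use warnings;

my $plain   = sub { return "called with @_" };
my $wrapped = \&{ sub { return "called with @_" } };

# Both are plain CODE references and behave identically.
print ref($plain), " ", ref($wrapped), "\n";
print $plain->("x"),   "\n";
print $wrapped->("x"), "\n";

# Dereferencing and re-referencing hands back the very same reference.
my $again = \&{$plain};
print $again == $plain ? "same reference\n" : "different reference\n";
```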

I am a bit puzzled by this:

  my ($nil,$ext) = $file =~ /^(.*?)\.(.*?)$/gs;

If you throw away the first capture, why capture at all?

  my ($ext) = $file =~ /^.*?\.(.*?)$/gs;

which is better written as

  my ($ext) = $file =~ /[.]([^.]+)$/gs;

(In words: I want as many non-dot characters as there are in front of the end of the string, update: but only if there's a dot in the filename.)
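A quick sketch of how that last pattern behaves on a few sample names, assuming a hypothetical helper called ext_of. It returns undef when there is no dot followed by at least one non-dot character:

```perl
use strict;
use warnings;

# Hypothetical helper: the text after the last dot, or undef if none.
sub ext_of {
    my ($file) = @_;
    my ($ext) = $file =~ /[.]([^.]+)$/;
    return $ext;
}

for my $file (qw( photo.jpg archive.tar.gz README trailing. )) {
    my $ext = ext_of($file);
    printf "%-16s %s\n", $file, defined $ext ? $ext : '(no extension)';
}
```

By contrast, the lazy /^.*?\.(.*?)$/ version captures everything after the first dot, so archive.tar.gz would give tar.gz rather than gz.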

The $ext = '' unless defined $ext; can be avoided if you put the $skip{$ext} inside an if(/match here/)
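Roughly, that could look like the following sketch, where should_skip is a hypothetical name. Because $1 is only consulted after a successful match, there is no undefined value to guard against:

```perl
use strict;
use warnings;

my %skip_for;
@skip_for{qw( gif jpg jpeg png )} = ();

# Hypothetical helper: true when the name carries a skipped extension.
sub should_skip {
    my ($name) = @_;
    # $1 is looked at only if the match succeeded, so it is never undef here.
    return 1 if $name =~ /[.]([^.]+)$/ and exists $skip_for{$1};
    return 0;
}

print "$_: ", ( should_skip($_) ? "skip" : "process" ), "\n"
    for qw( logo.png notes.txt Makefile );
```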

Also, since you're not interested in the individual lines of your input, and splitting the input into lines costs effort, it would be better to unconditionally slurp large chunks of X bytes instead.
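A sketch of chunked reading under stated assumptions: read_without_cr is a hypothetical helper, and the temporary file exists only for the demonstration. Since the substitution removes single characters, it does not matter where the chunk boundaries fall, which the deliberately tiny chunk size of 4 bytes illustrates:

```perl
use strict;
use warnings;
use Fcntl;
use File::Temp qw( tempfile );

# Hypothetical helper: read $path in fixed-size chunks, dropping every \r.
sub read_without_cr {
    my ($path, $chunk_size) = @_;
    sysopen my $fh, $path, O_RDONLY or die "Couldn't open $path: $!\n";
    my $content = '';
    while (1) {
        my $read = sysread $fh, my $buf, $chunk_size;
        die "Couldn't read $path: $!\n" unless defined $read;
        last unless $read;    # 0 means end of file
        $buf =~ s/\r//g;
        $content .= $buf;
    }
    close $fh;
    return $content;
}

# Demonstration on a throwaway file with CRLF line endings.
my ($tmp_fh, $tmp_name) = tempfile( UNLINK => 1 );
binmode $tmp_fh;
print {$tmp_fh} "one\r\ntwo\r\nthree\r\n";
close $tmp_fh;

my $clean = read_without_cr($tmp_name, 4);
print $clean;
```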

The next point is a critique of the overall manoeuvre. Why would one first fetch a list of directories and then go and read each directory manually, when the same first search already hands you all the file names on a silver platter? (And why are you counting something, when you never use that count? :-))

And lastly, rather than hardcoding the directory in the script, it's preferable to take the directories as parameters from the command line.
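One way to do that, sketched with a hypothetical helper named search_dirs: use whatever was given on the command line and fall back to the current directory otherwise. The ternary is used on purpose, because @ARGV || "." would evaluate @ARGV in scalar context and hand back its element count rather than its contents.

```perl
use strict;
use warnings;

# Hypothetical helper: the directories to search for a given argument list.
sub search_dirs {
    my @args = @_;
    return @args ? @args : ('.');
}

print join( ' ', search_dirs(@ARGV) ), "\n";
```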

So here's an updated version:

#!/usr/bin/perl -w
use strict;
use Fcntl;
use File::Find;

my %skip_for;
@skip_for{qw( gif jpg jpeg png )} = ();

find(
  sub {
    next if -d or /^[.]/;
    next if /[.]([^.]+)$/ and exists $skip_for{$1};

    my $content = "";

    # gobble and mangle 64k chunks at a time
    sysopen FH, $_, O_RDWR;
    s/\r//g, $content .= $_ while sysread FH, $_, 65536;

    # go back to top of file
    sysseek FH, 0, 0;
    syswrite FH, $content, length $content;

    # the file still has its original length,
    # because we didn't clobber it with an open FH, ">file"
    # so we need to fix that
    truncate FH, tell FH;
    close FH;
  },
  @ARGV ? @ARGV : "."  # NB: @ARGV || "." would put @ARGV in scalar context
);

Further improvement might be to use some Getopt:: module to allow the user to change the $skip_for rules.

Update: I must have been asleep as well. Kudos to Zaxo for pointing out my regex would return the whole filename for extensionless files. Also, I need to go flagellate myself for a while:

sysopen FH, $_, O_RDWR
  or do { warn "Couldn't open $File::Find::name: $!\n"; return };
while (1) {
  my $read = sysread FH, $_, 65536;
  defined $read
    or do { warn "Couldn't read $File::Find::name: $!\n"; return };
  last unless $read;   # 0 means end of file
  s/\r//g;
  $content .= $_;
}
and, of course,
return if -d or /^[.]/;
return if /[.]([^.]+)$/ and exists $skip_for{$1};
since this is a sub, not a for loop. I feel stupid now. Oh well, guess we can feel stupid together. :-)
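For reference, here is a sketch that folds those corrections into one script. Several details are my own assumptions rather than anything from the thread: the helper name strip_cr_in_place, the lexical filehandle and buffer (so $_ is left alone inside the File::Find callback), truncating to length $content instead of relying on tell after unbuffered I/O, and, because the script rewrites files in place, doing nothing at all unless at least one directory is named on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl;
use File::Find;

my %skip_for;
@skip_for{qw( gif jpg jpeg png )} = ();

# Hypothetical helper: rewrite one file in place without carriage returns.
# Returns 1 on success; warns and returns nothing on failure.
sub strip_cr_in_place {
    my ($path, $label) = @_;
    $label = $path unless defined $label;

    sysopen my $fh, $path, O_RDWR
        or do { warn "Couldn't open $label: $!\n"; return };

    # gobble and mangle 64k chunks at a time
    my $content = '';
    while (1) {
        my $read = sysread $fh, my $buf, 65536;
        unless ( defined $read ) { warn "Couldn't read $label: $!\n"; return }
        last unless $read;    # 0 means end of file
        $buf =~ s/\r//g;
        $content .= $buf;
    }

    # go back to the top, write the shorter content, cut off the stale tail
    sysseek( $fh, 0, 0 )
        or do { warn "Couldn't rewind $label: $!\n"; return };
    my $written = syswrite $fh, $content;
    unless ( defined $written and $written == length $content ) {
        warn "Couldn't write $label\n";
        return;
    }
    truncate( $fh, length $content )
        or do { warn "Couldn't truncate $label: $!\n"; return };
    close $fh;
    return 1;
}

# File::Find callback: skip directories, dotfiles and skipped extensions.
sub wanted {
    return if -d $_ or /^[.]/;
    return if /[.]([^.]+)$/ and exists $skip_for{$1};
    strip_cr_in_place( $_, $File::Find::name );
}

# Cautious assumption: only run when directories are given explicitly.
find( \&wanted, @ARGV ) if @ARGV;
```

Invoked as, say, perl script.pl some/dir other/dir, it walks each named tree once and rewrites every file that is not hidden and not in the skip list.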

Makeshifts last the longest.


In reply to Re: Recursive File Substitution by Aristotle
in thread Recursive File Substitution by mt2k
