Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Find installed Perl modules matching a regular expression

by toolic (Bishop)
on Sep 15, 2009 at 16:01 UTC ( [id://795418]=CUFP: print w/replies, xml ) Need Help??

Here is a handy command-line tool to quickly view installed Perl modules whose name matches a specified regular expression.

Features

  • Perl regular expression syntax, with separate case-sensitive switch.
  • Optional initialization file for faster look-ups.
  • Option to print the module name or the full directory path to the module file.
  • Option to display duplicate modules and other statistics.
  • Uses only core modules.

Other well-known methods

So... why another way to do it?

Simply put: I could not easily convince these other tools to Do What I Want, as quickly as I want, in (what I consider) a bug-free manner. Obviously, this is not a new idea; it is merely a different implementation. There are many threads here at the Monastery, as a Super Search would reveal, and I have probably read every node of every thread on the topic. I believe HTML::Perlinfo does everything this script does (and much, much more), except that I could not easily figure out how to generate output as simple text, rather than HTML. I consider HTML::Perlinfo to be a valuable companion to this script. I run a daily cronjob to dump out recent versions of both HTML and text.

Impatience

In my opinion, the biggest advantage here is the fast look-up capability. No matter how you slice it, you have to search through the @INC directories via some variant of find, which can take a whole minute or so -- I just do not have the patience to wait that long! Maintaining the initialization file avoids that nonsense.

The code

use warnings; use strict; use Getopt::Long; use Pod::Usage; use File::Find; my $print_path; my $report; my $re; parse_args(); # Clean up @INC my @dirs; for my $dirname (@INC) { if (-d $dirname) { next if $dirname eq '.'; $dirname =~ s{/+}{/}g; $dirname =~ s{/$}{}; push @dirs, $dirname; } } @dirs = uniq(@dirs); # For quicker operation, use init file, if it exists my @files; my $use_find = 1; my $message; my $init_file = exists $ENV{HOME} ? "$ENV{HOME}/.findpm" : ''; if (-e $init_file) { if (open my $fh, '<', $init_file) { @files = <$fh>; close $fh; chomp @files; my $days = 1; if (-M $init_file > $days) { $message = "Warning: $init_file is older than $days day\n" +; } die "Error: $init_file is empty" if -z $init_file; $use_find = 0; } else { $message = "Warning: $init_file exists, but can not be opened: + $!"; } } # Otherwise, use the slower find command if ($use_find) { # Find all .pm files under @INC dirs my @find_dirs = reduce_dirs(@dirs); find( { wanted => sub { push @files, $_ if -f $_ and /\.pm$/ }, no_chdir => 1, }, @find_dirs ); @files = uniq(@files); } # Print those modules/files which match the regex my %mods; for my $file (@files) { my @ds; for my $dir (@dirs) { if (index($file, $dir) == 0) { #print "$d2 is a substring of $d1, starting at pos 0\n" push @ds, $dir; } } my $d = (sort {length($b) <=> length($a)} @ds)[0]; my $rel = substr($file, (length($d)+1)); my $name = $rel; $name =~ s/\.pm$//; next unless $name =~ /$re/; push @{ $mods{$rel} }, $d; if ($print_path) { print "$file\n"; } else { $rel =~ s/\.pm$//; $rel =~ s{/}{::}g; print "$rel\n"; } } if ($report) { my $num_dups = 0; for (keys %mods) { $num_dups++ if (scalar(@{$mods{$_}}) > 1); } if ($num_dups) { print "\nDUPLICATES\n"; for my $rel (keys %mods) { if (scalar(@{$mods{$rel}}) > 1) { print "$rel\n"; for my $dir (@{$mods{$rel}}) { print " $dir/$rel\n"; } } } } print "\nSUMMARY\n"; print " regex = $re\n"; print " Used '$init_file' init file instead of 'find'\n" unless + $use_find; print " INC dirs:\n"; print " $_\n" for @dirs; print ' Total ".pm" files = ', scalar @files, "\n" +; print ' Matching unique ".pm" files = ', scalar keys %mods, +"\n"; print ' Matching duplicate ".pm" files = ', $num_dups, "\n"; } warn $message if $message; exit; sub reduce_dirs { # Reduce a list of directory names by eliminating # names which contain other names. For example, # if the input array contains (/a/b/c/d /a/b/c /a/b), # return an array containing (/a/b). my @dirs = @_; my %substring_count = map { $_ => 0 } @dirs; for my $x (@dirs) { for my $y (@dirs) { next if $x eq $y; if (index($x, $y) == 0) { # if y is substring of x, starting at position 0 $substring_count{$x}++; } } } my @dsubs; for (keys %substring_count) { push @dsubs, $_ if $substring_count{$_} == 0; } return @dsubs; } sub uniq { # From List::MoreUtils, $VERSION = '0.22' my %h; map { $h{$_}++ == 0 ? $_ : () } @_; } sub parse_args { my ($help, $sens); GetOptions( 'sens' => \$sens, 'path' => \$print_path, 'report' => \$report, 'help' => \$help ) or pod2usage(); $help and pod2usage(-verbose => 2); my $pat = (@ARGV) ? shift @ARGV : '.'; $pat =~ s{::}{/}g; $re = ($sens) ? qr/$pat/ : qr/$pat/i; #print "pat=$pat\n"; #print "re=$re\n";#exit; @ARGV and pod2usage("Error: unexpected args: @ARGV"); } =head1 NAME B<findpm> - Find installed Perl modules =head1 SYNOPSIS findpm [options] [regex] Options: -help verbose help -path print out full directory paths also -report print out detailed report -sens case-sensitive [default is case-insensitive] =head1 DESCRIPTION Search through the directories in the Perl C<@INC> variable for Perl module files (all files with a C<.pm> extension) matching a specified regular expression. The names of all the modules which match will be printed to STDOUT. Any directories listed in C<@INC> which do not exist will be silently +ignored. Excludes the current directory (.). If you are impatient (like I am) you can optionally use an initializat +ion file instead of letting the script search through all the C<@INC> directories every time you run the script. The file must be in your h +ome directory and must be named C<.findpm>. You must create this file you +rself (see EXAMPLES below), and you should keep it up to date. Since you wi +ll get a warning if the init file is more than a day old, I recommend creating the file using a cron job that runs once a day. If the init +file does not exist, the script will proceed to search C<@INC>. =head1 ARGUMENTS =over 4 =item regex An optional regular expression may be given. The regex may be a simpl +e string, such as C<foo>, or it may be a more complicated expression, su +ch as C<^foo.*bar\d>. The regex syntax is Perl; it should not be confused with shell wilcard syntax or the syntax for other common Unix utilitie +s, such as I<sed> or I<grep>. It is best to quote the regex to prevent interaction with the shell. Do not include the C<.pm> extension as par +t of the regex. If no regex is given, find all modules. =back =head1 OPTIONS All options can be abbreviated. =over 4 =item sens By default, the regular expression is case-insensitive. So, if the inp +ut regex is C<foo>, it will match C<foo> as well as C<FOO> and C<Foo>, et +c. To use case-sensitive, use the C<-sens> option. findpm -sens foo =item path By default, only the module name is printed. To instead print the full directory path to the module file, use the C<-path> option. findpm -path foo =item report To print out additional statistics, use the C<-report> option. This will show the total number of matching modules, duplicate modules +, etc. findpm -report =item help Show verbose usage information. =back =head1 EXAMPLES Find xml modules: findpm xml Find modules with case-sensitive "Ext": findpm -sens Ext Find modules like File::Find. The following are equivalent because C<::> will be converted to C</> (similar to I<perldoc>): findpm 'file::find' findpm 'file/find' Find all modules in all C<@INC> directories: findpm Create init file: rm -f ~/.findpm; findpm -path > /tmp/.findpm; mv /tmp/.findpm ~/.f +indpm =head1 CONFIGURATION AND ENVIRONMENT Searches for an optional initialization file in the directory specifie +d by the C<HOME> environment variable: ${HOME}/.findpm =head1 LIMITATIONS The initialization file is only supported for Unix-type operating syst +ems. =cut

Constructive criticism, suggestions for improvements and bug reports are welcome.

Update: Now only uses core modules.
Update: Avoid potential warning; small change to POD.
Update: find is more portable.

Replies are listed 'Best First'.
Re: Find installed Perl modules matching a regular expression
by Anonymous Monk on Sep 16, 2009 at 12:06 UTC
    To get rid of the unix limitation you could die without $ENV{HOME}, or use File::HomeDir.

    Here is my caching version, works on ALL systems :)

    echo pml is module names list pminst >pml echo pmlf is module filenames list pminst -l >pmlf echo pmlfl is name tabspace filename paste pml pmlf >pmlfl grep "^CGI::S[^:]*$" pml grep "CGI/S[^/]*$" pmlf grep -P "^CGI::S\w+$" pml grep -P "^CGI::S\w+\t" pmlfl perl -lne "print $_ if /^CGI(::\w+)$/" pml perl -lne "print $_ if m!CGI/S[^/]*$!" pmlf perl -lne "print $_ if /^CGI(::\w+)\t/" pmlfl
    update: Whoops, I just realized pminst is broken in 2 ways, This doesn't match
    D:\>pminst Wx$
    And this matches prints MSWin32-x86-multi-thread
    D:\>pminst Wx.pm$ Wx MSWin32-x86-multi-thread::Wx D:\>perl -le"print for @INC" C:/perl/5.10.1/lib/MSWin32-x86-multi-thread C:/perl/5.10.1/lib C:/perl/site/5.10.1/lib/MSWin32-x86-multi-thread C:/perl/site/5.10.1/lib .
      I appreciate the feedback ++
      To get rid of the unix limitation you could die without $ENV{HOME}, or use File::HomeDir.
      You are correct: the reason for my self-imposed unix limitation is that I was unaware of how to handle $ENV{HOME} in a portable way. Thanks for bringing the File::HomeDir module to my attention. For my purposes, I have come to realize that it is important to only use core modules in this script. The original version of the script used the non-core List::MoreUtils. I ran into problems on one system configuration here @work which, unbelievably, did not have it installed. So I could not even analyze what modules were installed because my script died because it could not use a module!

      I will take a look at the File::HomeDir source code to see if I can incorporate its techniques for making findpm portable.

      Whoops, I just realized pminst is broken in 2 ways
      I am also aware of 2 bugs in pminst:
      1. It completely misses some modules.
      2. It unnecessarily duplicates some modules in its output. I believe this is the same as the MSWin32-x86-multi-thread issue you mentioned. It does not seem to handle all of the @INC paths gracefully. At first, I was willing to concede that my sysadmins set @INC in an unconventional manner... until you mentioned that it was also an issue for your system.

      I should file a bug report on CPAN. Unfortunately, it is not obvious to me how to patch the code. I guess this is the reason I created the findpm script in the first place.

      Update: Someone has reported a bug: https://rt.cpan.org/Public/Bug/Display.html?id=50644

Re: Find installed Perl modules matching a regular expression
by toolic (Bishop) on Sep 22, 2009 at 20:55 UTC
    I have updated the code to be more portable to other operating systems. The restriction now is that the initialization file is only supported for Unix-type operating systems. But, even that could be fixed by changing a single line in the source code.
      Here is another idea, I would replace
      if (-M $init_file > $days) { $message = "Warning: $init_file is older than $days day\n"
      with a check to see if $init_file is more recently modified than a directory in @INC, like
      for my $init_file ( '.', '..' ) { my $mod = ( stat $init_file )[9]; if ( my @mod = grep { ( stat $_ )[9] > $mod } @INC ) { warn "Warning: $init_file is older than (", join( ' , ', @mod +), ") "; } } __END__ Warning: .. is older than (C:/perl/5.10.1/lib/MSWin32-x86-multi-thread + , C:/perl/5.10.1/lib , C:/perl/site/5.10.1/lib/MSWin32-x86-multi-thr +ead , C:/perl/site/5.10.1/lib , .) at - line 4.
        Nice idea. However, it does not seem to work if a sub-directory of a directory in @INC has been modified. Since directories under @INC can be arbitrarily deep, it would be necessary to perform a find on all directories, which is what the init file was designed to avoid.

        Perhaps there is a more efficient way to check if any directory has been modified throughout a tree.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: CUFP [id://795418]
Approved by zwon
Front-paged by zwon
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others musing on the Monastery: (2)
As of 2024-04-16 16:07 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found