PerlMonks  

parsing CPAN urls

by perl5ever (Pilgrim)
on May 12, 2011 at 17:32 UTC (node #904505)

perl5ever has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

Is there a CPAN module which extracts the author, module and version from strings like:

    MSERGEANT/Time-Piece-1.20.tar.gz
    http://backpan.perl.org/authors/id/A/AR/ARANDAL/Pod-Simple-3.07.tar.gz
    http://search.cpan.org/CPAN/authors/id/T/TE/TELS/math/Math-BigInt-1.89.tar.gz

Are there any other odd-ball cases like the Math-BigInt URL?

Alternatively, what's your favorite method or regexp for parsing them?

Note: I realize that the _correct_ module and version won't always be derivable from the CPAN URL; I'm only looking for something that gives a best guess at what they should be.
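A minimal core-Perl sketch of the path half of the problem follows; it is not taken from any CPAN module. It assumes the PAUSE layout authors/id/X/XY/AUTHOR/ (optionally preceded by any scheme and host), also accepts a bare AUTHOR/file string, and treats anything between the author directory and the file name as optional subdirectories, which is all that makes the Math-BigInt URL different (its file sits under TELS's math/ directory). The helper name split_cpan_path is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any CPAN module): split a CPAN/BackPAN URL,
# an "authors/id/..." path, or a bare "AUTHOR/file" string into
# (author, file name). Returns an empty list when nothing matches.
sub split_cpan_path {
    my ($path) = @_;

    # Drop any scheme, host and prefix up to and including "authors/id/".
    $path =~ s{^(?:.*/)?authors/id/}{};

    return unless $path =~ m{
        ^(?:([A-Z])/\1[A-Z0-9-]/)?   # optional hashed dirs, e.g. A/AR/
        ([A-Z][A-Z0-9-]*)/           # PAUSE ID, e.g. ARANDAL
        (?:.*/)?                     # optional subdirectories, e.g. math/
        ([^/]+)$                     # archive file name
    }x;
    return ($2, $3);
}

# The three sample strings from the question.
for my $s (
    'MSERGEANT/Time-Piece-1.20.tar.gz',
    'http://backpan.perl.org/authors/id/A/AR/ARANDAL/Pod-Simple-3.07.tar.gz',
    'http://search.cpan.org/CPAN/authors/id/T/TE/TELS/math/Math-BigInt-1.89.tar.gz',
) {
    my ($author, $file) = split_cpan_path($s) or next;
    print "author=$author file=$file\n";
}
```

The file name it returns still has to be split into distribution and version; keeping the two steps separate makes each regex easier to check.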

Re: parsing CPAN urls
by bingos (Vicar) on May 12, 2011 at 19:09 UTC
Re: parsing CPAN urls
by Khen1950fx (Canon) on May 12, 2011 at 20:41 UTC
    Based on bingos's recommendation, I tried CPAN::Easy. It's built on top of CPAN::DistnameInfo, and it's fast. It will also fetch the tarball for you, if you want.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use CPAN::Easy;
    use Data::Dumper::Concise;

    # These are module names, not the URLs from the question.
    my @mods = ( 'Time::Piece', 'Pod::Simple', 'Math::BigInt' );
    foreach my $mod (@mods) {
        # Look up CPAN information for each module and dump it.
        my ($info) = CPAN::Easy->get_info($mod);
        print Dumper($info);
    }
Re: parsing CPAN urls
by educated_foo (Vicar) on May 12, 2011 at 19:31 UTC
    Here's a regex that should usually work:
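    use strict;
    use warnings;

    while (<DATA>) {
        # Anchor either at the start of a bare "AUTHOR/..." string or at the
        # id/X/XY/ hashed directories of a full URL; anchoring only on id/
        # would silently skip the first sample line.
        next unless my ($author, $pkg, $ver) = m!
            (?:^|id/[A-Z]/[A-Z]{2}/)  # start of string, or id/X/XY/
            ([A-Z]+)                  # XYNAME (author)
            .*/([^/]+)                # /Module-Name (after any subdirectories)
            -([\d.]*\d)               # -N.M.O
            \.[targz.]+$              # .tar.gz
        !x;
        $pkg =~ s/-/::/g;             # Module-Name -> Module::Name
        print "author=$author, package=$pkg, version=$ver\n";
    }
    __END__
    MSERGEANT/Time-Piece-1.20.tar.gz
    http://backpan.perl.org/authors/id/A/AR/ARANDAL/Pod-Simple-3.07.tar.gz
    http://search.cpan.org/CPAN/authors/id/T/TE/TELS/math/Math-BigInt-1.89.tar.gz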
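The tail of that regex, -([\d.]*\d)\.[targz.]+$, uses a character class, so it accepts .tar.gz but rejects archives ending in .zip or .tar.bz2, and it does not match developer releases such as 1.20_01. Below is a minimal sketch of just the file-name half; the helper name split_dist_filename, the extension list, and the Foo-Bar-0.01_02.zip sample are assumptions for illustration. It splits at the last hyphen whose remainder looks like a version, which is still only a best guess, as the question itself notes.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any CPAN module): best-guess split of a
# CPAN archive file name into (distribution, version). Returns an empty
# list when the name does not look like "Name-VERSION.ext".
sub split_dist_filename {
    my ($file) = @_;

    # Strip a known archive extension (the list is an assumption). An
    # explicit alternation avoids the [targz.]+ character class, which
    # rejects .zip and .tar.bz2.
    $file =~ s/\.(?:tar\.gz|tgz|tar\.bz2|tbz|tar|zip)\z//i or return;

    # Split at the last hyphen whose remainder is a version-looking token:
    # an optional "v", digits and dots, and an optional "_NN" dev suffix.
    return unless $file =~ /\A(.+)-(v?\d[\d.]*(?:_\d+)?)\z/;
    return ($1, $2);
}

# Time-Piece and Math-BigInt come from the question; Foo-Bar is made up.
for my $file (qw(Time-Piece-1.20.tar.gz Math-BigInt-1.89.tar.gz Foo-Bar-0.01_02.zip)) {
    my ($dist, $version) = split_dist_filename($file) or next;
    (my $module = $dist) =~ s/-/::/g;    # module name is only a guess
    print "dist=$dist module=$module version=$version\n";
}
```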
Re: parsing CPAN urls
by DrHyde (Prior) on May 13, 2011 at 09:25 UTC
    Why settle for a "best guess" when you can use CPAN::ParseDistribution? It is based on the code in PAUSE that indexes uploads.
      Because you have to download the actual tarball to use it, while the OP only has a list of URLs or filenames.

Node Type: perlquestion [id://904505]
Approved by herveus