PerlMonks  

parsing CPAN urls

by perl5ever (Pilgrim)
on May 12, 2011 at 17:32 UTC (id://904505)

perl5ever has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

Is there a CPAN module which extracts the author, module and version from strings like:

    MSERGEANT/Time-Piece-1.20.tar.gz
    http://backpan.perl.org/authors/id/A/AR/ARANDAL/Pod-Simple-3.07.tar.gz
    http://search.cpan.org/CPAN/authors/id/T/TE/TELS/math/Math-BigInt-1.89.tar.gz

Are there any other oddball cases like the Math-BigInt URL?

Alternatively, what's your favorite method/regexp to parse them?

Note: I realize that the _correct_ module and version can't always be derived from the CPAN URL, but I'm only looking for something that gives a best guess at what they should be.
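One possible best-guess approach, sketched in core Perl only: strip everything up to authors/id/X/XY/ when it is present, treat the first remaining path component as the CPAN ID and the last as the file name, drop the archive extension, and split what is left at the final hyphen that introduces a version. The helper name parse_cpan_path is hypothetical, and the accepted extensions (.tar.gz, .tgz, .tar.bz2, .zip) and the version pattern are assumptions, not PAUSE's rules. Because it does not assume a fixed directory depth, the extra math/ directory in the Math-BigInt URL is simply skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from any CPAN module): best-guess split of a
# CPAN path or URL into (CPAN ID, distribution name, version).
sub parse_cpan_path {
    my ($path) = @_;

    # Drop scheme, host and everything up to authors/id/X/XY/, if present.
    $path =~ s{^.*?authors/id/[^/]+/[^/]+/}{};
    return if $path =~ m{://};    # a URL outside authors/id: no author to find

    # What is left is ID/[subdirs/]file, e.g. TELS/math/Math-BigInt-1.89.tar.gz
    my @parts = split m{/}, $path;
    return unless @parts >= 2;
    my ($author, $file) = ($parts[0], $parts[-1]);

    # Strip the archive extension (assumed list); give up on anything else.
    $file =~ s/\.(?:tar\.gz|tgz|tar\.bz2|zip)$//i or return;

    # Version: the part after the last hyphen, if it starts with a digit
    # (optionally preceded by "v"); underscores such as 1.23_01 are kept.
    my ($dist, $version) = $file =~ /^(.+)-(v?\d[\w.]*)$/ or return;
    return ($author, $dist, $version);
}

for my $url (
    'MSERGEANT/Time-Piece-1.20.tar.gz',
    'http://backpan.perl.org/authors/id/A/AR/ARANDAL/Pod-Simple-3.07.tar.gz',
    'http://search.cpan.org/CPAN/authors/id/T/TE/TELS/math/Math-BigInt-1.89.tar.gz',
) {
    my ($author, $dist, $version) = parse_cpan_path($url) or next;
    (my $module = $dist) =~ s/-/::/g;    # best guess only
    print "author=$author, module=$module, version=$version\n";
}
```

Turning Time-Piece into Time::Piece by replacing hyphens is only a guess; a distribution such as libwww-perl does not name a module that way. The sketch also gives up on names it was not written for: a hypothetical Foo-1.0-TRIAL.tar.gz, with a suffix after another hyphen, returns nothing.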

Re: parsing CPAN urls
by bingos (Vicar) on May 12, 2011 at 19:09 UTC
Re: parsing CPAN urls
by Khen1950fx (Canon) on May 12, 2011 at 20:41 UTC
    Based on bingos' recommendation, I tried CPAN::Easy. It's built on top of CPAN::DistnameInfo and it's fast. It can also fetch the tarball for you if you want.
        #!/usr/bin/perl
        use strict;
        use warnings;
        use CPAN::Easy;
        use Data::Dumper::Concise;

        my(@mods) = ( 'Time::Piece', 'Pod::Simple', 'Math::BigInt' );
        foreach my $mod(@mods) {
            my($info) = CPAN::Easy->get_info($mod);
            print Dumper($info);
        }
Re: parsing CPAN urls
by educated_foo (Vicar) on May 12, 2011 at 19:31 UTC
    Here's a regex that should usually work:
        while (<DATA>) {
            next unless my ($a, $p, $v) = m!id/[A-Z]/[A-Z]{2}/([A-Z]+)  # id/X/XY/XYNAME
                                            .*/([^/]+)                  # /Module-Name
                                            -([\d.]*\d) \.[targz.]+$    # -N.M.O.tar.gz
                                           !x;
            $p =~ s/-/::/g;
            print "author=$a, package=$p, version=$v\n";
        }
        __END__
        MSERGEANT/Time-Piece-1.20.tar.gz
        http://backpan.perl.org/authors/id/A/AR/ARANDAL/Pod-Simple-3.07.tar.gz
        http://search.cpan.org/CPAN/authors/id/T/TE/TELS/math/Math-BigInt-1.89.tar.gz
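A variation on the regex above, sketched under the same assumptions about the input: the id/X/XY/ prefix becomes optional through the alternative ^, so the bare MSERGEANT/... path, which the original silently skips because it contains no id/, also matches; and (?: [^/]+ / )* permits any number of subdirectories such as math/. The wrapper name parse_line is hypothetical. Using $author instead of $a also avoids shadowing the $a that sort relies on.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around a relaxed version of the regex above.
# In list context it returns (author, package, version), or () on no match.
sub parse_line {
    my ($line) = @_;
    return $line =~ m!
        (?: ^ | id/[A-Z]/[A-Z]{2}/ )   # start of string, or id/X/XY/
        ([A-Z]+) /                     # XYNAME/
        (?: [^/]+ / )*                 # optional subdirectories, e.g. math/
        ([^/]+)                        # Module-Name
        -([\d.]*\d) \.[targz.]+ $      # -N.M.O.tar.gz
    !x;
}

for my $line (
    'MSERGEANT/Time-Piece-1.20.tar.gz',
    'http://backpan.perl.org/authors/id/A/AR/ARANDAL/Pod-Simple-3.07.tar.gz',
    'http://search.cpan.org/CPAN/authors/id/T/TE/TELS/math/Math-BigInt-1.89.tar.gz',
) {
    my ($author, $pkg, $ver) = parse_line($line) or next;
    $pkg =~ s/-/::/g;    # best guess: distribution name -> module name
    print "author=$author, package=$pkg, version=$ver\n";
}
```

With the three sample strings this should print one line per input, starting with author=MSERGEANT, package=Time::Piece, version=1.20.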
Re: parsing CPAN urls
by DrHyde (Prior) on May 13, 2011 at 09:25 UTC
    Why settle for a "best guess" when you can use CPAN::ParseDistribution? It's based on the code PAUSE uses to index uploads.
      Because you have to download the actual tarball to use it, while the OP only has a list of URLs or filenames.

Approved by herveus