
Re^5: grepping CPAN?

by marto (Cardinal)
on Oct 01, 2021 at 14:27 UTC


in reply to Re^4: grepping CPAN?
in thread grepping CPAN?

That's not the issue. Unless you're doing something weird, minicpan should only pull back the latest releases required to build distributions. Over the years people have uploaded many modules, some of them very large, particularly in the App space (including vast bundles of other software). Unless you configure it to ignore bloat you won't avoid this, and even then I've come across legitimate modules that depend on ACME modules (for 'test' data).
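
As a rough illustration of "configure it to ignore bloat": CPAN::Mini documents a `path_filters` option that takes regexes or code refs and skips any distribution whose path matches. The sketch below assumes that option. The patterns, the `App-ExampleHuge` name, and the `is_bloat` / `mirror_without_bloat` subs are made up for illustration and are not a vetted deny-list. Note that filtering Acme-* wholesale would also drop the test dependencies mentioned above.

```perl
use strict;
use warnings;

# Hypothetical deny-list of distribution path patterns. CPAN::Mini hands
# filters author-relative paths such as "A/AU/AUTHOR/Some-Dist-1.00.tar.gz".
my @BLOAT_PATTERNS = (
    qr{/Acme-}i,             # joke distributions (some real modules depend on these)
    qr{/App-ExampleHuge-}i,  # placeholder for one oversized App distribution
);

# Returns 1 if the path matches any deny-list pattern, 0 otherwise.
sub is_bloat {
    my ($path) = @_;
    for my $re (@BLOAT_PATTERNS) {
        return 1 if $path =~ $re;
    }
    return 0;
}

# Run a mirror update with the filter applied. CPAN::Mini is loaded lazily
# so the filter itself can be tried out without the module installed.
sub mirror_without_bloat {
    my (%opt) = @_;
    require CPAN::Mini;
    CPAN::Mini->update_mirror(
        remote       => $opt{remote},    # URL of a CPAN mirror of your choice
        local        => $opt{local},     # target directory on disk
        path_filters => [ \&is_bloat ],  # matching paths are not mirrored
    );
}
```

Keeping the decision in a plain sub means the deny-list can be checked against a list of paths before any mirroring is attempted.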

Replies are listed 'Best First'.
Re^6: grepping CPAN?
by LanX (Saint) on Oct 01, 2021 at 14:53 UTC
    How do you define bloat in a filterable way?

    (Did I miss a bloat flag in the meta files? ;)

    FWIW: for the purpose of this thread, downloading only plain text such as Perl code should be fine (or just excluding any binaries).

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      I made a mirror, was shocked at the size, then dug into where all the disk space was going: stuff I'd never use, single App distros that were >100 MB, apps for platforms I'd never use, and ridiculous, insecure crap nobody should use.
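
One way to do that kind of digging, sketched with the core File::Find module: list the largest files under a mirror directory. The sub name `largest_files` and the mirror path in the usage comment are assumptions for illustration, not the tooling actually used here.

```perl
use strict;
use warnings;
use File::Find;

# Return the $n largest plain files under $root as [size_in_bytes, path]
# pairs, largest first.
sub largest_files {
    my ($root, $n) = @_;
    my @found;
    find(
        {
            no_chdir => 1,    # keep $_ as the full path
            wanted   => sub {
                return unless -f $_;
                my $size = -s _;    # reuse the stat from the -f test
                push @found, [ $size || 0, $_ ];
            },
        },
        $root,
    );
    my @sorted = sort { $b->[0] <=> $a->[0] } @found;
    $n = @sorted if $n > @sorted;
    return @sorted[ 0 .. $n - 1 ];
}

# Usage (the mirror location is an assumption):
#   printf "%8.1f MB  %s\n", $_->[0] / 2**20, $_->[1]
#       for largest_files("$ENV{HOME}/minicpan/authors/id", 20);
```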

        I followed your link out of curiosity, and FullAuto doesn’t seem bad to me. Orchestrating dozens of servers can be a pain. I usually just use .ssh/authorized_keys but it does have a connection overhead on every single remote command I run. I get around that by putting scripts on the remote host so that I’m only making a few ssh calls and the remote scripts do the lifting. FullAuto looks like a reasonable alternative design.

        So you are manually maintaining a blacklist of distros?

        Cheers Rolf

Re^6: grepping CPAN?
by LanX (Saint) on Oct 02, 2021 at 14:05 UTC
    > Unless you configure it to ignore bloat then you won't avoid this

    For the aim of parsing all Perl and POD source locally, I'd need to pull all text and ignore binaries and other "bloat" (to be defined) to save disk space.

    But this won't reduce network load, since AFAIK filtering happens only after the full dist's tgz has been downloaded. °

    Cheers Rolf

    °) Well, skipping the extraction of certain files from the tgz would probably speed things up a little, though.
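
A minimal sketch of that idea, assuming the core Archive::Tar module and its `iter` interface, which walks a tarball one entry at a time instead of loading it whole. The extension list and the sub name `perl_sources_in` are assumptions. The whole tgz still has to be downloaded and read, so this only saves disk space and extraction work, not network load.

```perl
use strict;
use warnings;
use Archive::Tar;

# Extensions treated as "Perl or POD source"; an assumption for this sketch.
my $SOURCE_RE = qr/\.(?:pm|pl|pod|t|PL)\z/;

# Walk a gzipped tarball entry by entry and return a hash of
# full path => content for plain files whose name looks like Perl or POD.
# Everything else (binaries, images, bundled software) is skipped.
sub perl_sources_in {
    my ($tarball) = @_;
    my $next = Archive::Tar->iter($tarball, 1)
        or die "Cannot read $tarball\n";
    my %source;
    while (my $entry = $next->()) {
        next unless $entry->is_file;
        my $path = $entry->full_path;
        next unless $path =~ $SOURCE_RE;
        $source{$path} = $entry->get_content;
    }
    return %source;
}
```

Matching on file extension is crude; a content-based check for binary data would catch more cases at the cost of inspecting every entry.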
