PerlMonks  

Re: Fastest way to minimally check that file contains perl code?

by haj (Vicar)
on Mar 13, 2020 at 13:39 UTC ( [id://11114222]=note )


in reply to Fastest way to minimally check that file contains perl code?

Perl is notorious for being able to parse stuff that looks like garbage; there's a whole category for Obfuscated code on PerlMonks. So let's hope your Perl programmers do this in less than 20% of their files ;)

For the general task of classifying data, there's AI::NaiveBayes and AI::Categorizer. Both need some adaptation to sort text into the categories "Perl source code" and "garbage". I would guess that you get 80% accuracy with a filter based on the regular expressions presented by other monks, so training a Bayesian classifier is only worth considering as an alternative if that filter fails.
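To make the regex-filter idea concrete, here is a minimal sketch that counts lines resembling common Perl constructs. The function name looks_like_perl, the pattern list, and the default threshold of 2 hits are all illustrative assumptions; they are not the regular expressions posted elsewhere in the thread, and obfuscated code will slip past them.

```shell
# Hypothetical helper: succeed if FILE has at least MIN_HITS lines that
# look like typical Perl (shebang, use/require, package, sub, my, our).
# The patterns and the threshold are assumptions chosen for illustration.
looks_like_perl() {
    file=$1
    min_hits=${2:-2}
    # grep -c prints the number of matching lines (0 if none match).
    hits=$(grep -cE '^#!.*perl|^[[:space:]]*(use|require|package|sub|my|our)[[:space:]]' "$file" 2>/dev/null)
    [ "${hits:-0}" -ge "$min_hits" ]
}
```

Calling looks_like_perl some_file && echo "probably Perl" is cheap enough to run over many files. Raising the threshold trades missed short scripts for fewer false positives.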

Re^2: Fastest way to minimally check that file contains perl code?
by LanX (Saint) on Mar 13, 2020 at 16:28 UTC
    > there's a whole category for Obfuscated code on PerlMonks

    On a side note: It's possible to run Perl::Tidy in a server mode, which is far faster than starting it up for each file.

    Though I doubt it's faster than perl -c, unless using/requiring a large tree of dependencies (like Moose) is causing the lag here.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      The Moose argument is why I like vr's idea. If perl -c doesn't bomb out early on, you can be pretty confident it is actually compiling Perl (with Moose or something equally heavy).
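A rough sketch of that approach, assuming the GNU coreutils timeout command is available: run perl -c under a time limit and count both a clean exit and a timeout as "probably Perl". The function name compiles_as_perl and the 5-second default are hypothetical. Keep in mind that perl -c still executes BEGIN blocks and use statements, so it is not safe to run on untrusted files.

```shell
# Hypothetical helper: succeed if FILE passes 'perl -c', or if perl is
# still busy compiling when LIMIT seconds are up (GNU timeout exits 124).
# Counting a timeout as success is an assumption, not a guarantee.
# Caution: perl -c runs BEGIN blocks and 'use' statements in FILE.
compiles_as_perl() {
    file=$1
    limit=${2:-5}
    rc=0
    timeout "$limit" perl -c "$file" >/dev/null 2>&1 || rc=$?
    # 0 = syntax OK, 124 = killed by timeout while still compiling
    [ "$rc" -eq 0 ] || [ "$rc" -eq 124 ]
}
```

For example, compiles_as_perl script.pl 2 || echo "not Perl" flags files that fail to compile within two seconds. A short limit keeps the check fast, but it makes the timeout branch more likely to fire and so weakens the result.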

        > it is actually compiling Perl (with Moose or something equally heavy)

        Well, after the timeout you'd only have proven (again and again) that Moose contains Perl. The file in question could still be just garbage starting with use Moose.

        But yeah, this should be sufficient for the 80% threshold. :)

        Cheers Rolf
