Thank you all for your thoughts!

I guess being coy isn't really helpful; I was just trying to avoid distractions. This is (really) part of a parser I've written for structural Verilog netlists in an IC design environment. The whole parser does much more work and handles many more nuances of the input format, but I've been happy enough (for years) with its performance that I never tried to speed it up further (e.g. with sw1's whitespace-handling tips). For example, a typical report on a typical netlist input file (200+ MB) would run in ~10 seconds. But then CAD upgraded our central Perl (from 5.8.8 to 5.30), and the same code on the same input took 1 hour 38 minutes! (The output is identical, so it's purely a performance issue, not a correctness issue.) So I set about cutting it down to a very small testcase showing the 5 sec vs. 105 sec delta.

AnomalousMonk: it's just plain ASCII text; I tried throwing "aa" on it (for 5.30) and it didn't change things.

kschwab: below I've put a better fake-data generator that gives a consistent 2.5x slowdown (still not as bad as the 10x on the real data, but hopefully more representative of the problem: the majority of the names are unique strings, etc.). The un-foo-ified, cut-down parser is below as well.

SBECK: thank you! I guess I'll start looking at 5.20-related deltas. But I don't know that I'll be able to infer anything on my own; suggestions are welcome from all on how to proceed (file a GitHub issue, etc.?).

The faker and parser are used like this:

$ ./makev.pl 10000 > test.10k.v
$ time perl_5.8.8 -w testv.pl test.10k.v
LAST MODULE (Perl 5.008008): je0dhj
perl_5.8.8 -w testv.pl test.10k.v  0.01s user 0.01s system 0% cpu 11.352 total
$ time perl_5.30 -w testv.pl test.10k.v
LAST MODULE (Perl 5.030000): je0dhj
perl_5.30 -w testv.pl test.10k.v  0.01s user 0.02s system 0% cpu 26.291 total

makev.pl:

#!/usr/bin/perl -w
use Text::Wrap;
use strict;

my $num = shift or die "num?\n";

my @chars = ( "a" .. "z", 0 .. 9 );

for my $i (0 .. $num) {

    my $name = join("", @chars[ map { rand @chars } ( 1 .. 2+int(rand(8)) )]);
    my @io   = map { "p${name}${_}" } (0..int(rand(100)));
    my @hier = map { "m${name}${_}" } (0..int(rand(20)));
    my @leaf = map { "s${name}${_}" } (0..int(rand(200)));

    print "module ${name} ( ", wrap('', '  ', join(", ", @io)), "\n);\n";
    print "  inout  $_;\n" foreach @io;
    print "  wire   $_;\n" foreach @io;

    for my $leaf (@leaf) {
        my @conn = map { ".P${name}${_} (n${name}${_})" } (0..int(rand(5)));
        print "$leaf u_$leaf ( ", wrap('', '  ', join(", ", @conn)), "\n);\n";
    }

    for my $hier (@hier) {
        my @conn = map { ".p${name}${_} (n${name}${_})" } (0..int(rand(100)));
        print "$hier u_$hier ( ", wrap('', '  ', join(", ", @conn)), "\n);\n";
    }
    print "endmodule\n\n";
}

testv.pl:

use strict;
use File::Slurp;

my $file = shift or die "file?\n";
my $text = read_file($file);

parse_v($text);

sub parse_v {
    my $text = shift;
    my $name;
    {
        last if $text =~ /\G \s* \Z/gcmsx;

        if     ($text =~ /\G \s* ^ \s* module \s+ (\S+?) \s* \( \s* (.*?) \s* \) \s* ;/gcmsx) { $name = $1 }
        elsif  ($text =~ /\G \s* ^ \s* endmodule        /gcmsx) { }
        elsif  ($text =~ /\G \s* ^ \s* \S+ \s+ .*? \s* ;/gcmsx) { }
        else { die "ERROR: unknown syntax\n" }

        redo;
    }
    print "LAST MODULE (Perl $]): $name\n";
}
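
In case it helps narrow things down, here is a rough sketch of how the same loop could be instrumented to see which of the three patterns eats the time. This is only an assumption about a useful next step, not part of the real parser: the name profile_v and the hits/secs bookkeeping are made up for illustration, and the time for each iteration is charged to whichever branch finally matched, so it includes the failed attempts of the branches tried before it. It sticks to constructs that should work on both 5.8.8 and 5.30.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# profile_v is a hypothetical helper: it runs the same three patterns as
# parse_v, but counts matches per branch and sums wall-clock time per branch.
sub profile_v {
    my $text = shift;
    my %stat = map { $_ => { hits => 0, secs => 0 } } qw(module endmodule other);

    while (1) {
        last if $text =~ /\G \s* \Z/gcmsx;

        my $t0 = time();
        my $branch;
        if    ($text =~ /\G \s* ^ \s* module \s+ (\S+?) \s* \( \s* (.*?) \s* \) \s* ;/gcmsx) { $branch = 'module' }
        elsif ($text =~ /\G \s* ^ \s* endmodule /gcmsx)                                      { $branch = 'endmodule' }
        elsif ($text =~ /\G \s* ^ \s* \S+ \s+ .*? \s* ;/gcmsx)                               { $branch = 'other' }
        else {
            # /c keeps pos() after a failed match, so report where we got stuck
            my $pos = defined pos($text) ? pos($text) : 0;
            die "ERROR: unknown syntax at offset $pos\n";
        }

        # Elapsed time covers every pattern attempted in this iteration
        $stat{$branch}{hits}++;
        $stat{$branch}{secs} += time() - $t0;
    }
    return \%stat;
}

# Only runs when a file name is given on the command line
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<', $file or die "cannot open $file: $!\n";
    my $text = do { local $/; <$fh> };   # slurp without File::Slurp
    close $fh;

    my $stat = profile_v($text);
    printf "%-10s %8d hits %10.3f s\n", $_, $stat->{$_}{hits}, $stat->{$_}{secs}
        for sort keys %$stat;
}
```

Running it under both perls on the same test.10k.v and comparing the per-branch seconds should at least show whether the slowdown is concentrated in one pattern or spread across all of them.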

In reply to Re: regex gotcha moving from 5.8.8 to 5.30.0? by mordibity
in thread regex gotcha moving from 5.8.8 to 5.30.0? by mordibity
