http://qs321.pair.com?node_id=11120143


in reply to Simple way to skip spaces and # comments

Treat "#" and the rest of its line as a single whitespace character.

/\G (?: \s | \# .* )++ /xgc
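Here is a minimal sketch of the skip pattern inside a `\G`/`/gc` tokenizer loop. The `tokenize` sub, the sample input and the token rule (`\w+` or `=`) are hypothetical and only illustrate the idea. Because `.` does not match a newline without `/s`, each comment ends at the end of its line, and the newline is then consumed as ordinary whitespace.

```perl
use strict;
use warnings;

# Hypothetical tokenizer: words and "=" are tokens; whitespace and
# "#" comments (running to end of line) are skipped between tokens.
sub tokenize {
    my ($src) = @_;
    my @tokens;
    pos($src) = 0;
    while (1) {
        # Skip whitespace and comments; if nothing matches, /c keeps pos.
        $src =~ /\G (?: \s | \# .* )++ /xgc;
        last if pos($src) >= length $src;
        if ($src =~ /\G ( \w+ | = ) /xgc) {
            push @tokens, $1;
        }
        else {
            die "Unexpected character at offset ", pos($src), "\n";
        }
    }
    return @tokens;
}

my $src = "alpha  = 1   # first value\n# a whole-line comment\nbeta = 22\n";
print join(" ", tokenize($src)), "\n";   # prints "alpha = 1 beta = 22"
```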

More efficient? With possessive quantifiers the engine never backtracks into a whitespace run or a comment:

/\G (?: \s++ | \# .*+ )++ /xgc
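Since nothing follows the repeated group, the possessive variant should stop at the same offset as the plain one; it only stops the engine from saving backtracking states. The quick check below, with hypothetical sample strings and a hypothetical `skip_end` helper, compares where each pattern leaves `pos`.

```perl
use strict;
use warnings;

# The original pattern and the possessive variant from the post.
my %skip = (
    plain      => qr/\G (?: \s   | \# .*  )++ /x,
    possessive => qr/\G (?: \s++ | \# .*+ )++ /x,
);

# Return the offset where skipping stops, starting from offset 0.
sub skip_end {
    my ($text, $re) = @_;
    pos($text) = 0;
    $text =~ /$re/gc;    # on failure, /c leaves pos at 0
    return pos($text);
}

my @samples = ("   # c1\n  # c2\nx = 1", "x", "# only a comment", "");
for my $text (@samples) {
    my ($p, $q) = map { skip_end($text, $skip{$_}) } qw(plain possessive);
    print "plain=$p possessive=$q\n";
}
```

Whether the possessive form is actually faster is best measured, for example with `cmpthese` from the core Benchmark module. No timings are claimed here.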

Re^2: Simple way to skip spaces and # comments
by leszekdubiel (Scribe) on Jul 31, 2020 at 14:36 UTC
    /\G (?: \s++ | \# .*+ )++ /xgc

    ^^^^ This triggers an error about the recursion limit:

    # for f in `seq 40123`; do echo " #alfa beta"; done | perl -e 'use strict; use warnings; undef $/; my $s = <STDIN>; print length $s, "\n"; $s =~ /\G (?: \s++ | \# .*+ )++ /xgc; print pos $s, "\n"; '
    481476
    Complex regular subexpression recursion limit (32766) exceeded at -e line 1, <STDIN> chunk 1.
    196597

      If a subexpression is expected to match more than 32766 times, you need to break the repetition into chunks.

      So if the following exceeds the limit,

      a+
      you have to use
      (?:a{1,32766})+
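The rewrite rule above can be wrapped in a small helper. The sketch below uses a hypothetical `chunked_plus` that turns a subpattern SUB into `(?:(?:SUB){1,32766})+`. perldiag says "simple" and "medium" repetitions are handled without recursion and are not subject to the limit, so a bare single-character `a+` may well match past 32766. For that reason the demo uses the variable-length group `a|bc`, repeated 100,000 times.

```perl
use strict;
use warnings;

# Hypothetical helper: build (?:(?:SUB){1,32766})+ from a subpattern SUB,
# so that no single quantified group iterates more than 32766 times.
sub chunked_plus {
    my ($sub) = @_;
    return qr/(?:(?:$sub){1,32766})+/;
}

# A variable-length group, i.e. a "complex" repeat for the regex engine.
my $re = chunked_plus('a|bc');

my $s = "abc" x 50_000;   # 100_000 repetitions of (?:a|bc)
print $s =~ /\A$re\z/ ? "full match\n" : "no full match\n";   # prints "full match"
```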
      So,
      /\G (?: \s++ | \# .*+ )++ /xgc
      becomes
      /\G (?: (?: \s++ | \# .*+ ){1,32766}+ )+ /xgc
      Or maybe even
      /\G (?: (?: (?: \s{1,32766}+ )++ | \# (?: .{1,32766}+ )*+ ){1,32766}+ )+ /xgc
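To check the two rewrites against the original test, the script below rebuilds the same input as the one-liner (40123 lines of " #alfa beta", 481476 characters) and records where each pattern leaves `pos`. For comparison it also tries a different technique that perldiag suggests: looping in Perl with `while`, consuming one whitespace run or comment per match. That loop is a swapped-in alternative, not something proposed in this thread. The hash keys are hypothetical labels.

```perl
use strict;
use warnings;

# Same input as the one-liner: 40123 lines of " #alfa beta".
my $input = " #alfa beta\n" x 40123;

# The two chunked rewrites from the reply.
my %patterns = (
    chunked       => qr/\G (?: (?: \s++ | \# .*+ ){1,32766}+ )+ /x,
    fully_chunked => qr/\G (?: (?: (?: \s{1,32766}+ )++ | \# (?: .{1,32766}+ )*+ ){1,32766}+ )+ /x,
);

my %end;
for my $name (sort keys %patterns) {
    my $re = $patterns{$name};
    my $s  = $input;
    pos($s) = 0;
    $s =~ /$re/gc;
    $end{$name} = pos $s;
}

# Alternative (per perldiag): loop in Perl code, one whitespace run or
# comment per match, so no single regex repeats a complex group many times.
{
    my $s = $input;
    pos($s) = 0;
    1 while $s =~ /\G (?: \s++ | \# .*+ )/xgc;
    $end{perl_loop} = pos $s;
}

# Expected: every variant reaches the end of the input (481476).
printf "%-13s pos %d of %d\n", $_, $end{$_}, length $input for sort keys %end;
```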