Re: Problem with regex is a bug? or my regex (updated)

by haukex (Bishop)
on Nov 20, 2021 at 08:59 UTC ( #11138972=note )


in reply to Problem with regex is a bug? or my regex

<update2> Just posting this here for visibility: The solution is further down in the thread. </update2>

Both when filing a bug report and when asking a question here, you'll need to provide a Short, Self-Contained, Correct Example that reproduces the issue. That is, runnable code that includes sample input and expected output. Note that here, the regex you showed and the output you provided do not match (and it doesn't appear you're using a common module such as Data::Dumper to output your strings?). Your regexes would also benefit greatly from the /x modifier. The following code runs fine on Perl 5.18 through 5.34 on my system.

    use warnings;
    use strict;
    use Test::More tests=>4;

    my $str =
        "##############################################################################\r\n"
      . "# This system is a restricted access system. #\r\n"
      . "# If collected security information reveals possible criminal activity that #\r\n"
      . "# exceeds privileges, evidence of such activity may be provided to the rele- #\r\n"
      . "# vant authorities for further action. By continuing past this point, you #\r\n"
      . "# expressly consent to this security monitoring. #\r\n"
      . "##############################################################################\r\n"
      . "\r\nhostname: ~# ";

    my $re1 = qr{(([#%:>~\$\] ])(?!\2)){3,4}|([\w\-\.]*)\$ *$|\w[@\/]\w.*?[#%>~\$\]]|^[#%\$>\:]~] *$};
    my $re2 = qr{(([#%:>~\$\] ])(?!\2)){3,4}|([\w\-\.]*)\$ *$|(\w[@\/]\w|sftp).*?[#%>~\$\]]|^[#%\$>\:]~] *$};

    ok $str =~ $re1;
    is $&, ": ~#";
    ok $str =~ $re2;
    is $&, ": ~#";
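To illustrate the /x suggestion, here is a minimal sketch of the first regex spread over several lines. The names $re1_x and $tail are made up for this example, and the per-branch comments are only one reading of what each alternative is meant to do. Two things change under /x: a literal space has to be written as [ ] (or escaped), and # is only literal inside a bracketed character class. The fourth branch is kept exactly as posted, including the literal ~] sequence.

```perl
use warnings;
use strict;
use Test::More tests => 2;

# Same four alternatives as the one-line regex, laid out with /x.
# Assumption: the branch comments are an interpretation, not the
# original author's documentation.
my $re1_x = qr{
      ( ( [#%:>~\$\] ] ) (?!\2) ){3,4}   # 3 to 4 prompt chars, none repeated back to back
    | ( [\w\-\.]* ) \$ [ ]* $            # optional word, a dollar sign, spaces, end of string
    | \w [@\/] \w .*? [#%>~\$\]]         # user-at-host or path, then a prompt char
    | ^ [#%\$>\:] ~ \] [ ]* $            # one prompt char, a tilde, a closing bracket
}x;

# Hypothetical input: just the tail of the banner with the prompt.
my $tail = "\r\n\r\nhostname: ~# ";

ok $tail =~ $re1_x, 'prompt matches';
is $&, ": ~#", 'matched text is the prompt';
```

Laid out like this, it is easier to check each alternative separately against a sample string.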

Update: After looking at those regexes a little closer, I fail to see how either of them could match "  #" at all: in the first branch, every time a space matches it has to be followed by something that isn't a space or #, and in the second through fourth branches, each potential match of spaces has to be preceded by something that isn't a space, and in the second and fourth branches, the spaces need to be at the end of the line. Perhaps you made a mistake when editing \w[@\/]\w to (\w[@\/]\w|sftp), which could maybe explain the match you observed. Again, please use Data::Dumper with $Data::Dumper::Useqq=1; or Data::Dump to output strings and regexen in a representative manner. Also tweaked test code a tiny bit.
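As a small sketch of the Data::Dumper advice: with $Data::Dumper::Useqq set, strings are printed double-quoted with escapes, so carriage returns, newlines and runs of spaces stay visible instead of being swallowed by the terminal or the page. The sample string and the simple pattern below are invented for illustration; Terse and Indent are only set to keep the output on one line.

```perl
use warnings;
use strict;
use Data::Dumper;

# Useqq shows control characters as escapes such as \r and \n.
$Data::Dumper::Useqq  = 1;
# Terse drops the "$VAR1 = " prefix, Indent = 0 keeps it on one line.
$Data::Dumper::Terse  = 1;
$Data::Dumper::Indent = 0;

# Hypothetical sample: end of a banner line followed by a prompt.
my $str = "system.    #\r\n\r\nhostname: ~# ";

# A deliberately simple pattern, only to have something to dump.
if ( $str =~ / +#/ ) {
    print "matched: ", Dumper($&), "\n";
}
print "string:  ", Dumper($str), "\n";
```

Posting output in this form makes it unambiguous how many spaces and which line endings were actually in the match.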

(Update 2: Why is whitespace being compacted in these <code> tags?? "    #" - hm, probably a stylesheet issue) Also clarified wording in the above update.

Re^2: Problem with regex is a bug? or my regex (updated)
by hanspr (Sexton) on Nov 20, 2021 at 14:31 UTC
    Thank you for the guidance.

    I managed to create a self-contained example that reproduces the problem.

    I ran your test and it works on both Perl versions, as you said.

    But running the regex inside Expect.pm fails.

    So is the bug in the Expect package?

    chain.txt

    ##############################################################################\r\n# This system is a restricted access system. #\r\n# If collected security information reveals possible criminal activity that #\r\n# exceeds privileges, evidence of such activity may be provided to the rele- #\r\n# vantauthorities for further action. By continuing past this point, you #\r\n# expressly consent to this security monitoring. #\r\n##############################################################################\r\n\r\nhostname: ~#
    test.pl
        use strict;
        use Expect;

        my $re1 = '(([#%:>~\$\] ])(?!\2)){3,4}|([\w\-\.]*)\$ *$|(\w[@\/]\w|sftp).*?[#%>~\$\]]|^[#%\$>\:]~] *$';
        my $test;
        open($test,"<","chain.txt");
        my $exp = Expect->exp_init($test);
        $exp->expect(1,
            [ $re1 => sub {
                my $exp = shift;
                print "Match before : ",$exp->before(),"\n";
                print "Match : ",$exp->match(),"\n";
                print "Match after : ",$exp->after(),"\n";
            }]
        );
        close $test;




    hans@hans-desktop ~ perl -v
    This is perl 5, version 30, subversion 0 (v5.30.0) built for x86_64-linux-gnu-thread-multi
    hans@hans-desktop ~ perl -MExpect -e 'print $Expect::VERSION ."\n";'
    1.21
    perl test.pl
    Match before : ##############################################################################\r\n# This system is a restricted access system. #\r\n# If collected security information reveals possible criminal activity that #\r\n# exceeds privileges, evidence of such activity may be provided to the rele- #\r\n# vantauthorities for further action. By continuing past this point, you #\r\n# expressly consent to this security monitoring. #\r\n##############################################################################\r\n\r\nhostname
    Match : : ~#
    Match after :



    [hans@fedora ~]$ perl -v
    This is perl 5, version 34, subversion 0 (v5.34.0) built for x86_64-linux-thread-multi
    [hans@fedora ~]$ perl -MExpect -e 'print $Expect::VERSION ."\n";'
    1.35
    [hans@fedora ~]$ perl test.pl
    Match before : ##############################################################################\r\n# This system is a restricted access system.
    Match : #
    Match after : \r\n# If collected security information reveals possible criminal activity that #\r\n# exceeds privileges, evidence of such activity may be provided to the rele- #\r\n# vantauthorities for further action. By continuing past this point, you #\r\n# expressly consent to this security monitoring. #\r\n##############################################################################\r\n\r\nhostname: ~#

      Thanks for posting the details. Expect 1.21 is about ten years older than 1.35 (2007 vs 2017). Since I'm unable to reproduce your issue with Expect 1.35 on both versions of Perl, I am guessing that the issue lies with one of the bugs that was fixed in Expect over those 10 years. I'd say your best course of action is to upgrade the module.

      Update: Sorry, I see now that you're getting your expected behavior on the older version of the module instead of the newer version. The Changelog does mention "Eliminate $` and $' from the code. part of (RT #61395) This fix might break some existing code in some extreme cases when the regex being matched has a lookbehind or a lookahead at the edges." which could potentially be a hint, but finding out if this actually is the issue will take a bit more digging.

        Strange, I can reproduce it on my machine: I upgraded to 1.35 and now it's broken.

        hans@hans-desktop ~ perl -MExpect -e 'print $Expect::VERSION ."\n";'
        1.35
        hans@hans-desktop ~ perl test.pl
        Match before : ##############################################################################\r\n# This system is a restricted access system.
        Match : #
        Match after : \r\n# If collected security information reveals possible criminal activity that #\r\n# exceeds privileges, evidence of such activity may be provided to the rele- #\r\n# vantauthorities for further action. By continuing past this point, you #\r\n# expressly consent to this security monitoring. #\r\n##############################################################################\r\n\r\nhostname: ~#

      For reference:

      $ perl -v
      This is perl 5, version 32, subversion 1 (v5.32.1) built for x86_64-linux-gnu-thread-multi
      $ perl -MExpect -e 'print $Expect::VERSION ."\n";'
      1.21
      $ perl 11138979.pl
      Match before : ##############################################################################\r\n# This system is a restricted access system. #\r\n# If collected security information reveals possible criminal activity that #\r\n# exceeds privileges, evidence of such activity may be provided to the rele- #\r\n# vantauthorities for further action. By continuing past this point, you #\r\n# expressly consent to this security monitoring. #\r\n##############################################################################\r\n\r\nhostname
      Match : : ~#
      Match after :

      Good Day,
          Dean
