PerlMonks  

Re: Match all Non-0 and Letters

by CountZero (Bishop)
on Jun 24, 2017 at 08:35 UTC ( #1193438=note )


in reply to Match all Non-0 and Letters

Regexes are a cool and important part of your Perl-toolchest. But as with any tool, one must use it wisely.

In this case, you want to distinguish between "good" and "bad" words. Sometimes it is easy to define what is "good" and sometimes it is easier to define what is "bad".

In this particular case, the definition of a good word is easy: 7 zeroes followed by a digit. It then follows logically that all words that do not comply with this simple format must be "bad". Hence we extract all "good" words and simply drop all others, and we don't care in which way they may be bad.

The only regex you need is therefore qr/0{7}\d/ and depending on how the words are presented to you, you may wish to "anchor" the regex in the front or the back to avoid some false positives.
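A minimal sketch of that idea, assuming each word arrives as its own string. The sub name is_good_word and the sample words are made up for illustration, and anchoring at both ends with \A and \z is just one possible choice:

```perl
use strict;
use warnings;

# Hypothetical helper: a word is "good" only if it is exactly
# seven zeroes followed by one digit.
sub is_good_word {
    my ($word) = @_;
    return $word =~ /\A0{7}\d\z/ ? 1 : 0;
}

# Sample words invented for this example.
for my $word (qw( 00000003 6C163512 FFFFFFFF x00000003 0000000 )) {
    printf "%-10s %s\n", $word, is_good_word($word) ? 'good' : 'bad';
}
```

Without the anchors, a word such as 'x00000003' would also count as good, which is the kind of false positive meant above.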

By concentrating upon the "bad" words you made it unnecessarily difficult for yourself.

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

My blog: Imperial Deltronics

Replies are listed 'Best First'.
Re^2: Match all Non-0 and Letters
by arblargan (Acolyte) on Jun 24, 2017 at 22:08 UTC

    All, thank you very much for the help. My apologies for the confusing post; I typed it out before bed last night in desperation. The word extraction happens farther up in the subroutine than I've shown, but by the time it gets to this point, it will always be 8 continuous digits (or letters if there's corruption) not separated by whitespace.

    I realize that using the $D1 and $D2 variables makes the regex much more difficult than it needed to be, but I created those to try and figure out where the regex was failing. My initial regex looked something like this:

    if ($Disc =~ /[1-9a-zA-Z]{7}\D/)

    However, this still did not do what I wanted. I did try something similar to if ($Disc !~ /0{7}\d/) but I think I may have used a D by mistake. I just tried if ($Disc !~ /(0{7})(\d$)/) and the regex worked great!
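    For comparison, here is a small sketch of how the first attempt and the working test classify a few words. The sub names and the sample words are invented for this illustration and are not from the original subroutine:

```perl
use strict;
use warnings;

# First attempt: flag a word as bad when the pattern matches.
sub bad_by_first_try {
    my ($word) = @_;
    return $word =~ /[1-9a-zA-Z]{7}\D/ ? 1 : 0;
}

# Working version: flag a word as bad when the "good" pattern does NOT match.
sub bad_by_good_pattern {
    my ($word) = @_;
    return $word !~ /(0{7})(\d$)/ ? 1 : 0;
}

# Sample words invented for this example.
for my $word (qw( 00000005 FFFFFFFF 6C163512 0000000A )) {
    printf "%-9s first try: %d  good pattern: %d\n",
        $word, bad_by_first_try($word), bad_by_good_pattern($word);
}
```

    The first attempt only catches words made of seven non-zero characters followed by a non-digit, so corrupted words such as '6C163512' (which ends in a digit) or '0000000A' (which contains zeroes) slip through, while negating the "good" pattern catches them all.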

    Thank you all for the quick replies and showing the correct syntax for what I'm trying to do. As I mentioned before, I'm relatively new to Perl, so I still have quite a ways to go, especially with the regex syntax.

      The word ... will always be 8 continuous digits (or letters if there's corruption) not separated by whitespace.
      ...
      I just tried if ($Disc !~ /(0{7})(\d$)/) and the regex worked great!

      Note that if $Disc can ever possibly be longer than eight characters (update: with extra characters at the beginning), that regex will fail:

      c:\@Work\Perl\monks>perl -wMstrict -le
      "my $Disc = 'foo00000008';
       ;;
       if ($Disc !~ /(0{7})(\d$)/) {
         print qq{'$Disc' is bad};
         }
       else {
         print qq{'$Disc' is OK!};
         }
      "
      'foo00000008' is OK!

      If the string can only possibly be exactly eight characters, the $ end-of-string anchor is redundant. OTOH, I would tend to play it safe and include both start-of-string ^ and end-of-string anchors: it can't hurt, and may save you someday when one of your upstream assumptions fails you.

      The other thing I notice about the /(0{7})(\d$)/ regex is that (0{7}) captures a substring that can't possibly be anything other than '0000000', so why bother? (I assume you have some reason for capturing the trailing digit.)

      So what I might end up with would be something like m{ \A 0{7} (\d) \z }xms (in a testing matrix):

      c:\@Work\Perl\monks>perl -wMstrict -le
      "for my $Disc (qw(
           00000000 00000001 00000002 00000003 00000004
           00000005 00000006 00000007 00000008 00000009
           0 00 000 0000 00000 000000 0000000 000000000
           FFFFFFFF ffffffff 6C163512
           x00000000 00000000x x00000000x
           x0000000 0000000x x0000000x
           x000000000 000000000x x000000000x
           ), '') {
         ;;
         my $proper_word = my ($rightmost_digit) = $Disc =~ m{ \A 0{7} (\d) \z }xms;
         ;;
         if ($proper_word) {
           print qq{'$Disc' ok, rightmost digit '$rightmost_digit'};
           }
         else {
           print qq{'$Disc' is bad};
           }
         }
      "
      '00000000' ok, rightmost digit '0'
      '00000001' ok, rightmost digit '1'
      '00000002' ok, rightmost digit '2'
      '00000003' ok, rightmost digit '3'
      '00000004' ok, rightmost digit '4'
      '00000005' ok, rightmost digit '5'
      '00000006' ok, rightmost digit '6'
      '00000007' ok, rightmost digit '7'
      '00000008' ok, rightmost digit '8'
      '00000009' ok, rightmost digit '9'
      '0' is bad
      '00' is bad
      '000' is bad
      '0000' is bad
      '00000' is bad
      '000000' is bad
      '0000000' is bad
      '000000000' is bad
      'FFFFFFFF' is bad
      'ffffffff' is bad
      '6C163512' is bad
      'x00000000' is bad
      '00000000x' is bad
      'x00000000x' is bad
      'x0000000' is bad
      '0000000x' is bad
      'x0000000x' is bad
      'x000000000' is bad
      '000000000x' is bad
      'x000000000x' is bad
      '' is bad

      (See also Test::More for more thorough testing possibilities.)
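      As a rough idea of what that could look like, here is a sketch using Test::More with the same regex. The helper name rightmost_digit and the selection of test words are assumptions made for this example, not part of the original code:

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper wrapping the regex discussed above: returns the
# rightmost digit of a proper word, or undef for anything else.
sub rightmost_digit {
    my ($word) = @_;
    my ($digit) = $word =~ m{ \A 0{7} (\d) \z }xms;
    return $digit;
}

is( rightmost_digit('00000008'), '8', 'proper word yields its last digit' );
is( rightmost_digit('00000000'), '0', 'all zeroes is still a proper word' );

# Words that are too short, too long, corrupted, or padded must be rejected.
for my $bad ( qw( 0000000 000000000 FFFFFFFF 6C163512 x00000000 00000000x ), '' ) {
    is( rightmost_digit($bad), undef, "'$bad' is rejected" );
}

done_testing();
```

      Returning the digit rather than a plain true/false keeps the '0' case testable with is(), since a captured '0' is false in boolean context.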


      Give a man a fish:  <%-{-{-{-<
