regex help

kelscat18 has asked for the wisdom of the Perl Monks concerning the following question:

Re: regex help by Corion (Patriarch) on Oct 05, 2013 at 19:03 UTC
Why not simply test the two conditions? First test that the word contains a letter, and in a second test check that the word contains a number?
Re^2: regex help by kelscat18 (Initiate) on Oct 05, 2013 at 19:16 UTC
well.. that works fine too^^ thanks.
Re: regex help by kcott (Archbishop) on Oct 05, 2013 at 19:38 UTC
G'day kelscat18, You're using a substitution (i.e. `s/pattern/replacement/`) when you really want a pattern match (i.e. `/pattern/`). You're also using the '`g`' modifier, which is unnecessary here. Take a look at "perlretut - Perl regular expressions tutorial" to get an understanding of the basics. Here's how I might have coded that (which, I suspect, is close to what Corion had in mind): `#!/usr/bin/env perl use strict; use warnings; my @tests = qw{jHj8nniO I87jjj8y jUjngnkk ikbHH 12345 !@$%^&*}; my @words = grep { /[A-Za-z]/ && /\d/ } @tests; print "@words\n";` Output: `jHj8nniO I87jjj8y` -- Ken
Re: regex help by jethro (Monsignor) on Oct 05, 2013 at 20:06 UTC
Another solution, with only one regex: `m/\d[a-zA-Z]|[a-zA-Z]\d/;` This works because in a string with both letters and numbers there has to be at least one location where a letter and a number touch. Update: To Laurent_R: Absolutely. Clarity and simplicity always win. Except when this line is in the 3% of code that needs 99.7% of the runtime of a program and you have to optimise for speed.
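The "touching" argument can be checked by brute force. The following is a minimal sketch, not code from the thread: it assumes the strings consist only of letters and digits, uses a small made-up alphabet (`a`, `Z`, `0`, `7`), and compares the two-test check with the single regex on every string of length 1 to 4.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Small illustrative alphabet (an assumption): two letters and two digits.
my @alphabet = qw(a Z 0 7);

# Build every string of length 1 to 4 over that alphabet.
my @all;
my @current = ('');
for my $len (1 .. 4) {
    my @next;
    for my $prefix (@current) {
        push @next, $prefix . $_ for @alphabet;
    }
    push @all, @next;
    @current = @next;
}

# Count the strings on which the two approaches give different answers.
my $disagreements = grep {
    my $two_tests = (/[A-Za-z]/ && /\d/) ? 1 : 0;            # letter AND digit
    my $one_regex = (/\d[a-zA-Z]|[a-zA-Z]\d/) ? 1 : 0;       # adjacent pair
    $two_tests != $one_regex;
} @all;

printf "%d strings checked, %d disagreements\n", scalar @all, $disagreements;
```

Under these assumptions the script reports 340 strings checked and 0 disagreements. The agreement depends on the alphanumeric-only assumption: a string such as `a==1` has a letter and a digit but no place where they touch, so the two checks differ on it.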
Re: regex help by Laurent_R (Canon) on Oct 05, 2013 at 21:27 UTC
This last solution from jethro is clever and effective, but, with such a problem, I would rather take the solution offered by Corion. I think that, faced with a problem like that, it is often better to think in terms of several simple regexes checking individual conditions, rather than building a single more complicated regex to match all cases. Assuming I have to read and understand some undocumented code, I certainly prefer to have something like: `do_something() if /\d/ and /[A-Za-z]/;` which tells me immediately that I need at least one letter and one digit, rather than: `do_something() if /\d[a-zA-Z]|[a-zA-Z]\d/;` which is quite clear in terms of what it does, but less obvious in terms of what the intended underlying rule should really be. Having said that, I also sometimes use these types of supposedly clever shortcuts when they save some typing. But that often implies that I need to add a comment to explain the whole shebang, meaning that I don't save so much typing after all.
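If the single-regex form is kept, one way to attach the explanatory comment to the pattern itself is the `/x` modifier, which allows whitespace and comments inside a regex. A minimal sketch; the variable name `$letter_next_to_digit` and the two sample words are chosen for illustration only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical name; with /x the pattern can carry its own explanation.
my $letter_next_to_digit = qr{
      \d [a-zA-Z]    # a digit immediately followed by a letter
    |                # or
      [a-zA-Z] \d    # a letter immediately followed by a digit
}x;

# Print only the words in which a letter and a digit are adjacent.
for my $word (qw(I87jjj8y jUjngnkk)) {
    print "$word\n" if $word =~ $letter_next_to_digit;
}
```

This documents what the pattern matches, though it still does not state the intended rule ("at least one letter and one digit") as directly as the two separate tests do.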
Re: regex help by AnomalousMonk (Archbishop) on Oct 06, 2013 at 03:27 UTC
(Further to kcott's reply:) kelscat18: Not only does the substitution you show in the OP select the wrong strings when used with grep, it changes them and also changes strings in the input array. `>perl -wMstrict -le "my @lines = qw(aaa 111 a2a2 a2==a2 aa==aa); printf '@lines before: '; printf qq{'$_' } for @lines; print ''; ;; my @words = grep(s/[^a-zA-Z0-9]/ /g, @lines); printf '@lines after: '; printf qq{'$_' } for @lines; print ''; printf '@words: '; printf qq{'$_' } for @words; print ''; "` Output: `@lines before: 'aaa' '111' 'a2a2' 'a2==a2' 'aa==aa' @lines after: 'aaa' '111' 'a2a2' 'a2 a2' 'aa aa' @words: 'a2 a2' 'aa aa'`
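The input array changes because `grep` aliases `$_` to each element of its list, so a substitution inside `grep` writes through to the original. A minimal sketch of non-destructive alternatives, assuming the goal is to select the mixed words and, separately, to produce a cleaned-up copy; the names `@words` and `@cleaned` and the sample data are illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @lines = qw(aaa 111 a2a2 a2==a2);

# grep aliases $_ to each element of @lines. A pattern *match* only
# inspects the element, so @lines is left untouched.
my @words = grep { /[A-Za-z]/ && /\d/ } @lines;

# If a cleaned-up version is also wanted, copy each element first and
# run the substitution on the copy rather than on the aliased $_.
my @cleaned = map { my $copy = $_; $copy =~ s/[^a-zA-Z0-9]/ /g; $copy } @lines;

print "lines:   @lines\n";
print "words:   @words\n";
print "cleaned: @cleaned\n";
```

On Perl 5.14 or later, the `/r` modifier (`s/[^a-zA-Z0-9]/ /gr`) returns the modified copy directly and avoids the temporary variable.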
Re: regex help by AnomalousMonk (Archbishop) on Oct 06, 2013 at 04:39 UTC
"... must contain a mix of both letters and numbers. ... good words are the words that are a mix of letters and numbers." The specification and example in the OP are a bit unclear to me, but, taken with some of the other replies, lead me to think that a "word" is a string that either: (1) must contain only alphanumeric characters, with at least one alphabetic character and at least one numeric character; or (2) may contain any characters, but with at least one alphabetic character and at least one numeric character; or (3) may contain any characters, but with at least one contiguous alphabetic and numeric character pair in any order. The other replies seem to lean toward alternatives 2 and 3 above. My own first guess was for alternative 1, as in the last code examples below: `>perl -wMstrict -le "my @lines = qw(abc 345 a1 1a a1a 1a1 abc1 1abc a1==a1 a==1); printf '@lines: '; printf qq{'$_' } for @lines; print qq{\n}; ;; printf 'and 1: '; printf qq{'$_' } for grep { /[[:alpha:]]/ && /\d/ } @lines; print ''; ;; printf 'regex 1: '; printf qq{'$_' } for grep m{ [[:alpha:]] \d | \d [[:alpha:]] }xms, @lines; print qq{\n}; ;; ;; printf 'and 2: '; printf qq{'$_' } for grep { !/[^[:alnum:]]/ && /[[:alpha:]]/ && /\d/ } @lines; print ''; ;; my $al_num = qr{ [[:alpha:]] \d | \d [[:alpha:]] }xms; printf 'regex 2: '; printf qq{'$_' } for grep m{ \A [[:alnum:]]* $al_num [[:alnum:]]* \z }xms, @lines; print qq{\n}; ;; ;; printf '@lines as was: '; printf qq{'$_' } for @lines; "` Output: `@lines: 'abc' '345' 'a1' '1a' 'a1a' '1a1' 'abc1' '1abc' 'a1==a1' 'a==1' and 1: 'a1' '1a' 'a1a' '1a1' 'abc1' '1abc' 'a1==a1' 'a==1' regex 1: 'a1' '1a' 'a1a' '1a1' 'abc1' '1abc' 'a1==a1' and 2: 'a1' '1a' 'a1a' '1a1' 'abc1' '1abc' regex 2: 'a1' '1a' 'a1a' '1a1' 'abc1' '1abc' @lines as was: 'abc' '345' 'a1' '1a' 'a1a' '1a1' 'abc1' '1abc' 'a1==a1' 'a==1'`
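The three readings of "word" can also be written as separate named checks, which makes the differences between them easier to see than in a one-liner. A minimal sketch; the sub names `is_word_strict`, `is_word_loose` and `is_word_adjacent` are invented here and do not appear in the thread.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Alternative 1: alphanumerics only, with at least one letter and one digit.
sub is_word_strict {
    my ($s) = @_;
    my $ok = $s !~ /[^[:alnum:]]/ && $s =~ /[[:alpha:]]/ && $s =~ /\d/;
    return $ok ? 1 : 0;
}

# Alternative 2: any characters, with at least one letter and one digit.
sub is_word_loose {
    my ($s) = @_;
    my $ok = $s =~ /[[:alpha:]]/ && $s =~ /\d/;
    return $ok ? 1 : 0;
}

# Alternative 3: any characters, with a letter and a digit side by side.
sub is_word_adjacent {
    my ($s) = @_;
    my $ok = $s =~ /[[:alpha:]]\d|\d[[:alpha:]]/;
    return $ok ? 1 : 0;
}

# Show how each sample string fares under the three readings.
for my $s (qw(abc 345 a1 a1==a1 a==1)) {
    printf "%-8s strict=%d loose=%d adjacent=%d\n",
        $s, is_word_strict($s), is_word_loose($s), is_word_adjacent($s);
}
```

The strings `a1==a1` and `a==1` are the ones that separate the readings: the first is rejected only by alternative 1, and the second is accepted only by alternative 2.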

