PerlMonks  

regex help

by kelscat18 (Initiate)
on Oct 05, 2013 at 19:00 UTC

kelscat18 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm using the following code to extract some words from a file:

    my @words = grep(s/[^a-zA-Z0-9]/ /g, @lines);

The problem is that the words I want must contain a mix of both letters and numbers.

    jHj8nniO - good
    I87jjj8y - good
    jUjngnkk - bad
    ikbHH - bad

The good words are the ones that mix letters and numbers. Thanks for any help.

Replies are listed 'Best First'.
Re: regex help
by Corion (Patriarch) on Oct 05, 2013 at 19:03 UTC

    Why not simply test the two conditions? First test that the word contains a letter, and in a second test check that the word contains a number?

      well.. that works fine too^^ thanks.
Re: regex help
by kcott (Archbishop) on Oct 05, 2013 at 19:38 UTC

    G'day kelscat18,

    You're using a substitution (i.e. s/pattern/replacement/) when you really want a pattern match (i.e. /pattern/). You're also using the 'g' modifier, which is unnecessary here. Take a look at "perlretut - Perl regular expressions tutorial" to get an understanding of the basics. Here's how I might have coded that (which, I suspect, is close to what Corion had in mind):

    #!/usr/bin/env perl
    use strict;
    use warnings;

    my @tests = qw{jHj8nniO I87jjj8y jUjngnkk ikbHH 12345 !@$%^&*};
    my @words = grep { /[A-Za-z]/ && /\d/ } @tests;
    print "@words\n";

    Output:

    jHj8nniO I87jjj8y

    -- Ken

Re: regex help
by jethro (Monsignor) on Oct 05, 2013 at 20:06 UTC

    Another solution, with only one regex:

    m/\d[a-zA-Z]|[a-zA-Z]\d/;

    This works because in a string made up only of letters and digits, having both means there has to be at least one place where a letter and a digit touch.

    Update: To Laurent_R: Absolutely. Clarity and simplicity always win. Except when this line is in the 3% of the code that needs 99.7% of the runtime of a program and you have to optimise for speed.
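
    For anyone who would rather measure the speed question than guess, here is a minimal sketch using the core Benchmark module. The sample strings and the sub names two_tests and one_regex are made up for illustration, and the samples contain only letters and digits so that both tests select the same words. No timings are claimed here; results will depend on the data and the Perl version.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data: alphanumeric strings only.
my @samples = qw(jHj8nniO I87jjj8y jUjngnkk ikbHH 12345);

# Two separate tests: at least one letter and at least one digit.
sub two_tests {
    return scalar(grep { /[A-Za-z]/ && /\d/ } @samples);
}

# One test: a letter next to a digit, in either order.
sub one_regex {
    return scalar(grep { /\d[a-zA-Z]|[a-zA-Z]\d/ } @samples);
}

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, { two_tests => \&two_tests, one_regex => \&one_regex });
```

    Both subs return the number of matching words, so it is easy to confirm they agree on the data before trusting any timing difference.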

Re: regex help
by Laurent_R (Canon) on Oct 05, 2013 at 21:27 UTC

    This last solution from jethro is clever and effective, but, with such a problem, I would rather take the solution offered by Corion. I think that, faced with a problem like that, it is often better to think in terms of several simple regexes checking individual conditions, rather than building a single more complicated regex to match all cases. Assuming I have to read and understand some undocumented code, I certainly prefer to have something like:

    do_something() if /\d/ and /[A-Za-z]/;

    which tells me immediately that I need at least one letter and one digit, rather than:

    do_something() if /\d[a-zA-Z]|[a-zA-Z]\d/;

    which is quite clear in terms of what it does, but less obvious in terms of what the intended underlying rule should really be. Having said that, I also sometimes use these types of supposedly clever shortcuts when they save some typing. But that often implies that I need to add a comment to explain the whole shebang, meaning that I don't save so much typing after all.
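
    One way to keep the rule readable without a comment is to give it a name. The sketch below is only an illustration of that idea: has_letter_and_digit is a hypothetical helper, not something from the replies above, and it assumes "letter" means an ASCII letter.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: true if the string has at least one ASCII letter
# and at least one digit, anywhere in the string.
sub has_letter_and_digit {
    my ($str) = @_;
    return ($str =~ /[A-Za-z]/ && $str =~ /\d/) ? 1 : 0;
}

# Label a few of the words from the original question.
for my $word (qw(jHj8nniO jUjngnkk 12345)) {
    printf "%-10s %s\n", $word, has_letter_and_digit($word) ? 'good' : 'bad';
}
```

    With a helper like this, the calling code reads as do_something() if has_letter_and_digit($_); and the intended rule is stated once, in one place.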

Re: regex help
by AnomalousMonk (Archbishop) on Oct 06, 2013 at 03:27 UTC

    (Further to kcott's reply:)

    kelscat18: Not only does the substitution you show in the OP select the wrong strings when used with grep, it changes them and also changes strings in the input array.

    >perl -wMstrict -le
    "my @lines = qw(aaa 111 a2a2 a2==a2 aa==aa);
     printf '@lines before: '; printf qq{'$_' } for @lines; print '';
     ;;
     my @words = grep(s/[^a-zA-Z0-9]/ /g, @lines);
     printf '@lines after: '; printf qq{'$_' } for @lines; print '';
     printf '@words: '; printf qq{'$_' } for @words; print '';
    "
    @lines before: 'aaa' '111' 'a2a2' 'a2==a2' 'aa==aa'
    @lines after: 'aaa' '111' 'a2a2' 'a2  a2' 'aa  aa'
    @words: 'a2  a2' 'aa  aa'
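
    If the substitution is still wanted as a clean-up step, one way to avoid touching the input is to work on a copy and do the selection separately with a plain match. This is only a sketch: the @cleaned array is a name introduced here for illustration, and the selection rule assumed is "at least one letter and at least one digit".

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @lines = qw(aaa 111 a2a2 a2==a2 aa==aa);

# Copy first, then substitute in the copy only; @lines is left untouched.
my @cleaned = @lines;
s/[^a-zA-Z0-9]/ /g for @cleaned;

# Selecting is a separate step, done with a plain pattern match.
my @words = grep { /[A-Za-z]/ && /\d/ } @cleaned;

print 'lines:   ', join(' ', map { "'$_'" } @lines),   "\n";
print 'cleaned: ', join(' ', map { "'$_'" } @cleaned), "\n";
print 'words:   ', join(' ', map { "'$_'" } @words),   "\n";
```

    Here grep only tests each element, so nothing is modified as a side effect of selecting.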
Re: regex help
by AnomalousMonk (Archbishop) on Oct 06, 2013 at 04:39 UTC
    ... must contain a mix of both letters and numbers.
    ... good words are the words that are a mix of letters and numbers.

    The specification and example in the OP are a bit unclear to me, but, taken with some of the other replies, lead me to think that a "word" is a string that either:

    1. must contain only alphanumeric characters, with at least one alphabetic character and at least one numeric character; or
    2. may contain any characters, but with at least one alphabetic character and at least one numeric character; or
    3. may contain any characters, but with at least one contiguous alphabetic and numeric character pair in any order.

    The other replies seem to lean toward alternatives 2 and 3 above. My own first guess was for alternative 1, as in the last code examples below:

    >perl -wMstrict -le
    "my @lines = qw(abc 345 a1 1a a1a 1a1 abc1 1abc a1==a1 a==1);
     printf '@lines: '; printf qq{'$_' } for @lines; print qq{\n};
     ;;
     printf 'and 1: ';
     printf qq{'$_' } for grep { /[[:alpha:]]/ && /\d/ } @lines;
     print '';
     ;;
     printf 'regex 1: ';
     printf qq{'$_' } for grep m{ [[:alpha:]] \d | \d [[:alpha:]] }xms, @lines;
     print qq{\n};
     ;;
     ;;
     printf 'and 2: ';
     printf qq{'$_' } for grep { !/[^[:alnum:]]/ && /[[:alpha:]]/ && /\d/ } @lines;
     print '';
     ;;
     my $al_num = qr{ [[:alpha:]] \d | \d [[:alpha:]] }xms;
     printf 'regex 2: ';
     printf qq{'$_' } for grep m{ \A [[:alnum:]]* $al_num [[:alnum:]]* \z }xms, @lines;
     print qq{\n};
     ;;
     ;;
     printf '@lines as was: '; printf qq{'$_' } for @lines;
    "
    @lines: 'abc' '345' 'a1' '1a' 'a1a' '1a1' 'abc1' '1abc' 'a1==a1' 'a==1'

    and 1: 'a1' '1a' 'a1a' '1a1' 'abc1' '1abc' 'a1==a1' 'a==1'
    regex 1: 'a1' '1a' 'a1a' '1a1' 'abc1' '1abc' 'a1==a1'

    and 2: 'a1' '1a' 'a1a' '1a1' 'abc1' '1abc'
    regex 2: 'a1' '1a' 'a1a' '1a1' 'abc1' '1abc'

    @lines as was: 'abc' '345' 'a1' '1a' 'a1a' '1a1' 'abc1' '1abc' 'a1==a1' 'a==1'
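
    Since the OP mentions extracting words from a file, here is a rough sketch of alternative 1 applied to whole lines: split each line on runs of non-alphanumeric characters, then keep the pieces that contain at least one letter and at least one digit. The @lines content is invented for illustration and stands in for whatever was read from the file.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical input lines standing in for the file contents.
my @lines = ('foo jHj8nniO, bar', 'I87jjj8y; jUjngnkk ikbHH 12345');

my @words =
    grep { /[[:alpha:]]/ && /\d/ }    # keep tokens with a letter and a digit
    map  { split /[^[:alnum:]]+/ }    # break each line on non-alphanumerics
    @lines;

print "@words\n";    # prints: jHj8nniO I87jjj8y
```

    Because the split removes everything that is not alphanumeric, every token is purely alphanumeric by construction, so the two-condition test is enough to implement alternative 1. The input lines are not modified.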

Node Type: perlquestion [id://1057061]
Approved by Corion