PerlMonks  

regex: only want [a-zA-Z] and comma chars in a string

by heezy (Monk)
on Oct 12, 2003 at 17:13 UTC ( [id://298670]=perlquestion )

heezy has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks, this unfortunately is another regex question. I have tried searching for previous postings, but it yielded nothing. There are just too many postings regarding regex!

I want to check that a certain string has only letters (upper and lower) and optional commas.

  • Commas should never be next to each other
  • A comma at the end of the string is not allowed
  • A comma at the start of the string is not allowed

Valid strings

  • "notalot"
  • "england,rugby"
  • "chicken,egg,duck,feet"

Invalid Strings

  • ",southampton"
  • "bristol,"
  • "iran, canada,france"
  • ""

I tried the following but had no luck...

unless ($txt_collection =~ /([a-zA-Z]|([a-zA-Z],))+/) {
    print "<font color=\"#ff0000\"><i>Incorrect format</i></font>";
    $errors ++;
}

thanks for your time

-M

Replies are listed 'Best First'.
Re: regex: only want [a-zA-Z] and comma chars in a string
by tilly (Archbishop) on Oct 12, 2003 at 18:36 UTC
    If you are trying to figure out regular expressions, I highly recommend japhy's module YAPE::Regex::Explain.

    When installing it you will have trouble unless you install YAPE::Regex first. There is a dependency there, but he didn't properly indicate it. I reported that bug.

    If you have trouble installing things (e.g. you are on Windows and aren't familiar with CPAN and CPANPLUS) you can get the sources by following these links to .\YAPE\Regex.pm and .\YAPE\Regex\Explain.pm. Save those files with those paths (I assumed a Windows delimiter) and then write the following script:

    #! perl
    use strict;
    use YAPE::Regex::Explain;
    print YAPE::Regex::Explain->new(shift)->explain;
    And now you can get explanations like the following:
    tilly@localhost:~$ perl re-explain foo
    The regular expression:

    (?-imsx:foo)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      foo                      'foo'
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
    tilly@localhost:~$ perl re-explain '(foo|bar)'
    The regular expression:

    (?-imsx:(foo|bar))

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        foo                      'foo'
    ----------------------------------------------------------------------
       |                        OR
    ----------------------------------------------------------------------
        bar                      'bar'
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
    (I actually ran this under Linux. On Windows you will want to quote the RE with ", not '.)

    This makes it easier for beginners to understand what a given regular expression should do.

Re: regex: only want [a-zA-Z] and comma chars in a string
by davido (Cardinal) on Oct 12, 2003 at 17:34 UTC
    You're close.
    unless ( $txt_collection =~ /^[a-zA-Z]+(?:,[a-zA-Z]+)*?$/ ) {
        print "<font color=\"#ff0000\"><i>Incorrect format</i></font>";
        $errors++;
    }

    That says: match [a-zA-Z] one or more times (the first word), followed by a group that can be repeated as many times as necessary (or not at all). Each repetition of the group must start with a comma and continue with one or more [a-zA-Z] characters. The match is anchored from the front of the string to the end (assuming a single-line string), so nothing else can sneak in before, between, or after the words.
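    A quick way to convince yourself is to run the pattern over the strings from the original question. This is only a sketch: the sub name is_letter_list is made up for illustration, and I have swapped the final $ for \z on the assumption that a trailing newline should also be rejected ($ would let one through).

```perl
use strict;
use warnings;

# Same shape as the pattern above; \z instead of $ is an assumption
# (it refuses a trailing newline, which $ would tolerate).
my $list_re = qr/^[a-zA-Z]+(?:,[a-zA-Z]+)*\z/;

# Hypothetical helper name, not from the thread.
sub is_letter_list {
    my ($s) = @_;
    return ( defined $s && $s =~ $list_re ) ? 1 : 0;
}

my @valid   = ( 'notalot', 'england,rugby', 'chicken,egg,duck,feet' );
my @invalid = ( ',southampton', 'bristol,', 'iran, canada,france', '' );

# Print a verdict for every test string from the question.
foreach my $s ( @valid, @invalid ) {
    printf "%-26s %s\n", "'$s'", is_letter_list($s) ? 'ok' : 'rejected';
}
```

    If you read the string from a form or a file and have not chomped it, the choice between $ and \z is the part worth thinking about.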


    Dave


    "If I had my life to do over again, I'd be a plumber." -- Albert Einstein
Re: regex: only want [a-zA-Z] and comma chars in a string (Don't use a regex!)
by dragonchild (Archbishop) on Oct 12, 2003 at 19:22 UTC
    Try this:
    sub is_valid_string {
        my $string = shift;

        # Handle the empty case
        return 0 unless length $string;

        # Preserve trailing empty fields
        my @string = split ',', $string, -1;

        # No commas at the beginning, end, or next to one another.
        return 0 if grep { !$_ } @string;

        return 0 if grep { /[^a-zA-Z]/ } @string;

        return 1;
    }

    It was tested with your test strings and passed with flying colors.
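    The -1 limit and the explicit length check both matter because of how split treats empty fields. A minimal sketch of that behaviour (the variable names are just for illustration):

```perl
use strict;
use warnings;

# By default split discards trailing empty fields ...
my @default = split /,/, 'bristol,';
# ... but a negative LIMIT keeps them.
my @limited = split /,/, 'bristol,', -1;

# A leading empty field is kept either way.
my @leading = split /,/, ',southampton';

# Splitting an empty string yields no fields at all, whatever the LIMIT,
# which is why the empty case needs its own check.
my @empty = split /,/, '', -1;

print scalar(@default), "\n";    # 1
print scalar(@limited), "\n";    # 2
print scalar(@leading), "\n";    # 2
print scalar(@empty),   "\n";    # 0
```

    Without the -1, "bristol," would split into a single clean field and the trailing comma would go unnoticed.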

    ------
    We are the carpenters and bricklayers of the Information Age.

    The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6

    Please remember that I'm crufty and crotchety. All opinions are purely mine and all code is untested, unless otherwise specified.

Re: regex: only want [a-zA-Z] and comma chars in a string
by Anonymous Monk on Oct 12, 2003 at 18:03 UTC
    How about just:
    unless($txt_collection =~ /^([a-zA-Z]+|\b,\b)+$/) {
      unless($txt_collection =~ /^([a-zA-Z]+|\b,\b)+$/) {...

      No, that's not going to work because it allows the comma to appear at the beginning and/or end of the string, which doesn't meet the original post's spec. Beginning and End of string count as word boundaries, so your method fails.

      Update: Now it's a matter of public record: I made a simple mistake in interpreting Anonymous Monk's regexp. (S)He is correct in his assertion. The method works.


      Dave


      "If I had my life to do over again, I'd be a plumber." -- Albert Einstein
        Yes, Beginning and End of string count as word boundaries, but only adjacent to a \w character, which ',' is not a member of.
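        To see that concretely, here is a small sketch. The test strings are mine, picked to put a comma at each interesting position; the pattern is the one under discussion.

```perl
use strict;
use warnings;

# The pattern under discussion.
my $re = qr/^([a-zA-Z]+|\b,\b)+$/;

# \b needs a \w character on exactly one side. At the very start of a
# string that means the first character must itself be a \w character.
my $boundary_before_letter = ( 'a' =~ /^\b/ ) ? 1 : 0;    # 1
my $boundary_before_comma  = ( ',' =~ /^\b/ ) ? 1 : 0;    # 0

print "boundary before 'a' at start: $boundary_before_letter\n";
print "boundary before ',' at start: $boundary_before_comma\n";

# Only a comma with a letter on both sides can satisfy \b,\b.
for my $s ( 'a,b', ',a', 'a,', 'a,,b' ) {
    printf "%-6s %s\n", $s, $s =~ $re ? 'match' : 'no match';
}
```

        Only 'a,b' matches: a comma at either end, or next to another comma, has a non-word character (or nothing) on one side, so one of the two \b assertions fails.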
Re: regex: only want [a-zA-Z] and comma chars in a string
by delirium (Chaplain) on Oct 12, 2003 at 20:56 UTC
    If you're new to regexes, it might be helpful (although it will slow down processing slightly) to break this down into each thing you are trying to test. That may help you later when coming back to the code to figure out what you were testing for.

    For example:

    print "$_\n" if !/^,/ && !/,$/ && !/,,/ && /^[a-zA-Z,]+$/;
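    Taking the one-rule-per-test idea a step further, you could have the check tell you which rule was broken. A rough sketch, assuming you want a list of problems back; the sub name broken_rules and the messages are invented for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the list of rules a string breaks.
# An empty list means the string is valid.
sub broken_rules {
    my ($s) = @_;
    my @problems;
    push @problems, 'empty string'       if $s eq '';
    push @problems, 'leading comma'      if $s =~ /^,/;
    push @problems, 'trailing comma'     if $s =~ /,\z/;
    push @problems, 'consecutive commas' if $s =~ /,,/;
    push @problems, 'illegal character'  if $s =~ /[^a-zA-Z,]/;
    return @problems;
}

for my $s ( 'england,rugby', 'bristol,', 'iran, canada,france', '' ) {
    my @problems = broken_rules($s);
    printf "%-24s %s\n", "'$s'", @problems ? join( '; ', @problems ) : 'ok';
}
```

    It is slower than a single regex, but the error message can say what is actually wrong instead of just "Incorrect format".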

Re: regex: only want [a-zA-Z] and comma chars in a string
by Anonymous Monk on Oct 12, 2003 at 17:27 UTC
    @strings = (
        "notalot",
        "england,rugby",
        "chicken,egg,duck,feet",
        ",southampton",
        "bristol,",
        "iran, canada,france",
        "",
    );
    for (@strings) {
        print "$_\n" if /^[a-zA-Z]([a-zA-Z]+|,(?!,|$))*$/;
    }
Re: regex: only want [a-zA-Z] and comma chars in a string
by jeroenes (Priest) on Oct 12, 2003 at 17:36 UTC
    Hi heezy,

    Why not try without regex?

    print "yes" if not tr/a-zA-Z,//c and not ( str( $_, ',', 0) or str $_, ',' ,-1);
    Did not check it but should work.

    Cheers,

    Jeroen "We are not alone"(FZ)

    After some self-flagellation:

    for (qw/ ,asd asd:asd asd, asd,asd asd,,asd/) {
        my $test = $_;
        print "$test: ";
        print "yes" unless $test =~ tr/a-zA-Z,//c           # a character outside the allowed set
            or index( $test, ',' ) == 0                      # leading comma
            or rindex( $test, ',' ) == -1 + length($test)    # trailing comma (rindex finds the last comma)
            or index( $test, ',,' ) >= 0;                    # consecutive commas
        print "\n";
    }

    My perl clearly is rusty, and I don't even know anymore in which language str is valid ;).

    However, still possible without regexes {grin}.

      bzzt! Why post guesses when 10 seconds would have allowed you to check? The strings are not allowed to have consecutive commas, and str($_,',',0) isn't even a valid Perl function.
Re: regex: only want [a-zA-Z] and comma chars in a string
by mshiltonj (Sexton) on Oct 13, 2003 at 22:02 UTC
    This example...
    -----
    #!/usr/bin/perl -w
    use strict;

    my %commas = (
        'notalot'               => 1,
        'england,rugby'         => 1,
        'chicken,egg,duck,feet' => 1,
        ',southampton'          => 0,
        'bristol,'              => 0,
        'iran,,canada,france'   => 0,
    );

    foreach my $key (keys %commas) {
        unless (($key =~ /^,|,,|,$/)) {
            print "PASS";
        }
        else {
            print "FAIL";
        }
        print ": $key\n";
    }
    ----
    ... seems to work for me. This was its output:
    FAIL: bristol,
    PASS: notalot
    PASS: england,rugby
    FAIL: iran,,canada,france
    PASS: chicken,egg,duck,feet
    FAIL: ,southampton
      Ahhh ... the pitfalls of coding to the test cases instead of the specification. The OP specifically stated that he needed the characters to be in the class [a-zA-Z], which your code doesn't check for. Try the following string "1", which should fail.
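      For the record, here is a small sketch of what slips through. The sub name passes_comma_check is made up; the regex inside it is the one from the post above.

```perl
use strict;
use warnings;

# The comma-placement test from the post above. It looks for misplaced
# commas but never inspects the other characters.
# (Hypothetical sub name, for illustration only.)
sub passes_comma_check {
    my ($s) = @_;
    return $s =~ /^,|,,|,$/ ? 0 : 1;
}

# None of these should be accepted under the original spec.
for my $s ( '1', 'iran, canada,france', '' ) {
    printf "%-22s %s\n", "'$s'", passes_comma_check($s) ? 'PASS' : 'FAIL';
}
```

      All three print PASS: the digit, the embedded space, and the empty string each sail past a test that only looks at where the commas are.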

      ------
      We are the carpenters and bricklayers of the Information Age.

      The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6

      ... strings and arrays will suffice. As they are easily available as native data types in any sane language, ... - blokhead, speaking on evolutionary algorithms

      Please remember that I'm crufty and crotchety. All opinions are purely mine and all code is untested, unless otherwise specified.

      unless (($key =~ /^,|,,|,$/))

      Cute, testing for the negative is much more readable here ... but probably something I wouldn't have tried. You need to check for the characters being in the valid class though (and I'd also put brackets around the anchors) so that would make it...

      unless (($key =~ /(?:^,)|,,|[^a-zA-Z,]|(?:,$)/))

      Which isn't quite as nice anymore :(. So I'd probably still use the "obvious" non-negative...

      if (($key =~ /^[a-zA-Z]+(?:,[a-zA-Z]+)*$/))

      Or if I was feeling really special, I might even do...

      my $field = qr([a-zA-Z]+);
      if (($key =~ /^$field(?:,$field)*$/))

      ...which is slightly more readable IMO.

      update (broquaint): changed <pre> to <code> tags

Re: regex: only want [a-zA-Z] and comma chars in a string
by Anonymous Monk on Oct 15, 2003 at 03:00 UTC
    Hello Monks

    I really learned some things here. My solution forgot to include the ^ and $ anchors at first and didn't work. After reading dragonchild's reply, I went back to examine the behavior of split (which was instructive). By supplying -1 as the limit (treated as an arbitrarily large number of fields to split into), he preserved the trailing empty item. I really liked Nevyn's solution: very straightforward and clear, easy to understand.

    Once I put in the anchors, my regex worked correctly - surprise!

    my $pat = qr{^([A-Za-z]|[A-Za-z],(?=[A-Za-z]))+$};

    Nice to meet all,

    Chris

Node Type: perlquestion [id://298670]
Approved by gmax
Front-paged by davido