Re: Simple Regex
by Juerd (Abbot) on Apr 08, 2002 at 17:12 UTC
You love "5", don't you? :)
I'd probably filter all non-alphanumerics, and then split on any string of letters.
while (<DATA>) {
    chomp;
    tr/A-Za-z0-9//cd;    # delete everything that isn't a letter or digit
    # Any run of letters ("x", "Ext", ...) separates number from extension.
    my ($number, $extension, $overflow) = split /[A-Za-z]+/;
    if ($overflow) {
        warn "Don't know how to handle number '$_'.\n";
        next;
    }
    print "Number: $number";
    print ", extension: $extension" if defined $extension and length $extension;
    print "\n";
}
__DATA__
(555) 555-5555
555.555.5555
555-555-5555
(555)555.5555
(555) 555-5555 x.555
555.555.5555 Ext. 555
555-555-5555 ext.555
U28geW91IGNhbiBhbGwgcm90MTMgY
W5kIHBhY2soKS4gQnV0IGRvIHlvdS
ByZWNvZ25pc2UgQmFzZTY0IHdoZW4
geW91IHNlZSBpdD8gIC0tIEp1ZXJk
|
Good solution, cutting to the chase.
However, you still have to worry about malformations, such as phone numbers that aren't 7 or 10 digits. Oftentimes, people will want to have the area code somewhere else, too. *shrugs*
------ We are the carpenters and bricklayers of the Information Age. Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.
|
However, you still have to worry about malformations, such as phone numbers that aren't 7 or 10 digits.
So every database has numbers only within a single country? Not like any database I've ever used. I even thought about not filtering out leading plusses, but didn't do so, because I think this was homework anyway - and there should still be a challenge.
As for 7 or 10 digits, I had no idea about how other countries have their telephone numbers, and think I should not guess.
The fix: check length.
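A minimal sketch of such a length check, assuming the 7 and 10 digit lengths mentioned in the quoted reply. The name `plausible_length` and the `%ok_lengths` table are hypothetical, and the accepted lengths would need adjusting for numbers from other countries.

```perl
use strict;
use warnings;

# Hypothetical table of acceptable digit counts. 7 and 10 are an
# assumption taken from the quoted reply, not a universal rule.
my %ok_lengths = map { $_ => 1 } (7, 10);

# Returns true when the digits-only form of $number has an accepted length.
sub plausible_length {
    my ($number) = @_;
    (my $digits = $number) =~ tr/0-9//cd;    # keep digits only
    return exists $ok_lengths{ length $digits };
}

for my $n ('(555) 555-5555', '555-5555', '555-55') {
    printf "%-16s %s\n", $n, plausible_length($n) ? 'ok' : 'suspicious';
}
```

In the loop above, this would be applied to $number after the split, so the extension digits are not counted.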
|
You love "5", don't you? :)
The 555 prefix is well known for appearing only in
Hollywood movies and other fiction.
There is even a web page about that:
http://home.earthlink.net/~mthyen/
So, if you want to "sanitize" a piece of code
containing phone numbers (for
privacy reasons and to fight spam... er, telemarketing),
you replace them with 555 numbers.
Yet, maybe Sevrin could have said 555-1234-5678
or 555-2002-0408 :-)
Update:
I forgot the following example. In
MacPerl: Power and Ease (published by Prime
Time Freeware, http://www.ptf.com/), both authors
(Vicki Brown and Chris Nandor) give
their phone numbers:
$phone{"Vicki"} = "555-1234";
$phone{"Chris"} = "555-4321";
You can read it on-line at
http://ptf.com/macperl/ptf_book/r/MP/120.SS.html#03
Another update: Sevrin, maybe you could look at some of
the modules you get from
http://search.cpan.org/search?mode=module&query=phone
Re: Simple Regex
by buckaduck (Chaplain) on Apr 08, 2002 at 17:12 UTC
I think you mean the first 10 digits are the phone number, not the first 7.
If you can assume that the first 10 digits are the phone number, and the remaining digits are the extension (which is a big assumption):
# Get rid of all non-digits
$number =~ s/\D//g;
# Break the number into groups of digits
my $phone = substr($number,0,10);
my $extension = substr($number,10);
buckaduck
Re: Simple Regex
by ilcylic (Scribe) on Apr 08, 2002 at 17:29 UTC
Another thing you might want to consider is looking in the string for an "x" somewhere and grabbing the characters after it, up to the first space you see, that include at least one digit. Move that whole string (x 534, ext 611, xt9411, etc.) to the back of the overall string. That ensures the digit string left over after your s/\D//g has the area code and phone number at the beginning.
If you have ext 433 (505)666-7777, you don't want to just strip the non-digit characters and substr the first 10 digits as the phone number.
Of course, if you know that the phone numbers always have the extension second (because of the way they were put into the database) then you don't have to worry about it.
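One way this idea could look in code, as a rough sketch only. The sub name `split_extension` and the exact marker pattern (optional "e", an "x", optional "t", optional dot and whitespace, then digits) are assumptions for illustration. Instead of moving the extension to the back of the string, the sketch cuts it out and returns it separately, which has the same effect.

```perl
use strict;
use warnings;

# Cut out something like "x 534", "ext 611", "Ext. 555" or "xt9411"
# wherever it appears, then treat the remaining digits as the number.
sub split_extension {
    my ($raw) = @_;
    my $extension = '';
    # Assumed marker: optional "e", "x", optional "t", optional dot,
    # optional whitespace, then the extension digits.
    if ($raw =~ s/e?xt?\.?\s*(\d+)//i) {
        $extension = $1;
    }
    (my $phone = $raw) =~ tr/0-9//cd;    # keep digits only
    return ($phone, $extension);
}

print join(' / ', split_extension('ext 433 (505)666-7777')), "\n";
# prints "5056667777 / 433"
```

A string that contains other letters followed by digits would confuse this pattern, so it is only as good as the assumption that "x" always introduces the extension.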
Good luck.
-il cylic
Re: Simple Regex
by jwest (Friar) on Apr 08, 2002 at 17:19 UTC
One way to do it is to eliminate all of your problem characters first. This way, you won't have to think through a more complex RE:
s/\D//g;                                     # strip everything except digits
my ($phone, $extension) = /^(\d{10})(\d*)/;  # first ten digits, then the rest
Of course, this assumes that all phone numbers have the correct numbers in all the right places (three digit area code, seven digit number, and possibly an extension). It'll probably be right for a good number of rows, but the only way to be sure is to eyeball the output and compare it.
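To narrow down what needs eyeballing, rows that break the ten-digit assumption could be flagged automatically. This is only a sketch: `parse_row` is a hypothetical helper name, and "at least ten digits" is the same assumption as above, not a real validity check.

```perl
use strict;
use warnings;

# Returns (phone, extension) on success, or an empty list when the row
# has fewer than ten digits and needs a human look.
sub parse_row {
    my ($row) = @_;
    (my $digits = $row) =~ s/\D//g;    # strip everything except digits
    return $digits =~ /^(\d{10})(\d*)$/ ? ($1, $2) : ();
}

for my $row ('(555) 555-5555 x.555', '555-5555') {
    if (my ($phone, $ext) = parse_row($row)) {
        print "$row => phone $phone, extension '$ext'\n";
    }
    else {
        warn "Check by hand: '$row'\n";
    }
}
```

Rows that pass can still be wrong (an extension that comes first, for example), so this reduces the manual check rather than replacing it.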
Hope this helps!
--jwest
-><- -><- -><- -><- -><-
All things are Perfect
To every last Flaw
And bound in accord
With Eris's Law
- HBT; The Book of Advice, 1:7
|
And then there are the local versions of non-U.S. numbers, and the long-distance versions of dialing them... Unless the application itself is going to be dialing, it may be acceptable to leave them as is and suggest that database users correct the fields as they come across them in later viewings.
Re: Simple Regex
by sevrin (Initiate) on Apr 08, 2002 at 18:05 UTC
Thanks, everyone. Combined, you've given me enough to go on to solve the problem myself, which is the way it should be.
/Scott
Re: Simple Regex
by mrbbking (Hermit) on Apr 08, 2002 at 18:34 UTC
I use this to format US phone numbers as 10 straight digits. Once you do this, you can use substr to split it up and insert parens or dots or hyphens or what have you...
sub format_phone {
    my @out = @_;
    foreach (@out) {
        tr/a-cA-C/2/; tr/d-fD-F/3/;    # change letters to digits.
        tr/g-iG-I/4/; tr/j-lJ-L/5/;
        tr/m-oM-O/6/; tr/p-sP-S/7/;
        tr/t-vT-V/8/; tr/w-zW-Z/9/;
        s/\D//g;                       # remove non-digits.
        s/^1//;                        # remove first digit if it's a one.
        $_ = pack( 'A10', $_ );        # keep the first ten digits (shorter input is space-padded).
    }
    return wantarray ? @out : $out[0];
}
Since there seems to be no standard place for the 'Q' or the 'Z' on the numeric keypad, I put them where I like them; sequentially. I've seen some phones that put them both on the nine or the zero, for some reason passing understanding. If you want them somewhere else, I won't complain.
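The substr step mentioned at the top of this post might look something like the following. It is a sketch, not part of the original sub: `pretty_phone` is a hypothetical name, and the "(NNN) NNN-NNNN" layout is just one of the possible formats.

```perl
use strict;
use warnings;

# Hypothetical helper: turns ten straight digits into "(NNN) NNN-NNNN".
# Anything that is not exactly ten digits is returned unchanged.
sub pretty_phone {
    my ($digits) = @_;
    return $digits unless $digits =~ /^\d{10}$/;
    return sprintf '(%s) %s-%s',
        substr($digits, 0, 3), substr($digits, 3, 3), substr($digits, 6);
}

print pretty_phone('5055551234'), "\n";    # prints "(505) 555-1234"
```

Swapping the sprintf template for '%s.%s.%s' or '%s-%s-%s' gives the dotted or hyphenated forms.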