PerlMonks
Re: Pull 3-digit and 4-digit numbers from string by bitingduck (Chaplain)
on Apr 10, 2015 at 06:07 UTC ([id://1123012])
When I do stuff like this I like to regularize the data by stripping out the punctuation that makes things more complicated. In most of the US it's not too hard to tell whether something is a phone number: it will generally have 7, 10, or 11 digits (except inside companies' private exchanges and a few small towns like Volcano Village, HI), plus some form of separators that depend on where the writer is from and what mood they were in when they wrote it. I included a little twist for extensions, which are usually appended as x\d+, with or without a space before the x. The example below strips out the punctuation around the numbers, then checks the length of each run of digits. If a run is 7 to 11 digits long I declare it a phone number; anything else is treated as part of an address.
with output
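For readers who want something to run, here is a minimal Perl sketch of the approach described above. It is not the original code; `classify_numbers` is a hypothetical name, and the separator handling is my own assumption: only dashes and dots between digits, plus parentheses around a 3-digit area code, are removed. Spaces are deliberately left alone so an address number isn't glued onto a following phone number, and extensions are assumed to look like x\d+ with at most one space before the x.

```perl
use strict;
use warnings;

# classify_numbers() is a hypothetical helper, not code from the original post.
# It returns two array refs: runs that look like US phone numbers (7..11 digits)
# and every other digit run (address numbers, apartment numbers, etc.).
sub classify_numbers {
    my ($text) = @_;
    my (@phones, @other);
    my $s = $text;    # work on a copy; the caller's string is untouched

    # Drop parentheses around a 3-digit area code: "(800) 222" -> "800222"
    $s =~ s/\((\d{3})\)\s*/$1/g;

    # Delete dashes and dots that sit between two digits: "222-2222" -> "2222222"
    # Spaces are deliberately NOT removed, so "1234 Main St" stays separate.
    $s =~ s/(?<=\d)[-.]+(?=\d)//g;

    # Walk each digit run, with an optional extension like "x12" or " x12".
    while ($s =~ /(\d+)(?:\s?x(\d+))?/g) {
        my ($digits, $ext) = ($1, $2);
        my $len = length $digits;
        if ($len >= 7 && $len <= 11) {
            push @phones, defined $ext ? "$digits x$ext" : $digits;
        }
        else {
            # Not phone-shaped: keep the run (and any stray "x" digits) as "other".
            push @other, $digits;
            push @other, $ext if defined $ext;
        }
    }
    return (\@phones, \@other);
}

my ($phones, $other) = classify_numbers(
    'Call 808-555-1234 x12 or (800) 222-2222; office at 1234 Main St, Apt 567.'
);
print "phone: $_\n" for @$phones;    # 8085551234 x12, then 8002222222
print "other: $_\n" for @$other;     # 1234, then 567
```

Because the parentheses are removed before the dashes, this particular normalization also turns 1-(800)-222-2222 into the 11-digit run 18002222222. Treating spaces as separators too would catch "555 1234", at the cost of sometimes merging an address number with a neighbouring phone number.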
Note that I got lazy and didn't bother pulling the individual numbers out of the address strings, and I let those runs be any length, not just your 3- and 4-digit runs. I also miss numbers like 1-(800)-222-2222, but that's just a little more regex tweaking. I don't strip commas, since I don't think I've ever seen commas used to punctuate a US phone number; they might also be your big flag for lists of apartment numbers. If you're dealing with phone numbers in Europe you're probably doomed: they seem to have a random number of digits over a very large range.
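The step skipped above, pulling only the 3- and 4-digit runs out of the address text, could look something like this. It is a minimal sketch assuming the phone numbers have already been removed from the string (otherwise the pieces of "555-1234" would match too); `short_runs` is a hypothetical name. Since commas are left in place, they simply act as separators, so an apartment list like "101,102" yields two runs.

```perl
use strict;
use warnings;

# short_runs() is a hypothetical helper. It returns every run of exactly 3 or 4
# digits that is not part of a longer digit run. Run it on the address text only,
# after phone numbers have been removed.
sub short_runs {
    my ($text) = @_;
    # (?<!\d) and (?!\d) stop the match from grabbing 4 digits out of "96785".
    return $text =~ /(?<!\d)(\d{3,4})(?!\d)/g;
}

my @runs = short_runs('Apts 101,102, 1103 and 12 Elm St, zip 96785');
print join(', ', @runs), "\n";    # 101, 102, 1103
```

The lookarounds matter more than they look: with a plain `\b\d{3,4}\b` the result would be the same here, but the lookarounds also reject runs glued to letters such as "x1234", which keeps stray extension digits out of the address list.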
In Section: Seekers of Perl Wisdom