
pricing and phone number regexes

by coldfingertips (Pilgrim)
on Jun 05, 2004 at 02:28 UTC

coldfingertips has asked for the wisdom of the Perl Monks concerning the following question:

There is a text file I need to re-organize so it can be placed into a database in a specific order. A sample of this file can be found below along with an attempt I made in getting things to fall into place.

the file

You’ll love this Beautiful Cape Cod on the Boardman side of Southern Blvd. We can do a Rent-to-Own for a short time if you can qualify quickly for this 2-bedroom home wtih all appliances, full basement and garage, with payments as easy as rent. We have only one home at these terms, so call 330-207-0989 NOW - it won’t last long!
====
Other Mahoning County $154,900 Columbiana, Lake Arrowhead Unique, condo home w/sunroom, fireplace & more! $154,900. Lease from $995. A must see! 330- 702-8787 702-8787
====
Campbell $400 ***BUY With “0” Down or land Contract. $400/mo. completely remodeled 3 Bedroom Home in good area near High School. 291 Sanderson Ave. 330-759-2420 or cashflowforlife.com
====
Learn how you can stop renting and buy your own home, even with $0 down and bad credit. Pre-recorded message explains how to order your special FREE report. CALL 24 HOURS/DAY Toll Free 1-888-534-9659 ID#3006 or visit our website at zerodownhousesforsale.com
====
Boardman $157,000 COLONIAL/1 ACRE WOW! We finally found the 4 bedroom home you’ve been looking for. Call our office today for details and directions for the best buy in Boardman. Appraised at $157,000. Asking ONLY $139,900 David Realty 330-758-8363 330-758-8363
====
Boardman $475 A BARGAIN - 0 DOWN $475 & up/mo. 2 Bedroom, extra sharp ranch. Ready to occupy, not for rent. Easy to purchase. 2 homes available. 7372 Oregon Trail 7421 Siera Madre All Credit Considered Jim Rich Realty 330-783-9300
====
North Jackson JACKSON MEADOWS Custom built 3&4 bedroom homes 330-538-4663 or 330-503-1985 330-538-4663
====
Dogs PLEASE HELP! Being relocated-looking for a good home for my best friend 2 yr old blonde Cocker Spaniel well trained Loves to play Frisbee & Squeezy ball 330-797-8724
====
Austintown STANJIM HOMES 1850 Countryside Drive Kingwood Real Estate 793-2010 793-2010
====
Boardman $339,900 JUST move in! this home has over 3000 sq-ft; $339,900. Cocca Real Estate 330-758-9904 330-758-9904
====

I have to read through this file and re-sort all the information into this order: phone1, phone2, price, everything else. An example would be:

330-758-9904,330-758-9904,$339,900,Boardman JUST move in! this home has over 3000 sq-ft;. Cocca Real Estate

The problem lies in the regexes. A record can have more than one phone number, so the first one found has to be the first field and the second one the second field (or, if there is only one, you would get a phone1,,price set).

This regex, which I used in another script, doesn't seem to be fully functional, and it doesn't match 1-800 numbers:

m|[(]?(\d{3})[. )-][ ]?(\d{3}\d?)[. -](\d{4})|g

This regex is for the prices, but it only matches numbers that have a comma in them or just $0; it refuses to match any other number, including the $475 that is in the text file:

m/(\$([0-9,?])+)\s/;

Can someone take a look at the regexes and give me a better solution, or suggest a better way of going about something like this? This is the code to date:

#!/usr/bin/perl
use warnings;
use strict;

#####################
# configuration section
#####################
my $readfrom = "test.txt";
## change the above to the file you're reading from
my $writeto = "output.txt";
## change the above to the file you're writing to

#####################
# do not edit below this line
#####################
$/="====\n\n";
open (READFROM, "$readfrom") or die "Cannot open $readfrom: $!";
open (WRITETO, ">$writeto") or die "Cannot open $writeto: $!";

while ( <READFROM> ) {
    chomp;

    ## doesn't work
    # my $phone;
    # if (m|[(]?(\d{3})[. )-][ ]?(\d{3}\d?)[. -](\d{4})|g)
    # {
    #     $phone = "$1 $2 $3";
    # }
    # print "$phone\n";
    ##

    $_ =~ m/(\$([0-9,?])+)\s/;
    my $price = $1;
    print "$price\n";
}

close (WRITETO) or die "Cannot close $writeto: $!";
close (READFROM) or die "Cannot close $readfrom: $!";

Replies are listed 'Best First'.
Re: pricing and phone number regexes
by davido (Cardinal) on Jun 05, 2004 at 03:11 UTC
    These are untested, and still have limitations, on which I'll elaborate in a moment:

    • Phone numbers (USA only):

      m/ (?: (?:1-?)? (?:\(?[0-9]{3}\)?-)? )? [0-9]{3}-?[0-9]{4} /x

    • And now for prices:

      m/ \$ (?:\d{1,3},?)+ (?:\.\d{0,2})? (?![.\d]) /x

    The limitations: With respect to phone numbers, if the number is something like 1-800-GO-FISH you're out of luck. Also, it has to be a USA number of seven, ten, or eleven (including the leading 1) digits.

    With respect to the pricing, you have to have something in front of the optional decimal place, and you have to have a dollar sign. Otherwise it will fail to match. Oh, and if the price is "One Million Dollars!" you won't capture it, since it's not numeric.

    There are probably other limitations as well.
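
As a rough illustration of how the two patterns above might be applied to one record to get the phone1,phone2,price,everything-else order the question asks for, here is a minimal sketch. The helper name `reorder_record` and the whitespace clean-up are assumptions made for this example, not part of the original reply. It relies on a `//g` match in list context returning every phone number in order of appearance.

```perl
use strict;
use warnings;

# The two patterns from the reply, compiled once with /x.
my $phone_re = qr/ (?: (?:1-?)? (?:\(?[0-9]{3}\)?-)? )? [0-9]{3}-?[0-9]{4} /x;
my $price_re = qr/ \$ (?:\d{1,3},?)+ (?:\.\d{0,2})? (?![.\d]) /x;

# Hypothetical helper: turn one record into "phone1,phone2,price,rest".
sub reorder_record {
    my ($record) = @_;

    # List context with /g returns every match, in order of appearance.
    my @phones  = $record =~ /($phone_re)/g;
    # Without /g, only the first price in the record is captured.
    my ($price) = $record =~ /($price_re)/;

    # Remove what was extracted so it is not repeated in the free text.
    (my $rest = $record) =~ s/$phone_re//g;
    $rest =~ s/$price_re//g;
    $rest =~ s/\s+/ /g;     # collapse runs of whitespace
    $rest =~ s/^ | $//g;    # trim leading and trailing space

    # Missing fields become empty strings rather than undef.
    return join ',', $phones[0] // '', $phones[1] // '', $price // '', $rest;
}

my $sample = 'Boardman $339,900 JUST move in! this home has over 3000 sq-ft; '
           . '$339,900. Cocca Real Estate 330-758-9904 330-758-9904';
print reorder_record($sample), "\n";
```

For the sample record this prints `330-758-9904,330-758-9904,$339,900,Boardman JUST move in! this home has over 3000 sq-ft; Cocca Real Estate`. Note that the price pattern also consumes the period after the second `$339,900`, so the remaining text differs slightly from the hand-written example in the question. A record with a single phone number yields an empty second field.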

    You should also consider looking at Number::Phone::US. Here's a snippet:

    use Number::Phone::US qw/is_valid_number/;
    print "Phone: $number\n" if is_valid_number( $number );

    Also, be sure to have a look at Regexp::Common::number. You might be able to either make use of it, or glean some knowledge from its source code.

    UPDATE: I've had time to test and rework them a little now. I fixed the missing ')' in the phone number RE, and added some logic to invalidate prices that have more than two digits to the right of the decimal point.

    PS: I recommend passing my RE's through YAPE::Regex::Explain so that you can see a detailed description of what they do. It's pretty easy to use.


    Dave

      I can't seem to get any of those to work; they keep producing "Use of uninitialized value" warnings when I try to print the results. I know you said it was untested, but can you tell me where the missing ) is supposed to go in the phone regex? I've tried a few spots but I can't figure out where it's supposed to be.

      Thanks!
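
One common source of such warnings, assuming the capture variable is read without checking that the match succeeded, is a record that simply has no price (the first record in the sample file is one). The sketch below tests the match before touching `$1`. The helper name `find_price` is made up for this example, and the capturing parentheses are added around the price pattern, which has none of its own.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first price in a record, or undef.
sub find_price {
    my ($record) = @_;

    # $1 is only meaningful when the match itself returned true.
    return $record =~ / ( \$ (?:\d{1,3},?)+ (?:\.\d{0,2})? ) (?![.\d]) /x
        ? $1
        : undef;
}

my @records = (
    'so call 330-207-0989 NOW',            # no price in this record
    'Boardman $475 A BARGAIN - 0 DOWN',    # has a price
);

for my $record (@records) {
    my $price = find_price($record);
    print defined $price ? "price: $price\n" : "price: (none found)\n";
}
```

With this guard in place, a record without a price produces an explicit "none found" line instead of a warning about an uninitialized value.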

        I just did some checking and testing (finally had a moment to do it). I've fixed both RE's. Actually the second one worked all along, but had a problem with allowing prices with more than two decimal places. That's fixed. And the phone number RE was missing a close paren.

        Anyway, it's sorted out now, and the versions now appearing in my original followup are correct, still with the limitations I mentioned before.


        Dave

Re: pricing and phone number regexes
by karmacide (Acolyte) on Jun 05, 2004 at 02:52 UTC
    Sorry in advance, but this probably won't be the best answer you could get (but hey, I'm reasonably drunk now so..)

    Zen teaches that it is as important to understand what you cannot be aware of as that which you can.

    Hence, it seems to me that the problem is not your regexes but that you are trying to match two different sets of numbers while relying on normal people to apply a convention to those numbers. This is not something you can guarantee.

    Perhaps try to preprocess the numbers within the file to the standards you expect to see, in order to differentiate between prices and telephone numbers. It is, I believe, unlikely that a telephone number (in terms of numerical value) could be mistaken for a price, or vice versa (in terms of currency notation).
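
A minimal sketch of that preprocessing idea, assuming the goal is to rewrite loosely formatted ten-digit numbers (such as the `330- 702-8787` in the sample file) into a single `NNN-NNN-NNNN` shape before any extraction runs. The helper name `normalize_phones` and the set of separators it accepts are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: normalise ten-digit US numbers inside a record.
sub normalize_phones {
    my ($text) = @_;

    # Accept an optional "(area)" form, then "-", "." or a space as the
    # separator, with stray whitespace after the first separator.
    # The lookbehind stops a match from starting inside a longer digit run.
    $text =~ s/(?<!\d)\(?(\d{3})\)?[-. ]\s*(\d{3})[-. ](\d{4})\b/$1-$2-$3/g;

    return $text;
}

print normalize_phones('A must see! 330- 702-8787 702-8787'), "\n";
print normalize_phones('Call (330) 702.8787 today'), "\n";
```

Seven-digit numbers such as the trailing `702-8787` are left untouched, so a later phone regex still has to allow for them.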

    Damn, I just re-read that, and although I stand by my advice, doesn't it sound pretentious? :P

    Although to be fair, I ain't a programmer. But the reason bioinformaticians use Perl is essentially just because of the excellent regexes.
