PerlMonks  


Did you ever settle upon a solution?

For grins, I just ran a test that looked up 1000 randomly generated 10-digit telephone numbers (nnn-nnn-nnnn) in a flatfile database containing approximately 6.6% (2e6 / 3e7) of the 1e10 numbers:

    c:\test>572961
    9991230061
    9991230061 is not found
    9991230062
    9991230062 is found
    9991230063
    9991230063 is not found
    Terminating on signal SIGINT(2)

    c:\test>perl -wle"printf qq[%03d%03d%04d\n], int( rand 1000 ), int( rand 1000 ), int( rand 10000 ) for 1 .. 1e3" | perl 572961.pl >nul
    File for area code '000' not found at 572961.pl line 12, <STDIN> line 57.
    999 trials of lookup (32.287s total), 32.319ms/trial

Each lookup takes around 32 ms, which ought to be quick enough for most purposes.

The disk files (for all 999 possible area codes) require 10 GB, though that could trivially be reduced to 2.5 GB. Each area code is stored in a separate file, with one line of 10,000 characters for each of the 999 subarea codes; each byte in the line represents a single telephone number as a simple '0' or '1'.
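For illustration, here is a minimal sketch of how one such area-code file might be built. It is not the script that produced the test data (that was not posted). The sub names `create_area_file` and `mark_number` are hypothetical, and the 10,002-byte record (10,000 flag bytes plus CRLF) is an assumption inferred from the seek arithmetic in the lookup script, whose `$subarea - 1` and `$no - 1` offsets it mirrors.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumed layout: 999 records of 10,000 flag bytes plus CRLF each.
use constant {
    FLAGS    => 10_000,
    RECLEN   => 10_002,
    SUBAREAS => 999,
};

# Write one area-code file with every number flagged '0' (absent).
sub create_area_file {
    my ($path) = @_;
    open my $fh, '>', $path or die "Cannot create $path: $!";
    binmode $fh;    # emit CRLF ourselves so records are 10,002 bytes everywhere
    print {$fh} ( '0' x FLAGS() ) . "\r\n" for 1 .. SUBAREAS();
    close $fh or die "Cannot close $path: $!";
    return;
}

# Flip the flag byte for one number to '1' (present).
# Uses the same ( $subarea - 1 ) / ( $no - 1 ) offsets as the lookup script.
sub mark_number {
    my ( $path, $subarea, $no ) = @_;
    open my $fh, '+<', $path or die "Cannot open $path: $!";
    binmode $fh;
    seek( $fh, ( $subarea - 1 ) * RECLEN() + ( $no - 1 ), 0 )
        or die "Seek failed: $!";
    print {$fh} '1';
    close $fh or die "Cannot close $path: $!";
    return;
}

my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/999";
create_area_file($file);
mark_number( $file, 123, 62 );    # flags 999-123-0062 as present
printf "%s is %d bytes\n", $file, -s $file;
```

At 999 x 10,002 bytes the file comes to just under 10 MB, which is where the 10 GB total for 999 area codes comes from.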

The lookup process is:

  1. Split the number into its 3 component parts (nnn-nnn-nnnn).
  2. Open the appropriate area code file.
  3. Seek to the appropriate subarea line and read it.
  4. substr the appropriate byte of the line; its value tells you whether the number is 'found' or 'not found'.

Care to trade 10 MB (2.5 MB) of disk space per area code for 32 ms lookup time regardless of how the application grows?
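The post does not say how the reduction to 2.5 MB per area code would be done. One common way to shrink a '0'/'1' byte map is to store one number per bit using Perl's built-in `vec`. Note that this gives an 8x saving (about 1.25 MB per area code) rather than the 4x quoted, so treat the following as an illustration of the general idea, not as the author's scheme. The sub names `set_number` and `has_number` are hypothetical, and numbers are indexed from 0 here.

```perl
use strict;
use warnings;

# One bit per number: 10,000 bits = 1,250 bytes per subarea record.
use constant BYTES_PER_SUBAREA => 10_000 / 8;

# Mark a 4-digit number (0 .. 9999) as present in a subarea record.
sub set_number {
    my ( $record_ref, $no ) = @_;
    vec( $$record_ref, $no, 1 ) = 1;
    return;
}

# Return true if the 4-digit number is flagged in the record.
sub has_number {
    my ( $record, $no ) = @_;
    return vec( $record, $no, 1 );
}

# Start with an all-zero record, then flag a single number.
my $record = "\0" x BYTES_PER_SUBAREA();
set_number( \$record, 62 );

for my $no ( 61, 62, 63 ) {
    printf "%04d is %s\n", $no, has_number( $record, $no ) ? 'found' : 'not found';
}
```

On disk, the seek offset would become the subarea index times 1,250 (no line ending needed), with the record read by `read` rather than `<FILE>`.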

    #! perl -slw
    use strict;
    use Benchmark::Timer;

    my $T = Benchmark::Timer->new;

    while( my $number = <STDIN> ) {
        chomp $number;
        $T->start( 'lookup' );
        ## Split the number into area code, subarea code and 4-digit number
        if( my( $area, $subarea, $no ) = $number =~ m[^(\d{3})(\d{3})(\d{4})$] ) {
            ## One file per area code
            open FILE, '<', "./tele/$area"
                or warn "File for area code '$area' not found" and next;
            ## Each subarea record is 10,000 flag bytes plus a 2-byte
            ## line ending (presumably CRLF), hence 10002
            seek FILE, ( $subarea - 1 ) * 10002, 0;
            my $mask = <FILE>;
            ## The flag byte is '0' (false) or '1' (true)
            print "$number is ",
                ( substr $mask, ( $no - 1 ), 1 ) ? 'found' : 'not found';
        }
        else {
            print "Invalid telephone number: $number";
        }
        $T->stop( 'lookup' );
    }

    $T->report;

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re: Searching text files by BrowserUk
in thread Searching text files by SteveS832001
