Re^2: Searching text files

#3 is the idea I like most.

I don't know much about American phone numbers, but if they all have a fixed length of 10 digits, you'd need slightly more than 1GB of disk space to store one bit for each possible number (10^10 bits / 8 = 1.25 * 10^9 bytes).

I wouldn't create this bit vector in memory. Just create a big enough file, initialized with zeros, then go through your text file and, for each number, position with fseek to byte offset $phone_num >> 3 and set bit number $phone_num & 7 in that byte.

Do the same positioning for read access, but check the bit instead of setting it.
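A minimal sketch of the set and check steps described above, in case it helps. The sub names (create_bitfile, set_bit, test_bit) are made up for illustration, and it assumes a perl built with 64-bit integers so that a full 10-digit number survives the shift. Perl's seek plays the role of fseek here.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical helper: create a zero-filled file with room for $bits bits.
sub create_bitfile {
    my ($path, $bits) = @_;
    open my $fh, '>', $path or die "open $path: $!";
    binmode $fh;
    my $bytes = int(($bits + 7) / 8);
    my $chunk = "\0" x 65536;
    while ($bytes > 0) {
        my $n = $bytes < 65536 ? $bytes : 65536;
        print {$fh} substr($chunk, 0, $n) or die "write $path: $!";
        $bytes -= $n;
    }
    close $fh or die "close $path: $!";
}

# Set the bit for $num: byte offset is $num >> 3, bit index is $num & 7.
# $fh must be opened read/write ('+<') on a file made by create_bitfile.
sub set_bit {
    my ($fh, $num) = @_;
    my $pos = $num >> 3;
    seek($fh, $pos, SEEK_SET) or die "seek: $!";
    my $got = read($fh, my $byte, 1);
    die "short read at byte $pos" unless $got;
    # Seek again before writing: the read moved the file position.
    seek($fh, $pos, SEEK_SET) or die "seek: $!";
    print {$fh} chr(ord($byte) | (1 << ($num & 7))) or die "write: $!";
}

# Same positioning, but only check the bit. Returns 1 or 0.
sub test_bit {
    my ($fh, $num) = @_;
    seek($fh, $num >> 3, SEEK_SET) or die "seek: $!";
    my $got = read($fh, my $byte, 1);
    return 0 unless $got;    # past end of file: treat as not set
    return (ord($byte) >> ($num & 7)) & 1;
}
```

For the full 10-digit range you would call create_bitfile once with 10**10 bits, open the file with '+<' and binmode, call set_bit for every number in the text file, and later call test_bit for each lookup.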

A lookup is then a single seek plus a one-byte read, so I think searching will be done in less than a second.

Update: Of course you can combine this with the idea of splitting by area code. If the area code has 3 digits, each area code needs only 10^7 bits, so this should reduce the summed size of your three files to 3/1000, roughly 1/333, of the full vector (about 4MB).
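A rough sketch of how the per-area-code positioning could look. The sub name locate and the file naming scheme bits_NNN.bin are assumptions for illustration only. Each per-area-code file would hold 10^7 bits, i.e. 1,250,000 bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: split a 10-digit number into a 3-digit area code
# and a 7-digit local part, then return the per-area-code file name,
# the byte offset inside that file, and the bit mask within that byte.
sub locate {
    my ($phone) = @_;
    $phone =~ /\A(\d{3})(\d{7})\z/
        or die "expected 10 digits, got '$phone'";
    my ($area, $local) = ($1, $2 + 0);
    return ("bits_$area.bin", $local >> 3, 1 << ($local & 7));
}
```

Since the local part never exceeds 9,999,999, this variant also works on a perl without 64-bit integers.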

Update #2: If each phone number has 10 digits and you have 2 million numbers, the text file already uses about 21MB of disk space. So the bit vector on disk will save you about 16MB.

s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
+.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e



Re^3: Searching text files
by rminner (Chaplain) on Sep 15, 2006 at 05:18 UTC

Update:

rminner@Rosalinde:~$ bc
bc 1.06
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'. 
obase=1024
10^7-1
 0009 0549 0639
last/8
 0001 0196 0719