PerlMonks
Re^2: Searching text files by Skeeve (Parson)
on Sep 14, 2006 at 23:52 UTC ( [id://573037]=note )
#3 is the idea I like most.

I don't know much about American phone numbers, but if they all have a fixed length of 10 digits, you'd need just slightly more than 1GB of disk space to store one bit for each possible number (10^10 bits is about 1.25GB).

I wouldn't create this bit vector in memory. Just create a big enough file, initialized with zeros, and then go through your text file: position with fseek (seek in Perl) to byte $phone_num >> 3 and set bit number $phone_num & 7. Do the same positioning for read access, but check the bit instead of setting it. I think searching will be done in less than a second.

Update: Of course you can couple this with the idea of splitting by area code. This should reduce the summed size of your three files to 1/333 (about 4MB) if the area code has 3 digits.

Update #2: If you have 10 digits in each phone number and 2 million numbers, you already use about 21MB of disk space. So the bit vector on disk will save you about 16MB.

s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{% +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
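For what it's worth, the seek-and-set-bit scheme above could be sketched roughly like this. It is only a minimal, untuned illustration: the sub names (open_bitfile, set_bit, test_bit, read_byte) are made up for this example, and it assumes a perl with 64-bit integers if you feed it full 10-digit numbers (7-digit numbers per area-code file fit either way).

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical helper: open a bit-vector file holding $bits bits,
# creating it (zero-filled) if it does not exist or is too short.
sub open_bitfile {
    my ($path, $bits) = @_;
    my $mode = -e $path ? '+<' : '+>';
    open my $fh, $mode, $path or die "Cannot open $path: $!";
    binmode $fh;
    my $bytes = int(($bits + 7) / 8);
    if ((stat $fh)[7] < $bytes) {
        # Writing one zero byte at the last position extends the file;
        # the gap before it reads back as zeros.
        seek $fh, $bytes - 1, SEEK_SET or die "seek: $!";
        print {$fh} "\0" or die "write: $!";
    }
    return $fh;
}

# Read the byte at offset $pos; a position past EOF counts as 0.
sub read_byte {
    my ($fh, $pos) = @_;
    seek $fh, $pos, SEEK_SET or die "seek: $!";
    my $got = read $fh, my $buf, 1;
    die "read: $!" unless defined $got;
    return $got ? ord $buf : 0;
}

# Mark number $n as present: byte $n >> 3, bit $n & 7.
sub set_bit {
    my ($fh, $n) = @_;
    my $byte = read_byte($fh, $n >> 3);
    seek $fh, $n >> 3, SEEK_SET or die "seek: $!";
    print {$fh} chr($byte | (1 << ($n & 7))) or die "write: $!";
}

# Return 1 if number $n was marked, 0 otherwise.
sub test_bit {
    my ($fh, $n) = @_;
    return (read_byte($fh, $n >> 3) >> ($n & 7)) & 1;
}
```

With one file per area code you would call open_bitfile($file, 10_000_000) once, run set_bit($fh, $local_number) for every line of the text file, and answer lookups with test_bit($fh, $local_number). Each lookup is one seek plus a one-byte read, which is why it should come in well under a second.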
In Section
Seekers of Perl Wisdom