Re^3: Searching text files

Update: I misread your post. Your 4 MiB are for 3 area codes, which gives the same result (1.2 MiB per area code). Being able to read clearly is an advantage. Sorry.

As you stated, the most practical approach would be to split it by area code. Where I disagree is your assumption that it would eat up 4 MB of space per area code.

rminner@Rosalinde:~$ bc
bc 1.06
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'. 
obase=1024
10^7-1
 0009 0549 0639
last/8
 0001 0196 0719

I get 10 million bits for 7 digits (the bc session above uses 10^7-1, the highest 7-digit number). That means roughly 1.2 MiB per area code and not 4 MiB. Depending on the amount of memory available, you could load only a limited number of area codes. The data structure for all do-not-call numbers in one area code should be just a little more than those 1.2 MiB, so 5 area codes would only eat up about 6 MiB. As I said, a lookup would be instantaneous (from a user's perspective), as it only requires checking one bit. One could allocate a limited number of slots for area codes and free them using whatever replacement algorithm one prefers (for example LRU or LFU).

Loading should also be fast using File::Slurp: directly slurping about 1 MiB into memory using sysread should be really fast when DMA is active. You could also seek directly in the file (as stated by skeeve). The lookup then reduces to a single seek statement, which keeps memory consumption even lower and requires just a single hard disk seek.
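
To make the one-bit-per-number idea concrete, here is a minimal sketch using Perl's built-in vec. The sub names (new_bitstring, mark_number, is_listed, is_listed_on_disk) are made up for illustration, and it assumes the 7-digit local number is used directly as the bit offset. Treat it as a sketch, not as tested production code.

```perl
use strict;
use warnings;

# One bit per 7-digit number: 10_000_000 bits = 1_250_000 bytes per area code.
use constant BYTES_PER_AREA => 10_000_000 / 8;

# Build an all-zero bitstring covering every 7-digit number of one area code.
sub new_bitstring {
    return "\0" x BYTES_PER_AREA;
}

# Mark a number (0 .. 9_999_999) as being on the do-not-call list.
sub mark_number {
    my ($bits_ref, $number) = @_;
    vec($$bits_ref, $number, 1) = 1;
    return;
}

# In-memory lookup: test a single bit.
sub is_listed {
    my ($bits_ref, $number) = @_;
    return vec($$bits_ref, $number, 1);
}

# On-disk lookup: seek to the one byte that holds the bit and read only that.
sub is_listed_on_disk {
    my ($path, $number) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    seek $fh, int($number / 8), 0 or die "Cannot seek in $path: $!";
    my $read = read $fh, my $byte, 1;
    close $fh;
    return 0 unless $read;    # offset past end of file: treat as not listed
    return vec($byte, $number % 8, 1);
}
```

With BITS set to 1, vec numbers the bits of each byte from the low-order end, so bit N lives in byte int(N/8) at position N % 8. That is why the on-disk variant can get away with reading a single byte.
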
Caching the bitstring could be done very easily. Simply store the bitstring in a file with the same name but a different extension, for example .bin. Afterwards, set the .bin file's mtime to that of the .txt file. Later, if the mtimes are identical, you can use your precomputed bitstring. If they differ, the .txt file has been modified and the .bin file can be recreated from scratch (which also shouldn't take more than 1-2 seconds). This way your data would always be up to date using just plain .txt files, while lookups should still be more or less instantaneous.
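
The mtime check could look roughly like this. It is only a sketch: load_bitstring is a made-up name, and I am assuming the .txt file holds one 7-digit number per line, which may not match your actual format.

```perl
use strict;
use warnings;

# Return the bitstring for $txt_path. Reuse the cached .bin file when its
# mtime matches the .txt file, otherwise rebuild it from the text file.
# Assumption: the .txt file contains one 7-digit number per line.
sub load_bitstring {
    my ($txt_path) = @_;
    (my $bin_path = $txt_path) =~ s/\.txt\z/.bin/
        or die "Expected a .txt file, got $txt_path";

    my $txt_mtime = (stat $txt_path)[9];
    defined $txt_mtime or die "Cannot stat $txt_path: $!";
    my $bin_mtime = (stat $bin_path)[9];    # undef if no cache exists yet

    if (defined $bin_mtime && $bin_mtime == $txt_mtime) {
        # Cache is current: slurp the precomputed bitstring.
        open my $in, '<:raw', $bin_path or die "Cannot open $bin_path: $!";
        local $/;
        my $cached = <$in>;
        close $in;
        return $cached;
    }

    # Cache is missing or stale: rebuild from the text file.
    my $bits = "\0" x (10_000_000 / 8);
    open my $txt, '<', $txt_path or die "Cannot open $txt_path: $!";
    while (my $line = <$txt>) {
        next unless $line =~ /^\s*(\d{7})\s*$/;
        vec($bits, $1, 1) = 1;
    }
    close $txt;

    open my $out, '>:raw', $bin_path or die "Cannot write $bin_path: $!";
    print {$out} $bits;
    close $out or die "Cannot close $bin_path: $!";

    # Stamp the cache with the text file's mtime so the next call trusts it.
    utime $txt_mtime, $txt_mtime, $bin_path
        or die "Cannot set mtime on $bin_path: $!";
    return $bits;
}
```

One caveat with this approach: mtime is only compared at one-second resolution here, so an edit to the .txt file within the same second as the last rebuild would go unnoticed.
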



	PerlMonks