File Searching

mattwortho has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I was wondering if someone could help me with the following. I have a text file with lines of this sort: 192.168.0.1 | some text. I have an array containing the IP addresses and would like to look up the text for each one. What is the best way to search for them? It seems a little crazy to go through all the file lines each time I want to find the corresponding text for an IP. Thanks in advance and sorry for the long-winded question.


Replies are listed 'Best First'.
Re: File Searching by kyle (Abbot) on Sep 13, 2007 at 15:33 UTC
Start by reading the IP list into a hash. `open my $iplist_fh, '<', 'iplist.txt' or die "Can't read iplist.txt: $!\n"; my %text_for; while ( <$iplist_fh> ) { chomp; if ( m{ \A ( \d{1,3} (?: \. \d{1,3} ){3} ) \s+ \| \s+ ( .* ) \z }xms ) { $text_for{$1} = $2; } } close $iplist_fh or die "Can't close??: $!\n";` The pattern is anchored to the whole line: the first capture is the IP address, then comes the space-surrounded | separator, and the second capture is the text. Note that this assumes each IP has only one entry in your list. If that's not the case, there's another solution for that... Once you have your list in memory, you can get the text of any IP out of it pretty easily. `my @ips_of_interest = qw( 127.0.0.1 10.0.1.51 ); foreach my $ip ( @ips_of_interest ) { print "IP is $ip\n"; print "Text is $text_for{$ip}\n"; }`
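For readers who want the snippet above laid out on separate lines, here is a minimal sketch of the same idea. The sub name `load_ip_table`, the sample data, and the in-memory filehandle are assumptions made for illustration; they are not part of the original post.

```perl
use strict;
use warnings;

# Build an IP => text lookup table from an open filehandle.
# Lines that do not look like "IP | text" are silently skipped.
sub load_ip_table {
    my ($fh) = @_;
    my %text_for;
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ m{
                \A                               # line start
                ( \d{1,3} (?: \. \d{1,3} ){3} )  # IP address
                \s+ \| \s+                       # separator
                ( .* )                           # text
                \z                               # line end
            }xms )
        {
            $text_for{$1} = $2;
        }
    }
    return \%text_for;
}

# Hypothetical sample data, read through an in-memory filehandle.
my $sample = "192.168.0.1 | some text\n10.0.1.51 | another text\n";
open my $fh, '<', \$sample or die "Can't open in-memory file: $!\n";
my $text_for = load_ip_table($fh);
close $fh or die "Can't close in-memory file: $!\n";

for my $ip (qw( 192.168.0.1 10.0.1.51 127.0.0.1 )) {
    # exists() avoids an "uninitialized value" warning for unknown IPs.
    my $text = exists $text_for->{$ip} ? $text_for->{$ip} : '(not found)';
    print "$ip => $text\n";
}
```

The file is read once, and every lookup after that is a single hash access, which is why this avoids rescanning the file for each IP.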
Re^2: File Searching by blazar (Canon) on Sep 14, 2007 at 13:04 UTC
"Start by reading the IP list into a hash." If he can trust the "format" of his lines enough, then along with others who participated in this thread I would suggest going with split, and I hardly see a risk of "messing things up" if he cannot. Also, we recommend all the time not to slurp a whole file in at once without a good reason, but perhaps in this case there's no reason not to: `my %text_for = map { chomp; split /\s*\|\s*/, $_, 2 } <$iplist_fh>;` "Note that this assumes each IP has only one entry in your list. If that's not the case, there's another solution for that..." Of course in that case one couldn't go with the simple map solution. I would do: `while (<$iplist_fh>) { chomp; my ($k,$v) = split /\s*\|\s*/, $_, 2; push @{ $text_for{$k} }, $v; }`
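The duplicate-handling loop can be sketched as a small self-contained script. The sub name `load_multi` and the sample data are hypothetical, and skipping lines without a separator is an added assumption rather than something the post specifies.

```perl
use strict;
use warnings;

# Collect every text seen for an IP into an array reference,
# so repeated IPs keep all of their entries.
sub load_multi {
    my ($fh) = @_;
    my %texts_for;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $ip, $text ) = split /\s*\|\s*/, $line, 2;
        next unless defined $text;    # assumption: skip lines with no separator
        push @{ $texts_for{$ip} }, $text;
    }
    return \%texts_for;
}

# Hypothetical sample data with one repeated IP.
my $sample = "192.168.0.1 | first\n192.168.0.2 | other\n192.168.0.1 | second\n";
open my $fh, '<', \$sample or die "Can't open in-memory file: $!\n";
my $texts_for = load_multi($fh);
close $fh or die "Can't close in-memory file: $!\n";

for my $ip ( sort keys %$texts_for ) {
    print "$ip: ", join( '; ', @{ $texts_for->{$ip} } ), "\n";
}
```

Each hash value is now an array reference, so a lookup has to dereference it with `@{ ... }` instead of printing the value directly.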
Re: File Searching by naikonta (Curate) on Sep 13, 2007 at 15:34 UTC
Just split each line with a space-surrounded | as the separator. You'll get a list with two elements: the first (element 0) is the IP and the second (element 1) is the text. You can store them in a hash to access them later. `my %entries; while (<DATA>) { chomp; my($ip, $text) = split /\s+\|\s+/; $entries{$ip} = $text; } print $entries{"192.168.0.1"};` With the sample lines `192.168.0.1 | some text` and `192.168.0.2 | another text` placed after `__DATA__`, this prints `some text`. Update: Anno's version is better at anticipating unexpected input. I went solely by the sample input; I do know about \s* and the limit argument to split. Open source softwares? Share and enjoy. Make profit from them if you can. Yet, share and enjoy!
Re^2: File Searching by Anno (Deacon) on Sep 13, 2007 at 18:22 UTC
I agree with the split solution. To make it a little more robust, I'd make spaces around "|" optional (instead of requiring at least one on each side), and give split a limit of 2, so that the text can be allowed to contain "|". Thus: `my($ip, $text) = split /\s*\|\s*/, $_, 2;` Anno
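A short sketch of what the optional spaces and the limit of 2 buy you. The helper name `parse_line` and the three sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Split one line into (ip, text). The limit of 2 stops split after the
# first separator, so any later "|" stays inside the text field.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    return split /\s*\|\s*/, $line, 2;
}

for my $line (
    '192.168.0.1 | some text',    # spaces on both sides
    '192.168.0.2|no spaces',      # no spaces at all
    '192.168.0.3 | a | b',        # separator character inside the text
    )
{
    my ( $ip, $text ) = parse_line($line);
    print "[$ip] [$text]\n";
}
```

Without the limit, the third line would split into three fields and the two-variable assignment would keep only `a` as the text.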
Re^2: File Searching by mattwortho (Acolyte) on Sep 13, 2007 at 15:43 UTC
Thank you very much, that seems to work a lot lot lot faster!!
Re: File Searching by GrandFather (Saint) on Sep 13, 2007 at 20:57 UTC
You have good solutions for splitting your string, but do you have to normalize your IP string? As a hash key '192.168.0.1' ne '192.168.000.001'. The following may help: `my $ip = '192.168.0.1'; (my $ip_norm = $ip) =~ s/(\d+)/sprintf "%03d", $1/ge; print "$ip | $ip_norm";` Prints: `192.168.0.1 | 192.168.000.001` DWIM is Perl's answer to Gödel
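One way to apply this, sketched under the assumption that the same normalization is used both when storing and when looking up; the sub name `normalize_ip` is hypothetical.

```perl
use strict;
use warnings;

# Zero-pad every run of digits to three places so that differently
# written forms of the same address produce the same hash key.
sub normalize_ip {
    my ($ip) = @_;
    ( my $norm = $ip ) =~ s/(\d+)/sprintf "%03d", $1/ge;
    return $norm;
}

my %text_for;

# Store under the normalized key ...
$text_for{ normalize_ip('192.168.000.001') } = 'some text';

# ... and normalize again at lookup time.
print $text_for{ normalize_ip('192.168.0.1') }, "\n";    # prints "some text"
```

Normalizing at both ends means the file and the array of IPs of interest do not have to agree on how the octets are written.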
Re: File Searching by sgt (Deacon) on Sep 13, 2007 at 16:00 UTC
One thing not addressed in the previous two posts is the space-time tradeoff (there is always one). If the file is big, there could be some advantage in transforming the data into a dbm file which you tie to a hash in your search/update program (see 'perldoc perltie'). To take care of duplicates use some marker, as in `$hash{$ip} .= $MARKER . $text` (anon arrays don't play well as dbm-hash values IIRC; choose $MARKER such that split is easy if you need to spit back the old flat file). cheers --stephan
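A rough sketch of the tied-dbm idea, assuming SDBM_File (bundled with most Perl builds) as the dbm flavour; the post does not name one. The marker value, the file name `iplist`, and the helper names `add_entry` and `texts_for` are all hypothetical. Unlike the one-liner above, this variant only inserts the marker between entries, not before the first one.

```perl
use strict;
use warnings;
use Fcntl;                     # exports O_RDWR and O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);

# Hypothetical marker: a control character assumed never to occur in the text.
my $MARKER = "\x1F";

# Append a text to an IP's entry, separating repeated entries with the marker.
sub add_entry {
    my ( $db, $ip, $text ) = @_;
    if ( exists $db->{$ip} ) { $db->{$ip} .= $MARKER . $text }
    else                     { $db->{$ip} = $text }
    return;
}

# Return all texts stored for an IP as a list (empty if the IP is unknown).
sub texts_for {
    my ( $db, $ip ) = @_;
    return exists $db->{$ip} ? split( /\Q$MARKER\E/, $db->{$ip} ) : ();
}

# Keep the dbm files in a throwaway directory for this demonstration.
my $dir = tempdir( CLEANUP => 1 );
tie my %db, 'SDBM_File', "$dir/iplist", O_RDWR | O_CREAT, 0666
    or die "Can't tie dbm file: $!\n";

add_entry( \%db, '192.168.0.1', 'some text' );
add_entry( \%db, '192.168.0.1', 'more text' );

print join( ' / ', texts_for( \%db, '192.168.0.1' ) ), "\n";

untie %db;
```

The data lives on disk rather than in memory, so a later run can tie the same file and look up an IP without reparsing the flat file. SDBM limits the size of each key/value pair, so a different dbm module may be a better fit for long texts.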
