
File Searching

by mattwortho (Acolyte)
on Sep 13, 2007 at 15:21 UTC ( #638823=perlquestion )

mattwortho has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I was wondering if someone could help me with the following. I have a text file whose lines look like this: | some text. I also have an array of IP addresses and would like to look up the text for each one. What is the best way to search for them? It seems wasteful to scan every line of the file each time I want to find the text for an IP. Thanks in advance, and sorry for the long-winded question.

Replies are listed 'Best First'.
Re: File Searching
by kyle (Abbot) on Sep 13, 2007 at 15:33 UTC

    Start by reading the IP list into a hash.

    open my $iplist_fh, '<', 'iplist.txt'
        or die "Can't read iplist.txt: $!\n";

    my %text_for;
    while ( <$iplist_fh> ) {
        chomp;
        if (
            m{
                \A                              # line start
                ( \d{1,3} (?: \. \d{1,3} ){3} ) # IP address
                \s+ \| \s+                      # separator
                ( .* )                          # text
                \z                              # line end
            }xms
            )
        {
            $text_for{$1} = $2;
        }
    }
    close $iplist_fh or die "Can't close??: $!\n";

    Note that this assumes each IP has only one entry in your list. If that's not the case, there's another solution for that...

    Once you have your list in memory, you can get the text of any IP out of it pretty easily.

    my @ips_of_interest = qw( );
    foreach my $ip ( @ips_of_interest ) {
        print "IP is $ip\n";
        print "Text is $text_for{$ip}\n";
    }
      Start by reading the IP list into a hash.

      If he can trust the "format" of his lines enough, then along with others who participated in this thread I would suggest going with split - and I hardly see a risk of "messing things up" if he cannot.

      Also, we routinely recommend not slurping a whole file in at once without a good reason, but perhaps in this case there's no reason not to:

      my %text_for = map { chomp; split /\s*\|\s*/, $_, 2 } <$iplist>;
      Note that this assumes each IP has only one entry in your list. If that's not the case, there's another solution for that...

      Of course in that case one couldn't go with the simple map solution. I would do:

      while (<$iplist>) {
          chomp;
          my ( $k, $v ) = split /\s*\|\s*/, $_, 2;
          push @{ $text_for{$k} }, $v;
      }
Re: File Searching
by naikonta (Curate) on Sep 13, 2007 at 15:34 UTC
    Just split on each line with spaces-surrounded | as separator. You'll get a list with two elements, the first (element 0) is the IP and the second (element 1) is the text. You can store them in a hash to access it later.
    my %entries;
    while (<DATA>) {
        chomp;
        my ( $ip, $text ) = split /\s+\|\s+/;
        $entries{$ip} = $text;
    }
    print $entries{""}; # some text
    __DATA__
     | some text
     | another text
    Update: Anno makes it more robust by anticipating unexpected input. I went solely by the sample input. I know about \s* and the limit argument to split.

    Open source software? Share and enjoy. Make profit from it if you can. Yet, share and enjoy!

      I agree with the split solution. To make it a little more robust, I'd make spaces around "|" optional (instead of requiring at least one on each side), and give split a limit of 2, so that the text can be allowed to contain "|". Thus:
      my($ip, $text) = split /\s*\|\s*/, $_, 2;
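      A quick check of that behaviour (the IP and text below are made up for illustration): with optional spaces and a limit of 2, only the first "|" splits the line, even when there is no space before it and the text itself contains a "|".

      ```perl
      use strict;
      use warnings;

      # Hypothetical input line: no space before the first "|",
      # and the text itself contains a "|".
      my $line = '192.0.2.1|left | right';

      # Optional spaces around "|", limit of 2: only the first "|" splits.
      my ( $ip, $text ) = split /\s*\|\s*/, $line, 2;

      print "$ip -> $text\n";    # prints: 192.0.2.1 -> left | right
      ```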
      Thank you very much, that seems to work a lot lot lot faster!!
Re: File Searching
by GrandFather (Saint) on Sep 13, 2007 at 20:57 UTC

    You have good solutions for splitting your string, but do you need to normalize your IP strings? As hash keys, '' ne ''. The following may help:

    my $ip = '';
    ( my $ip_norm = $ip ) =~ s/(\d+)/sprintf "%03d", $1/ge;
    print "$ip | $ip_norm";

    Prints: |

    DWIM is Perl's answer to Gödel
Re: File Searching
by sgt (Deacon) on Sep 13, 2007 at 16:00 UTC

    One thing not addressed in the previous two posts is the space-time tradeoff (there is always one). If the file is big, there could be some advantage in transforming the data into a dbm file, to which you tie in your search/update program (see 'perldoc perltie').

    To take care of duplicates, use some *marker*: '$hash{$ip} .= $MARKER . $text' (anon arrays don't play well as dbm-hash values IIRC; choose $MARKER such that split is easy if you need to spit back the old flat file).
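    A minimal sketch of that idea using the core SDBM_File module (the dbm file name, the marker character, and the sample data here are all made up, not from the original post):

    ```perl
    use strict;
    use warnings;
    use Fcntl qw(O_RDWR O_CREAT);
    use SDBM_File;

    # ASCII unit separator; assumes it never appears in the text itself.
    my $MARKER = "\x{1F}";

    tie my %text_for, 'SDBM_File', 'ip_text_db', O_RDWR | O_CREAT, 0666
        or die "Can't tie ip_text_db: $!\n";

    while ( my $line = <DATA> ) {
        chomp $line;
        my ( $ip, $text ) = split /\s*\|\s*/, $line, 2;

        # Append with the marker so duplicate IPs keep all their entries.
        $text_for{$ip} = defined $text_for{$ip}
            ? $text_for{$ip} . $MARKER . $text
            : $text;
    }

    # Look up one IP and split its entries back apart.
    my @entries = split /\Q$MARKER\E/, $text_for{'10.0.0.1'};
    print "$_\n" for @entries;

    untie %text_for;

    __DATA__
    10.0.0.1 | some text
    10.0.0.2 | another text
    10.0.0.1 | a duplicate entry
    ```

    Note that SDBM has a small per-entry size limit, so DB_File or GDBM_File may suit long text values better.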

    cheers --stephan

Node Type: perlquestion [id://638823]
Approved by Joost