PerlMonks  

Counting words..

by No-Lifer (Initiate)
on Oct 24, 2005 at 14:54 UTC ( [id://502480]=perlquestion )

No-Lifer has asked for the wisdom of the Perl Monks concerning the following question:

Dear all, A few things I can't figure out - the first, what does the following mean? I see it crop up in Form Validation scripts all the time, but can't seem to find anywhere that explains it properly. It's in the script I've bent to my own uses, but even if I comment it out, the script seems to run properly. Hrm.

$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;

Secondly, and more importantly, how would I begin to write a function to count the number of times a specific word appears on a page? For example - a search box, returning "Your search for $keyword returned X number of results on the page". Can't seem to fathom it.

I assume that if (/$keyword/i) would only find the first occurrence of the word on the page? Note: I have the search/output all working, it's just counting the number of times the $keyword is present that's got me stumped.

Just don't know enough about this perl thing :)

Many thanks,

NL.

Replies are listed 'Best First'.
Re: Counting words..
by BrowserUk (Patriarch) on Oct 24, 2005 at 15:17 UTC
    ... what does the following mean?

    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;

    It decodes URL-encoded text. I.e., when you see URLs containing characters that have been encoded, e.g. http://news.bbc.co.uk/1/hi/technology/default%20.stm, it looks for the %xx encoding and translates that back to an ASCII character.

    s/
        %                           ## find (and discard) the % char
        (                           ## capture
            [a-fA-F0-9][a-fA-F0-9]  ## two hex characters to $1
        )
    /
        pack("C", hex($1))          ## convert hex characters to a number,
                                    ## then pack to a character
    /egx;                           ## replace all in the target string
                                    ## (the x is only needed for this spaced-out layout)
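As a minimal sketch of that substitution in action: the sub name `url_decode` and the sample strings below are made up for illustration, and for real form handling the advice elsewhere in this thread (let CGI or URI::Escape do the decoding) still applies.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution from the question.
sub url_decode {
    my ($value) = @_;
    # Replace each %XX escape with the byte whose hex code is XX.
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    return $value;
}

print url_decode('default%20.stm'), "\n";   # the %20 becomes a space
print url_decode('%7Euser'), "\n";          # the %7E becomes "~"
```

Strings without any %XX sequences pass through unchanged, which would explain why a script can appear to work with the line commented out.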

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Counting words..
by ikegami (Patriarch) on Oct 24, 2005 at 15:08 UTC

    The substitution decodes URL-encoded strings. It does the same thing as uri_unescape in URI::Escape. CGI already decodes parameters for you. Is there any reason you're not using that module?

    my $count = () = /$keyword/gi; will count the number of occurrences. The () forces a list context.
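A small sketch of the counting idiom applied to an explicit string rather than $_; the sub name `count_matches` and the sample text are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: count case-insensitive matches of $keyword in $text.
sub count_matches {
    my ($text, $keyword) = @_;
    # In list context, a /g match returns every match; assigning that list
    # to () and then to a scalar yields the number of elements.
    my $count = () = $text =~ /$keyword/gi;
    return $count;
}

print count_matches('Perl, perl and more PERL', 'perl'), "\n";   # prints 3
```

Note that $keyword is interpolated as a regex here; writing \Q$keyword\E instead would treat any special characters in it literally.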

Re: Counting words..
by japhy (Canon) on Oct 24, 2005 at 15:10 UTC
    The form validation code you've shown takes a URL-encoded escape sequence (like %7E) and replaces it with the character it encoded (like "~"). But you should let CGI.pm take care of your form processing. It does it right.

    As for the number of times a pattern appears in a string, I'd suggest: my $count = 0;  ++$count while $string =~ /$pattern/g;


    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re: Counting words..
by mulander (Monk) on Oct 24, 2005 at 15:08 UTC
    Here is one way to do it:
    perl -ne '$count++ while /seek/ig; print $count,"\n" if eof;' file.txt
    The /g modifier tells the regex to search for more matches if possible, and for each match the while loop's statement is executed ( $count++ in this case ). So this one-liner will tell you how many times it saw a 'seek' in a file.
Re: Counting words..
by lepetitalbert (Abbot) on Oct 24, 2005 at 15:06 UTC
Re: Counting words..
by Not_a_Number (Prior) on Oct 24, 2005 at 16:43 UTC

    Wrt the wordcount, you say that you want to match a specific word, but all the solutions so far provided match strings. To illustrate the difference:

    my $string = 'Cathy placated her cat, which was trying to catch a caterpillar.';
    my $keyword = 'cat';
    my $count = 0;
    ++$count while $string =~ /$keyword/gi;
    print "Occurrences of '$keyword': $count";

    Output: Occurrences of 'cat': 5

    If that's not what you want, wrap your keyword in word boundary metacharacters:

    $string =~ /\b$keyword\b/gi;
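To make the difference concrete, here is a sketch that runs both patterns over the same sentence; the variable names `$substrings` and `$words` are introduced only for this example.

```perl
use strict;
use warnings;

my $string  = 'Cathy placated her cat, which was trying to catch a caterpillar.';
my $keyword = 'cat';

my ($substrings, $words) = (0, 0);
++$substrings while $string =~ /$keyword/gi;       # also matches inside longer words
++$words      while $string =~ /\b$keyword\b/gi;   # matches only the standalone word

print "Substring matches: $substrings\n";   # 5
print "Whole-word matches: $words\n";       # 1
```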
      Many thanks for the replies,

      I got it working using $count++ while /$keyword/ig;

      Printing the $count, then re-setting the counter (as I've got it in a loop to count each page searched).

      I'm wrestling with searching for words/strings now (which is semi-working!), and will probably result in a new post!

      Thanks again people,

      NL
        This may prove a bit more efficient:
        $count = () = /$keyword/ig;
        It puts the match into list context, which causes it to return all the matches; the list assignment is then evaluated in scalar context, which yields the number of matches. This is a somewhat common Perl idiom. And you don't have to reset the counter.

        Of course, if you need to accumulate several counts, make it += instead of =, and do remember to reset the counter.
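A rough sketch of both cases, a fresh count per page plus a running total; the `@pages` contents and variable names are hypothetical stand-ins for whatever the search loop actually reads.

```perl
use strict;
use warnings;

# Hypothetical page contents, standing in for the pages being searched.
my @pages = (
    'The cat sat on the mat.',
    'No felines here.',
    'Cat after cat after cat.',
);
my $keyword = 'cat';

my $total = 0;
my @per_page;
for my $page (@pages) {
    # Plain assignment gives a fresh count for each page, so no reset is needed.
    my $count = () = $page =~ /$keyword/gi;
    push @per_page, $count;
    # += accumulates a running total across all pages.
    $total += $count;
}

print "Per page: @per_page\n";   # Per page: 1 0 3
print "Total: $total\n";         # Total: 4
```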


        Caution: Contents may have been coded under pressure.
Re: Counting words..
by kwaping (Priest) on Oct 24, 2005 at 17:10 UTC
    This will also work for counting words (aka occurrences of a pattern in a string):
    #!/usr/bin/perl
    use strict;
    use warnings;

    read DATA, my $text, 40;

    ### this is the important line ###
    my $wordcount = () = $text =~ /test/gi;

    print $wordcount;

    __DATA__
    TEST
    tester
    testing
    asdf
    lalala
    greatest

    Update: The while++ solution previously posted takes half as long to run as this one. Can anyone explain why?
      How depressing; I suggested it was more efficient. My guess is that building the list (which isn't used, after all) takes more time than repeatedly incrementing a counter.
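One way to check the timing claim yourself is the core Benchmark module. This is only a sketch: the sub names and the repeated test string are assumptions, and the outcome may differ between Perl versions and inputs.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: the sample words from above, repeated many times.
my $text = "TEST tester testing asdf lalala greatest\n" x 1000;

sub count_with_list {
    my $count = () = $text =~ /test/gi;   # builds a list of all matches
    return $count;
}

sub count_with_loop {
    my $count = 0;
    ++$count while $text =~ /test/gi;     # increments once per match
    return $count;
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, {
    list => \&count_with_list,
    loop => \&count_with_loop,
});
```

Both subs return the same count, so any difference in the table comes from how the matches are collected rather than from what is matched.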

      Caution: Contents may have been coded under pressure.
