PerlMonks  

optimize percentile counting in hash

by max210 (Novice)
on Mar 20, 2008 at 22:23 UTC ( [id://675330]=perlquestion )

max210 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am trying to compute the percentile of millions of hash values. I have written a subroutine that returns a hash with the percentiles as values, but it takes practically forever. Kindly help me optimize the code, or please suggest a different approach.

I learned Perl just a couple of days ago :)
@random_values=0;
$abc = 0;
$ct=0;
@coins = ();
%hash = ("Quarter", 25, "Dime", 10, "Nickle", 5);
%returned_hash = hashPercentile(%hash);
print %returned_hash;

sub hashPercentile{
    $array_length = @_;
    for($i=0;$i<$array_length;$i++){
        $j = $i+1;
        $hash {$_[$i]} = $_[$j];
        $i++;
    }
    @keys = sort { $hash{$a} cmp $hash{$b} } keys %hash; # and by value
    while ( my ($key, $value) = each(%hash) ) {
        $random_values[$abc] = $value;
        $abc++;
    }
    @sorted = sort { $a <=> $b } @random_values;
    foreach(@keys){
        $value = $hash{$_};
        my $search_for = $value;
        my( $index )= grep $sorted[$_] eq $search_for, 0..$#sorted;
        #$index = No of elements below $search_for
        # $n = No of elements
        $n= scalar(@sorted);
        $PR = ($index*100)/$n;
        $account_ID_PR {$_} = $PR;
    }
    return %account_ID_PR;
}

Replies are listed 'Best First'.
Re: optimize percentile counting in hash
by holli (Abbot) on Mar 20, 2008 at 22:46 UTC
    May I ask which language you come from? Because
    $array_length = @_;
    for($i=0;$i<$array_length;$i++){
        $j = $i+1;
        $hash {$_[$i]} = $_[$j];
        $i++;
    }
    would be a clever hack in any other language. In Perl it's just
    %hash=@_;
    Impressive, isn't it? ;-)


    holli, /regexed monk/
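A minimal sketch of what that one-line assignment does inside a subroutine, reusing the coin data from the question. The sub name pairs_to_hash is made up for illustration and does not appear in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: a hash passed to a sub arrives in @_ as a flat
# key, value, key, value list, so assigning @_ to a hash rebuilds it.
sub pairs_to_hash {
    my %hash = @_;
    return \%hash;    # return a reference to avoid flattening again
}

my $coins = pairs_to_hash( Quarter => 25, Dime => 10, Nickle => 5 );
print "$_ => $coins->{$_}\n" for sort keys %$coins;
```

The explicit index loop and the list assignment build the same hash; the assignment simply lets Perl do the pairing.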
Re: optimize percentile counting in hash
by apl (Monsignor) on Mar 20, 2008 at 22:44 UTC
    its taking like infinite time.

    No, it's not running at all because of the errors.

    First, add use strict; use warnings;

    Then, realize that variable names like $_$i, $temp1_array$ct and $temp1_array$i are invalid. (A scalar starts with a '$'; you can't have one embedded.)

    If you're trying to use them as hashes, you'd say $temp1_array{ $i }. If as an array, $temp1_array[ $i ].

    Try making those changes, and see what happens as a result.

    (By the way, when posting code, it helps the rest of us if you bracket it with <code> and </code>.)

    Good luck with the program!
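As a small illustration of the two subscript forms mentioned above, here is a sketch under strict and warnings. The variable names and sample values are illustrative only, not taken from the original post.

```perl
use strict;
use warnings;

my @temp1_array = (10, 20, 30);       # an array
my %temp1_hash  = (first => 10);      # a hash (name is illustrative)

my $i = 1;
my $from_array = $temp1_array[$i];    # square brackets index an array
my $from_hash  = $temp1_hash{first};  # curly braces look up a hash key

print "$from_array $from_hash\n";     # prints "20 10"
```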

Re: optimize percentile counting in hash
by hipowls (Curate) on Mar 20, 2008 at 23:48 UTC

    You need to wrap your code in <code> and </code> tags. The characters [ and ] are special to PerlMonks and can't be put as literal text in your post. You should also use <p> and </p> tags around paragraphs.

    A general tip is to put

    use strict; use warnings;
    at the start of every script you write until you know what you are doing. They will pick up typos and dubious practices.

    Assuming I've understood your problem then this does what you want.

    my @data = get_data(); # or however you get them
    my %count;
    foreach my $datum (@data) {
        ++$count{$datum};
    };
    foreach my $item_count ( values %count ) {
        $item_count *= 100/@data;
    }
    Now %count will have a key/value for each datum and its frequency as a percentage.

      Thanks for trying to help me out. I really appreciate it :)

      I was not using <code> and </code> tags; maybe that's why you saw errors. The code should work now. I have to find the PERCENTILE and not the PERCENTAGE :( which makes it harder to run in less time.

      Eagerly waiting for a reply, thanks again
        How about
        use strict;
        use warnings;

        my @data = get_data(); # or however you get them
        my %count;
        foreach my $datum (@data) {
            ++$count{$datum};
        }
        my %percentile;
        my $total = 0;
        foreach my $datum (sort { $a <=> $b } keys %count) {
            $total += $count{$datum};
            $percentile{$datum} = $total / @data;
        }
        For future reference, if you modify your original post, you should also add a Revised: note, explaining the change. Otherwise it's confusing to the people new to the thread.
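For comparison, here is a sketch that keeps the interface of the original post (a hash in, a hash of percentile ranks out) but sorts once instead of running a grep for every key. It assumes "percentile" means the percentage of values strictly below a key's value, as the comment in the original code suggests. The name hash_percentile is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: maps each key to the percentage of values
# strictly below its own value. One sort plus one linear pass,
# instead of a linear grep per key.
sub hash_percentile {
    my %hash   = @_;
    my @sorted = sort { $a <=> $b } values %hash;
    my $n      = @sorted;

    # The first index at which a distinct value appears equals the
    # number of values strictly below it.
    my %below;
    for my $i ( 0 .. $#sorted ) {
        $below{ $sorted[$i] } = $i unless exists $below{ $sorted[$i] };
    }

    my %rank;
    $rank{$_} = 100 * $below{ $hash{$_} } / $n for keys %hash;
    return %rank;
}

my %rank = hash_percentile( Quarter => 25, Dime => 10, Nickle => 5 );
printf "%-8s %.2f\n", $_, $rank{$_} for sort keys %rank;
```

The per-key grep in the question makes the work grow with the square of the number of entries, which is the likely reason it seems to run forever on millions of values; here the sort dominates the cost.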
Re: optimize percentile counting in hash
by dwm042 (Priest) on Mar 21, 2008 at 13:41 UTC
    In my opinion, you're writing code where you don't need to write code. Almost everything (other than the use of a hash instead of an array) you're trying to do can be done by the module Statistics::Descriptive and in much more compact form. I won't guarantee it will run much faster, but the routines should be easier to manage:

    To get a percentile, for example:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Statistics::Descriptive;
    #
    # Warning: untested code.
    #
    my @data = (1,7,2,19,3,4,21,1092,6);
    my $stat = Statistics::Descriptive::Full->new();
    $stat->add_data(@data);
    $stat->sort_data();
    print $stat->percentile(25);
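If installing a CPAN module is not an option, a percentile can also be computed with core Perl only. The following is a sketch using the nearest-rank definition; the function name nearest_rank_percentile is hypothetical, and Statistics::Descriptive may use a different definition, so its result is not guaranteed to match.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical helper using the nearest-rank definition: the p-th
# percentile is the value at 1-based rank ceil(p * n / 100) in the
# sorted data.
sub nearest_rank_percentile {
    my ( $p, @data ) = @_;
    return unless @data;               # no data, no percentile
    my @sorted = sort { $a <=> $b } @data;
    my $n      = scalar @sorted;
    my $rank   = ceil( $p * $n / 100 );
    $rank = 1 if $rank < 1;            # p = 0 maps to the smallest value
    return $sorted[ $rank - 1 ];
}

my @data = (1, 7, 2, 19, 3, 4, 21, 1092, 6);
print nearest_rank_percentile( 25, @data ), "\n";    # prints 3
```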

      Great Help :)
      Although, when I run it, it says, Can't locate Statistics/Descriptive.pm in @INC.
      I apologize for asking simple questions, but kindly bear with me; I am new to Perl.
      Thanks again.

        The module Statistics::Descriptive is a Perl CPAN module, and how you add a Perl module will change depending on the flavor of Perl you are running.

        Are you running Perl on a Windows box using ActiveState Perl, or on a Unix machine?

        With ActiveState, you would use the command 'ppm' to launch the tool that installs modules, and on Unix, you can use the command:

        perl -MCPAN -e shell
        to start the process of accessing the CPAN repository. If you do not have Internet access from your machine, you may have to download the modules manually.

        If you need to know more, there is this node on Perl Monks that has all kinds of information about installing modules.

        Update: added PM node on module installation.
