PerlMonks  

Counting PDL vectors in a PDL matrix

by mxb (Pilgrim)
on Feb 20, 2018 at 09:19 UTC ( [id://1209552]=perlquestion )

mxb has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I am currently learning PDL and have run into a situation where I cannot figure out how to proceed. I have a 2D array composed of multiple 1D byte vectors, which may or may not be unique. I wish to count how many times each unique vector occurs within the array: essentially a histogram of vector counts. My data is too big to post here, but the following example demonstrates the issue.

pdl> p $x
[
 [0 1 2]
 [3 4 5]
 [6 7 8]
 [0 1 2]
 [0 1 2]
 [6 7 8]
]

I know I can retrieve a list of unique elements:

pdl> p uniq $x
[0 1 2 3 4 5 6 7 8]

I know I can retrieve a histogram of all elements:

pdl> p scalar hist $x,0,256,1
[3 3 3 1 1 1 2 2 2 0 0 0 0 0 ....]

What I would like is something like the following output:

[0 1 2] 3
[3 4 5] 1
[6 7 8] 2

I think my issue stems from the fact that PDL is designed for operations on PDL elements. I am currently contemplating whether the best solution to my problem is to have a lookup table mapping the data I am putting in vectors to an index, for example:

my %lookup = (
    0 => "0 1 2",
    1 => "3 4 5",
    2 => "6 7 8",
);
# Then $x reduces down to
$x = [ 0 1 2 0 0 2 ];

Any advice would be welcome, as both PDL and numerical computing are new to me, thanks!

Replies are listed 'Best First'.
Re: Counting PDL vectors in a PDL matrix
by choroba (Cardinal) on Feb 20, 2018 at 11:02 UTC
    To get the list of unique vectors, use uniqvec.

    To get the counts, I had to iterate over the lines. There might be a better solution, I'm not a PDL expert. The resulting piddle has the count as the last element:

    #!/usr/bin/perl
    use warnings;
    use strict;

    use PDL;

    my $p = pdl( [0, 1, 2], [3, 4, 5], [6, 7, 8], [0, 1, 2],
                 [0, 1, 2], [6, 7, 8], [0, 1, 10]);

    my $u = $p->uniqvec;
    my @r;
    for my $i (0 .. $u->dim(1) - 1) {
        my $vec = $u->slice(':', $i);
        my $matches = $p == $vec;
        push @r, [ $matches->andover->sum ];
    }
    print $u->glue(0, pdl(@r));

    Output:

    [
     [ 0  1  2  3]
     [ 0  1 10  1]
     [ 3  4  5  1]
     [ 6  7  8  2]
    ]
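
    The per-row loop above can also be written as a single broadcast comparison. The following is only a sketch, not part of choroba's reply: it assumes the data fit comfortably in memory, because the intermediate comparison has dimensions (vector length, number of rows, number of unique rows). The variable names $eq and $counts are made up for illustration.

```perl
use strict;
use warnings;
use PDL;

# Example data from the original question: six rows of length 3.
my $p = pdl( [0, 1, 2], [3, 4, 5], [6, 7, 8],
             [0, 1, 2], [0, 1, 2], [6, 7, 8] );

# Unique rows, sorted; dims are (3, M).
my $u = $p->uniqvec;

# Add dummy dims so every row is compared with every unique row:
# (3, N, 1) == (3, 1, M) broadcasts to (3, N, M).
my $eq = $p->dummy(2) == $u->dummy(1);

# andover collapses the element dim: 1 where a whole row matches -> (N, M).
# sumover then counts the matching rows for each unique row      -> (M).
my $counts = $eq->andover->sumover;

# Append the counts as a final column, as in the looped version.
print $u->glue(0, $counts->transpose), "\n";
```

    For the six-row example from the question the counts come out as 3, 1 and 2. For very large inputs the looped version or a hash-based approach is likely to be kinder to memory.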
Re: Counting PDL vectors in a PDL matrix
by vr (Curate) on Feb 20, 2018 at 19:30 UTC

    Since uniqvec leaves us looping over the original data by hand, with perhaps a bit too much extra math, and since it looks like we are out of luck for an elegant vectorized solution anyway, let's find the unique lines directly. The @keys do not have to be stored: they can be extracted again using @index, or get_dataref can be called on a severed $uniq and the resulting large string split into equal chunks. I didn't investigate which would be more efficient. If the data are very large, then maybe md5( $$ref ) could be used as the hash key. I'm not sure appending the counts to the data is a good idea, but that is only the final line anyway.

    All this assumes that the order of unique lines should be preserved, and that your data are already efficiently stored in one large piddle (otherwise, if the matrix is built line by line, the lookup table can more easily be constructed in the same loop).

    use strict;
    use warnings;
    use feature 'say';
    use PDL;

    my $x = pdl [
        [0, 1, 2],
        [3, 4, 5],
        [6, 7, 8],
        [0, 1, 2],
        [0, 1, 2],
        [6, 7, 8],
    ];

    my @index;
    my @keys;
    my %count;

    for ( 0 .. $x-> dim( 1 ) - 1 ) {
        my $ref = $x-> slice( [], $_ )-> get_dataref;
        next if $count{ $$ref } ++;
        push @index, $_;
        push @keys, $$ref;
    }

    my $uniq = $x-> dice( 'X', pdl \@index );
    my $counts = pdl @count{ @keys };

    say $uniq;
    say $counts;
    say $uniq-> append( $counts-> transpose );

    Output:

    >perl pdl180220.pl

    [
     [0 1 2]
     [3 4 5]
     [6 7 8]
    ]

    [3 1 2]

    [
     [0 1 2 3]
     [3 4 5 1]
     [6 7 8 2]
    ]

      Hi,

      Thanks for the quick replies all. Thanks to choroba for linking to uniqvec, which I must have missed during reading the documentation.

      Thanks to vr for the good example code, this clears things up a bit. I came to the conclusion that my whole approach was incorrect (which I kind of expected when learning PDL).

      I am building up the matrix vector by vector, so if we consider each vector as a single entity (which is what I am doing), then rather than a 2D matrix of vectors, in essence I have a 1D vector of entities. Therefore, I am choosing to build the lookup table during creation.

      Many thanks all.
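
      Building the lookup table while the vectors arrive could look roughly like the sketch below. It is plain Perl with no PDL involved, and the names add_vector, %index_of, @vectors, @counts and @ids are hypothetical, chosen only for this illustration. It assumes each vector can be turned into a hash key by joining its elements with a space, as in the "0 1 2" strings from the original question.

```perl
use strict;
use warnings;

my %index_of;   # "0 1 2" => index of that vector in @vectors
my @vectors;    # unique vectors, in order of first appearance
my @counts;     # $counts[$i] = how often $vectors[$i] was seen
my @ids;        # the input reduced to one index per vector

# Register one vector and return its index in the lookup table.
sub add_vector {
    my @vec = @_;
    my $key = join ' ', @vec;
    unless ( exists $index_of{$key} ) {
        $index_of{$key} = scalar @vectors;   # next free index
        push @vectors, [@vec];
        push @counts, 0;
    }
    my $id = $index_of{$key};
    $counts[$id]++;
    push @ids, $id;
    return $id;
}

# Feed in the example rows one at a time, as they would be created.
add_vector(@$_) for ( [0, 1, 2], [3, 4, 5], [6, 7, 8],
                      [0, 1, 2], [0, 1, 2], [6, 7, 8] );

# Print each unique vector followed by its count.
printf "[%s] %d\n", join( ' ', @{ $vectors[$_] } ), $counts[$_]
    for 0 .. $#vectors;
```

      Here @ids ends up as the reduced 1D sequence described in the question (0 1 2 0 0 2), which could be handed to PDL afterwards if further numeric work is needed.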

Re: Counting PDL vectors in a PDL matrix
by corenth (Monk) on Feb 20, 2018 at 17:08 UTC
    Oh, wow! I've never heard of PDL. This is spiffy as all get out. Okay, here's how I'd go about it. . .
    # here's an array
    my @a = (1, 2, 3);
    # now it's a string:
    my $string = join '', @a;
    # here's another array
    my @b;
    # Now the array values you're looking for become an array element.
    # Increment when those values come up again.
    $b[$string]++;    # so $b[123] now = 1
    # Leading zeroes disappear, so you could add a 1 to the front of $string.
    $string = '1' . join '', @a;
    $b[$string]++;    # so $b[1123] now = 1
    # yeah. that'll work

    # I might write the whole thing as:
    my @a = ( [0, 1, 2], [3, 4, 5], [0, 1, 2] );    # all your stuff as little array refs from wherever
    my @b;                                          # the answers you want
    for (@a) {
        my $string = '1' . join '', @$_;
        $b[$string]++;
    }
    # after that, you could copy to an array of tuples:
    # e.g., $c[2] = (["123"],[5]);   # strip that extra '1' and turn to string?
    # $c[2] = (["info"], [number of times you find it]);
    # might make it easier to play with.
    # would %b be faster than @b? I prefer arrays, but that's my bias.
    # $b{$string}++ would certainly work. But, is it as easy to muck about with?
    # that extra '1' for referencing the elements does bother me a bit.
