Beefy Boxes and Bandwidth Generously Provided by pair Networks
Come for the quick hacks, stay for the epiphanies.
 
PerlMonks  

Add a fixed number of unique elements to hash

by Vasek (Acolyte)
on Mar 05, 2023 at 14:51 UTC ( [id://11150757]=perlquestion: print w/replies, xml ) Need Help??

Vasek has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I have a problem. I would like to generate a random playlist for an old mxaudio program (SGI O2, perl version 5). My goal is to take the contents of a directory structure containing a very large number of audio files and generate a list of a given number (say 10 tracks) of unique mp3s. It is important that reading of the start folder is recursive and that it is possible to add exceptions. I attach the code as far as I have got. The unique list will add up, but if the random function selects the same file more than once, then of course with fewer items than I would like. Can you help me with this? Thanks Zsolt.
#!/usr/bin/perl my $target = "/home/zsolti/Temp"; my @exclude = ( #'2023_02_21_Szentendre_Pilis_EK_oldal' 'webftp' ,'x' ); my (@audioFiles, %playList); my $numOfRandFiles = 10; &fileExplore($target); #print join("\n", @audioFiles), "\n"; &uploadPlaylist; print "-" x 80, "\n", "SELECTED FILES:\n", "-" x 80, "\n"; foreach $key (sort {$playList{$a} <=> $playList{$b}} (keys(%playList)) +) { print "$playList{$key}: $key\n"; } sub fileExplore { my $dir = shift; my $hit = 0; local *DIR; opendir DIR, $dir or die "opendir $dir: $!"; my $found = 0; while ($_ = readdir DIR) { next if /^\.{1,2}$/; $FSNode = $_; my $FSObj = "$dir/$FSNode"; foreach my $exc (@exclude) {$hit = 1 if $FSObj =~ m/\/$exc\//g} next if $hit; if (-f $FSObj) { (my $fExt = $FSObj) =~ s/.*\.(.*$)/$1/i; push(@audioFiles, $FSObj) if lc($fExt) eq 'mp3'; } fileExplore($FSObj) if -d $FSObj; } closedir DIR; } sub uploadPlaylist { for (my $i = 1; $i <= $numOfRandFiles; $i++) { $playList{$audioFiles[int rand@audioFiles]} = $i; } }

Replies are listed 'Best First'.
Re: Add a fixed number of unique elements to hash
by jo37 (Deacon) on Mar 05, 2023 at 16:56 UTC

    I would shuffle the list of files and select the first $numOfRandFiles from it, like

    use List::Util 'shuffle'; ... (shuffle(@audiofiles))[0 .. $numOfRandFiles - 1];

    Note that this generates a list, not a hash.
    If you cannot (or don't want to) use a module, you need to reinvent the wheel.

    Greetings,
    -jo

    $gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$

      There is also this that does the equivalent of shuffle and pick N.

      use List::AllUtils qw( sample ); my @selected = sample $numOfRandFiles, @audioFiles;

        G'day tybalt89,

        When initially writing the code in my response, I checked the List::Util::shuffle() documentation and noticed the sample() function immediately after it. I hadn't encountered that previously and decided to give it a go.

        Curiously, although it did work as documented, multiple runs produced the same results. As you can see from my "sample runs" using shuffle(), multiple runs produced different results (yes, only two runs shown, but I did run it quite a few times).

        Looking at source/lib/List/AllUtils.pm, List::AllUtils::sample() should be identical to List::Util::sample(). I have List::Util v1.62 and Perl v5.36.0.

        I'm a bit short on time this morning; I was thinking of investigating further this afternoon [Aussie timezone: UTC+11:00]. If you have any insights into the behaviour of sample(), please share.

        It did seem like sample(), taking a random selection from the array, was probably a better choice than shuffle(), randomising the entire array and then taking a slice (obviously, benchmarking needed to confirm this). On the down side, sample() requires List::Util v1.54: you'd need at least Perl v5.32.0 (which has v1.55) or an upgrade from CPAN.

        — Ken

        Be careful with List::AllUtils. "All" isn't really "all". At least frequency from List::MoreUtils is missing. I prefer to use the original modules.

        Greetings,
        -jo

        $gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$

      If shuffle takes too long you could do this:

      sub uploadPlaylist { for my $i ( 1 .. $numOfRandFiles ) { $playList{ splice @audioFiles, int rand @audioFiles, 1 } = $i; } }
      Naked blocks are fun! -- Randal L. Schwartz, Perl hacker
        Thx a lot jwkrahn! Splice is the key for me. Other solutions that use "factory" modules are out of the question, i.e. this machine is running perl 5.004, which has quite a few syntactical differences compared to today's perl versions. Of course, I could try transplanting the modules, but my only connection to this old SGI machine is via nfs sharing over a software half-speed network interface (since it can no longer be communicated with securely with today's settings).
Re: Add a fixed number of unique elements to hash
by kcott (Archbishop) on Mar 05, 2023 at 17:42 UTC

    G'day Zsolt,

    Your code uses a lot of older coding styles which I recommend you aim to move away from. These include: leading '&' on subroutine calls; use of package rather than lexical variables; and a lack of strict and warnings pragmata.

    There are a number of ways to achieve what you want. In the code below, I've continued your use of opendir. The core File::Find module is popular, as are a number of related CPAN modules — I expect other monks may provide you with examples of those.

    I've used fairly generic options for inclusions and exclusions — adapt to your needs.

    I created this directory structure for testing (in the spoiler):

    Here's the code:

    #!/usr/bin/env perl use strict; use warnings; use autodie; use File::Spec; use List::Util 'shuffle'; # For production, you'd probably want to read the # following values from options, config, etc. my $dir = '/home/ken/tmp/pm_11150231_dir_rand_select'; my @includes = (qr{\.x$}, qr{\.z$}); my @excludes = (qr{skip$}, qr{inc2\/p\.z$}); my $list_length = 3; my $re = { inc => qr{(?:@{[join '|', @includes]})}, exc => qr{(?:@{[join '|', @excludes]})}, }; my @all_files; get_files(\@all_files, $dir, $re); my @playlist = (shuffle @all_files)[0 .. $list_length - 1]; print "$_\n" for sort @playlist; sub get_files { my ($collected, $dir, $re) = @_; return if $dir =~ $re->{exc}; opendir(my $dh, $dir); for my $file (grep ! /^(?:\.|\.\.)$/, readdir $dh) { my $path = File::Spec::->catfile($dir, $file); next if $path =~ $re->{exc}; if (-d $path) { get_files($collected, $path, $re); } elsif (-f _) { next unless $path =~ $re->{inc}; push @$collected, $path; } else { # maybe handle other file types here } } return; }

    Here's a couple of sample runs:

    $ ./rand_select_files.pl /home/ken/tmp/pm_11150231_dir_rand_select/inc1/h.x /home/ken/tmp/pm_11150231_dir_rand_select/inc1/j.z /home/ken/tmp/pm_11150231_dir_rand_select/inc2/n.x
    $ ./rand_select_files.pl /home/ken/tmp/pm_11150231_dir_rand_select/a.x /home/ken/tmp/pm_11150231_dir_rand_select/c.z /home/ken/tmp/pm_11150231_dir_rand_select/inc1/j.z

    — Ken

Re: Add a fixed number of unique elements to hash
by tybalt89 (Monsignor) on Mar 05, 2023 at 22:11 UTC

    Define "very large number".

    Anyways, here's one way I'd do it (who needs recursion?).

    #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11150757 use warnings; use List::AllUtils qw( sample none ); my $target = "/home/zsolti/Temp"; $target = '../mnt/home/old'; # FIXME for testing on my system my @exclude = ( #'2023_02_21_Szentendre_Pilis_EK_oldal' 'webftp' ,'x' ); my (@audioFiles, %playList); my $numOfRandFiles = 10; my @queue = $target; while( defined( my $path = pop @queue ) ) { if( -f $path and $path =~ /\.mp3$/i ) { push @audioFiles, $path; } elsif( -d $path and none { $path =~ m{/\Q$_\E\z} } @exclude ) # prun +e { push @queue, <$path/*>; } } @playList{ sample $numOfRandFiles, @audioFiles } = 1 .. $numOfRandFile +s; use Data::Dump 'dd'; dd \%playList;

      Hmmm, 5.004 - well at least it's not perl 4

      #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11150757 use warnings; my $target = "/home/zsolti/Temp"; $target = '../mnt/home/old'; # FIXME for testing on my system my @exclude = ( #'2023_02_21_Szentendre_Pilis_EK_oldal' 'webftp' ,'x' ); my (@audioFiles, %playList); my $numOfRandFiles = 10; my @queue = $target; while( defined( my $path = pop @queue ) ) { if( -f $path and $path =~ /\.mp3$/i ) { push @audioFiles, $path; } elsif( -d $path and not grep $path =~ /\/\Q$_\E\z/, @exclude ) { push @queue, <$path/*>; } } $playList{ splice @audioFiles, rand @audioFiles, 1 } ||= $_ for 1 .. $numOfRandFiles; printf "%3d %s\n", $playList{$_}, $_ for sort keys %playList;

      Though since I don't actually have a perl 5.004 to test on, I do wonder if it works...

        "Though since I don't actually have a perl 5.004 to test on, I do wonder if it works..."

        The only thing that I can see which won't work is 'use warnings;':

        $ corelist warnings Data for 2022-05-27 warnings was first released with perl v5.6.0

        Replace with '$^W = 1;'.

        — Ken

        Simplifying a little

        #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11150757 #use warnings; # uncomment for newer perls my $target = "/home/zsolti/Temp"; $target = '../mnt/home/old'; # FIXME for testing on my system my @exclude = ( #'2023_02_21_Szentendre_Pilis_EK_oldal' 'webftp' ,'x' ); my (@audioFiles, %playList); my $numOfRandFiles = 10; my @stack = $target; while( my $path = pop @stack ) { grep $path =~ /\/\Q$_\E\z/, @exclude and next; push @audioFiles, grep -f, <$path/*.mp3>; push @stack, grep -d, <$path/*>; } $playList{ splice @audioFiles, rand @audioFiles, 1 or last } = $_ for 1 .. $numOfRandFiles; printf "%3d %s\n", $playList{$_}, $_ for sort keys %playList;
Re: Add a fixed number of unique elements to hash [UPDATE on sample() vs. shuffle()]
by kcott (Archbishop) on Mar 23, 2023 at 01:43 UTC

    TL;DR sample() works fine.

    Background: I had supplied a solution 2-3 weeks ago. I then questioned whether sample() would've been better than shuffle(). I indicated that I'd encountered a potential problem; but real-life got in the way of further testing. I've just noticed that I didn't follow-up on that when real-life got out of the way. :-)

    For ease of reference, here's my previous posts which are somewhat scattered throughout the thread:

    1. Re: Add a fixed number of unique elements to hash
    2. Re^3: Add a fixed number of unique elements to hash [sample() vs. shuffle()]
    3. Re^5: Add a fixed number of unique elements to hash [sample() vs. shuffle()]

    The OP subsequently wrote that a solution was required that could be used with Perl 5.004; accordingly, this response generally addresses the issue but will not be specifically useful to the OP.

    I made two modifications to the code posted in "Re: Add a fixed number of unique elements to hash" that change shuffle() to sample(). I also removed a sort to better highlight the randomness of sample().

    ... #use List::Util 'shuffle'; use List::Util 'sample'; ... #my @playlist = (shuffle @all_files)[0 .. $list_length - 1]; my @playlist = sample $list_length, @all_files; #print "$_\n" for sort @playlist; print "$_\n" for @playlist; ...

    Here's some sample runs (the updated script is called "rand_select_files_sample.pl"):

    ken@titan ~/tmp/pm_11150231_dir_rand_select $ ./rand_select_files_sample.pl /home/ken/tmp/pm_11150231_dir_rand_select/inc2/n.x /home/ken/tmp/pm_11150231_dir_rand_select/a.x /home/ken/tmp/pm_11150231_dir_rand_select/c.z ken@titan ~/tmp/pm_11150231_dir_rand_select $ ./rand_select_files_sample.pl /home/ken/tmp/pm_11150231_dir_rand_select/inc1/j.z /home/ken/tmp/pm_11150231_dir_rand_select/c.z /home/ken/tmp/pm_11150231_dir_rand_select/inc1/h.x ken@titan ~/tmp/pm_11150231_dir_rand_select $ ./rand_select_files_sample.pl /home/ken/tmp/pm_11150231_dir_rand_select/inc1/j.z /home/ken/tmp/pm_11150231_dir_rand_select/c.z /home/ken/tmp/pm_11150231_dir_rand_select/a.x

    Clearly, multiple runs are producing different results. I don't have any results from earlier runs using sample(). Assumption: PEBCAC

    — Ken

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://11150757]
Approved by marto
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others studying the Monastery: (5)
As of 2024-04-23 08:52 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found