http://qs321.pair.com?node_id=11108119


in reply to Unique Values within AOH

Rather than reading the data line by line, you could slurp all of it into a single scalar string and then split it into per-team chunks at each point in the string that is followed by a "T" at the start of a line. Then, for each team, strip the "T:" and "P:" prefixes globally and split again on newlines into the team line and the player lines. This removes the need to test each line for a "T" when looking for a new team. Finally, as hippo suggests, de-dup before building your HoA:

use 5.026;
use warnings;

use Data::Dumper;

open my $dataFH, q{<}, \ <<__EOD__ or die $!;
T:REDS
P:GRIFFEY
P:GRIFFEY
P:PEREZ
P:GRIFFEY
P:PEREZ
P:ROSE
P:BENCH
T:PHILLIES
P:ROSE
P:ROSE
T:MARINERS
P:GRIFFEY
P:PEREZ
__EOD__

my $data = do { local $/; <$dataFH>; };
close $dataFH or die $!;

my @teams = split m{(?=^T)}m, $data;

my %teamAccts;
foreach my $teamData ( @teams )
{
    $teamData =~ s{[PT]:}{}g;
    my( $teamLine, @playerLines ) = split m{\n}, $teamData;
    $teamAccts{ $teamLine } = [
        do {
            my %seen;
            grep { ! $seen{ $_ } ++ } @playerLines;
        }
    ];
}

print Data::Dumper
    ->new( [ \ %teamAccts ], [ qw{ *teamAccts } ] )
    ->Sortkeys( 1 )
    ->Dumpxs();

The output:

%teamAccts = (
               'MARINERS' => [
                               'GRIFFEY',
                               'PEREZ'
                             ],
               'PHILLIES' => [
                               'ROSE'
                             ],
               'REDS' => [
                           'GRIFFEY',
                           'PEREZ',
                           'ROSE',
                           'BENCH'
                         ]
             );
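
The de-duplication inside the loop is the usual %seen/grep idiom: it keeps the first occurrence of each value and preserves order. As a stand-alone sketch (the list contents here are just illustrative):

my @playerLines = qw{ GRIFFEY GRIFFEY PEREZ GRIFFEY PEREZ ROSE BENCH };
my %seen;
my @unique = grep { ! $seen{ $_ } ++ } @playerLines;
# @unique now holds GRIFFEY, PEREZ, ROSE, BENCH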

I hope this is of interest.

Update: Clarified wording slightly.

Cheers,

JohnGG

Re^2: Unique Values within AOH
by afoken (Chancellor) on Oct 30, 2019 at 20:48 UTC
    Rather than reading the data line by line you could slurp all of it into a single scalar string then split into per-team chunks ...

    OK for small files, begging for trouble as soon as files grow beyond the amount of free RAM.

    On a 32-bit perl, you simply cannot have a scalar larger than 4 GB, because there are no more address lines. The real limit may be much lower, depending on the operating system and other factors, so you are limited to files smaller than that. Reading line by line allows processing petabytes of data without running out of memory.

    Even a 64-bit perl is limited to the amount of free RAM and free swap space. Once all RAM and swap are used up and the machine has come to a grinding halt, you are lost. Again, reading line by line allows processing much more data.
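
    For illustration only (not part of the original reply), a line-by-line version of the same tally might look like the sketch below; it assumes the same T:/P: record format and a file handle $dataFH opened elsewhere, and it never holds more than one input line in memory at a time:

    my %teamAccts;
    my %seen;
    my $currentTeam;
    while ( my $line = <$dataFH> )
    {
        chomp $line;
        if ( $line =~ m{^T:(.+)} )
        {
            # Start of a new team record.
            $currentTeam = $1;
        }
        elsif ( $line =~ m{^P:(.+)} and defined $currentTeam )
        {
            # Add the player only the first time it is seen for this team.
            push @{ $teamAccts{ $currentTeam } }, $1
                unless $seen{ $currentTeam }{ $1 } ++;
        }
    }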

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
      OK for small files, begging for trouble as soon as files grow beyond the amount of free RAM.

      Granted, but for this particular problem I doubt that there are enough teams out there for the data to run up against the physical RAM limit, even on quite elderly systems.

      Cheers,

      JohnGG