Some comments:
- use strict, warnings and diagnostics, or die
- The file regex may produce unwanted results
with unix-like filenames such as file.txt.bak. Decide which
part you want to retain. You can find the first and last dot
with index and rindex, respectively.
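A minimal sketch of the index/rindex idea; the filename and the variable names are made up for illustration and are not taken from your script:

```perl
use strict;
use warnings;

my $name  = 'file.txt.bak';      # hypothetical example filename
my $first = index  $name, '.';   # position of the first dot, -1 if none
my $last  = rindex $name, '.';   # position of the last dot, -1 if none

# Keep everything before the first dot, or everything before the last dot.
my $short = $first >= 0 ? substr( $name, 0, $first ) : $name;
my $long  = $last  >= 0 ? substr( $name, 0, $last )  : $name;

print "$short\n";   # file
print "$long\n";    # file.txt
```

Which of the two you want depends on whether ".txt.bak" or only ".bak" counts as the extension.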
-
You slurp the contents in list context only to join the
array again. You can also set $/ to undef and read the whole
file into a scalar in one go:
{
local $/ = undef;
$contenido = <LIBRO>;
}
You can leave the newlines intact; they will be caught
by '\s'. Even better, tr will take care of them.
-
Use lc or uc to change the case.
-
You can simplify the transliteration by complementing the
search list, so that everything outside the alphabetic range
is replaced (see tr in perlop):
$contenido = uc $contenido;
$contenido =~ tr/A-Z/ /cs;
-
Use '\s+' rather than '\s', so you don't have to test for
empty cases. (Leading whitespace still gives one empty first
field; split ' ' skips that as well.)
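To see uc, tr and split working together, here is a small sketch on a made-up sample string. It uses split ' ' instead of /\s+/, since that form also drops a leading empty field:

```perl
use strict;
use warnings;

my $text = "Hola, mundo!\nhola de nuevo...";   # made-up sample text
$text = uc $text;
$text =~ tr/A-Z/ /cs;   # each run of characters outside A-Z becomes one space

my @words = split ' ', $text;   # ' ' splits on whitespace, skips leading blanks
print join( '|', @words ), "\n";   # HOLA|MUNDO|HOLA|DE|NUEVO
```

Note that tr/A-Z/ /cs also turns accented letters into spaces, so this assumes plain ASCII text.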
-
You can get the number of distinct words (keys) without
array assignment:
$npalabras = keys %PF;
In scalar context keys returns the size directly.
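A short sketch with made-up counts, showing what keys gives you in scalar context. The $total variable is my own addition, in case you want the number of word occurrences rather than the number of distinct words:

```perl
use strict;
use warnings;

my %PF = ( HOLA => 2, MUNDO => 1, DE => 1 );   # made-up counts

my $npalabras = keys %PF;      # scalar context: number of keys
my $total = 0;                 # hypothetical extra: sum of all counts
$total += $_ for values %PF;

print "$npalabras distinct, $total total\n";   # 3 distinct, 4 total
```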
-
I would print to LIBROUT inside the loop, so the
system gets the chance to buffer nicely.
It's quite a list, but I hope it gives you the chance
to learn some new idioms. Result:
#....
my $contenido;
{
local $/ = undef;
$contenido = <LIBRO>;
}
$contenido = uc $contenido;
$contenido =~ tr/A-Z/ /cs;
my %PF;
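# split ' ' splits on whitespace and skips a leading empty field
$PF{$_}++ for split ' ', $contenido;
open LIBROUT, '>', "$ar.csv" or die "Cannot open $ar.csv: $!";
my $npalabras = keys %PF;   # scalar context: number of distinct words
# for sets $_ to each word in turn
for ( keys %PF ){
my $f = $PF{$_};   # a my variable is only visible from the next statement on
print LIBROUT join( ';', $_, $f, $f / $npalabras ), "\n";
}
close LIBROUT or die "Cannot close $ar.csv: $!";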
Well, you see how the use of $_ simplifies things.
Hope this helps,
Jeroen
"We are not alone"(FZ)