enable the :crlf PerlIO layer
Thx haukex, that worked. Even so, my requests to Google exceeded their rate limit, so I had to slow them down. I added sleep time and a way to keep track of how long each file takes to translate.
for my $file (@texts) {
    local $/ = "";    # paragraph mode: read one blank-line-separated paragraph at a time
    open my $fh, '<:crlf', $file or die "Cannot open $file: $!";
    my $base_name = path("$file")->basename;
    my $out_file  = path( $out_dir, $base_name )->touchpath;
    say "out_file is $out_file";
    ## time it
    use Benchmark;
    my $t0 = Benchmark->new;
    while (<$fh>) {
        print "New Paragraph: $_";
        my $r = get_trans( $wgt, $_ );
        for my $trans_rh ( @{ $r->{data}->{translations} } ) {
            my $result = $trans_rh->{translatedText};
            say "result is $result ";
            my @lines = split /\n/, $result;
            push @lines, "\n";
            path("$out_file")->append_utf8(@lines);
            sleep(1);    # stay under the API rate limit
        }
    }
    my $t1 = Benchmark->new;
    my $td = timediff( $t1, $t0 );
    print "$file took:", timestr($td), "\n";
    sleep(3);
    close $fh;
}
84-0.txt is Shelley's Frankenstein, which is about 450 kB in length. Of the $300 credit they give anyone who signs up for their API, I used 7 cents, so I'm down to $297.22. It made for an interesting way to skim both the original and the translation. This ballparks 20 minutes as an outer limit:
/home/bob/Documents/meditations/castaways/Translate1/data/84-0.txt took:1180 wallclock secs (23.34 usr + 1.36 sys = 24.70 CPU)
$
Q3: What do the usr and sys numbers mean?
Module names in all lowercase are reserved (by convention) for pragmas, so I'd name your module Translate. Also, you're not checking your open for errors.
I did fix both of these but went with Translate1. The reason is that I know there is going to be a Translate2 that will not work with Translate1. I've heard such naming called "trampolining," and that it is something to be avoided. Q4: Am I supposed to avoid such collisions by using version numbers or clever use of git? The features of the package change quickly, and sometimes I have to roll back to something that actually worked.
I found that I had to go back to make clean every time I made a change in the script, so I wrote a little helper bash script:
$ cat 1.google.sh
#!/bin/bash
pwd
make clean
perl Makefile.PL
make
make test
make install
ls
cd blib
cd script
./3.my_script.pl
$
I offer this as a keystroke reduction mechanism, not wanting to be OT.
The translations went well, with the exception of certain characters. Let's look at a couple of paragraphs with differing tags. Here is output with pre tags:
New Paragraph: Are you mad, my friend? said he. Or whither does your
senseless curiosity lead you? Would you also create for yourself and the
world a demoniacal enemy? Peace, peace! Learn my miseries and do not seek
to increase your own.
result is - Ты злишься, друг мой? - спросил он. Или куда ты
бессмысленное любопытство приведет тебя? Не могли бы вы также создать для себя и
мир демонический враг? Мир, мир! Узнай мои страдания и не ищи
увеличить свой собственный.
New Paragraph: Frankenstein discovered that I made notes concerning his history; he asked
to see them and then himself corrected and augmented them in many places,
but principally in giving the life and spirit to the conversations he held
with his enemy. Since you have preserved my narration, said
he, I would not that a mutilated one should go down to
posterity.
result is Франкенштейн обнаружил, что я делал заметки, касающиеся его истории; он спросил
чтобы увидеть их, а затем сам исправить и дополнить их во многих местах,
но главным образом в том, чтобы дать жизнь и дух разговорам, которые он вел
со своим врагом. "Так как вы сохранили мое повествование", сказал
он, Я бы не хотел, чтобы изуродованный
posterity.
Here is what the 1st paragraph looks like in code tags:
New Paragraph: €œAre you mad, my friend?€ said he. €œOr whither does your
senseless curiosity lead you? Would you also create for yourself and the
world a demoniacal enemy? Peace, peace! Learn my miseries and do not seek
to increase your own.€
For some reason, Shelley puts whole paragraphs in quotes as a matter of course, and the quote characters are getting garbled when I read them in under these conditions:
#!/usr/bin/perl -w
use 5.011;
use WWW::Google::Translate;
use Data::Dumper;
use open OUT => ':utf8';
use Path::Tiny;
use lib ".";
use translate;
binmode STDOUT, 'utf8';
use POSIX qw(strftime);
Google sometimes gives the correct rendering of quotes in Russian. They do it somewhat like this: << >>.
Q5: How do I change my script so that these characters are rendered correctly? They look right as I read them in gedit.
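One thing worth checking, offered as a sketch rather than a confirmed fix: the header above sets UTF-8 on output (use open OUT), but the open for reading only applies :crlf. If 84-0.txt is UTF-8 (an assumption on my part), each curly quote would come in as three separate bytes and get re-encoded on the way out. The example below writes a hypothetical one-line sample to a temp file and reads it back with and without an :encoding(UTF-8) layer; slurp_with is a helper invented for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample: one quoted sentence stored as UTF-8 bytes with a CRLF ending.
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
binmode $tmp_fh;    # write the bytes exactly as given
print {$tmp_fh} "\xE2\x80\x9CAre you mad, my friend?\xE2\x80\x9D said he.\r\n";
close $tmp_fh or die "Cannot close $tmp_name: $!";

# Read a whole file through the given PerlIO layers (illustrative helper).
sub slurp_with {
    my ( $layers, $name ) = @_;
    open my $fh, $layers, $name or die "Cannot open $name: $!";
    local $/;    # slurp mode
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Without decoding, the opening quote arrives as three separate bytes.
my $bytes = slurp_with( '<:crlf', $tmp_name );

# With decoding, it arrives as the single character U+201C.
my $chars = slurp_with( '<:crlf:encoding(UTF-8)', $tmp_name );

printf "undecoded: length %d, first unit U+%04X\n", length($bytes), ord($bytes);
printf "decoded:   length %d, first unit U+%04X\n", length($chars), ord($chars);
```

If the assumption about the file's encoding holds, the matching change in the loop would be to open with '<:crlf:encoding(UTF-8)' instead of '<:crlf'.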
Finally, as I look at the arguments in Makefile.PL:
my %WriteMakefileArgs = (
    NAME               => 'Translate1',
    AUTHOR             => q{gilligan <gilligan@island.coconut>},
    VERSION_FROM       => 'lib/Translate1.pm',
    LICENSE            => 'artistic_2',
    MIN_PERL_VERSION   => '5.006',
    CONFIGURE_REQUIRES => {
        'ExtUtils::MakeMaker' => '0',
    },
    TEST_REQUIRES => {
        'Test::More' => '0',
    },
    PREREQ_PM => {
        #'ABC' => '1.6',
        #'Foo::Bar::Module' => '5.0401',
    },
    EXE_FILES => ['lib/3.my_script.pl'],
    dist      => { COMPRESS => 'gzip -9f', SUFFIX => 'gz', },
    clean     => { FILES => 'Translate1-*' },
);
Q6: How would I determine which version of WWW::Google::Translate to require?
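As a starting point, and only a sketch: one approach is to look up the version that is installed on the machine where the script is known to work. The example below relies on the universal VERSION method; installed_version is a helper name made up for the illustration, and the module list is just an example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the installed version of a module, or undef if it cannot be loaded
# (illustrative helper, not part of any distribution).
sub installed_version {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; $module->VERSION };
}

for my $module (qw(ExtUtils::MakeMaker Test::More WWW::Google::Translate)) {
    my $version = installed_version($module);
    printf "%-25s %s\n", $module, defined $version ? $version : 'not installed';
}
```

Whatever number this reports for WWW::Google::Translate could then be listed in PREREQ_PM as the minimum version, or '0' could be used there to accept any version.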
Thank you for your comments,