 
PerlMonks  

Re^4: De-googleizing translation scripts

by Aldebaran (Curate)
on Nov 07, 2022 at 06:59 UTC


in reply to Re^3: De-googleizing translation scripts
in thread De-googleizing translation scripts

Note that you have included an auth-key in your SCSE. You don't want that.

Thanks to everyone who told me directly that I had left my fly undone. I was amazed that the example code worked right out of the gate... I guess it wasn't just an example. I got a little fooled, as I hadn't searched for my key yet (now changed). It's a pretty slick operation at DeepL.

First of all, bliako, thank you for your response, and it's good to hear from you. I had feared for your welfare with your proximity to ...Charybdis, but you sound no worse for the wear. I got some better results and then tried to extend it, make it more bliako-esque, and didn't quite get there. The writeup will be better in readmores:

You are receiving a JSON string from the remote server with your script (great!), that's stored in $response->decoded_content. Then you correctly convert that string, using decode_json(), into a perl data structure and store it in variable $data, in this case, of type HASH. You can use this data structure ($data) as usual, e.g. my $text1 = $data->{'translations'}->[0]->{'text'}.
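A minimal sketch of that access path, using core JSON::PP in place of JSON::MaybeXS so it runs without extra installs. The JSON string is a made-up sample shaped like the response described above, not captured server output.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical sample shaped like the response described above;
# a real script would take this from the HTTP response body.
my $json = '{"translations":[{"text":"Hola"}]}';

# decode_json expects UTF-8 encoded bytes and returns a Perl structure
my $data = decode_json($json);

# HASH -> 'translations' (ARRAY) -> first element (HASH) -> 'text'
my $text1 = $data->{translations}[0]{text};

binmode STDOUT, ':encoding(UTF-8)';
print ref($data), "\n";    # type of the top-level reference
print "$text1\n";
```

The arrows between adjacent brackets are optional, so `$data->{translations}[0]{text}` and `$data->{'translations'}->[0]->{'text'}` reach the same value.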

Ok. I relegated config values to a config file, and used some Cyrillic for what was homophonic to "tiny." The thing I know is that if I'm trying to work with that file and I don't have a Cyrillic keyboard, I'm screwed. I'm intentionally trying to make that subroutine and file prickly pears. We're gonna be hip deep in utf8 real soon, so let's allow it into the source:

    #!/usr/bin/perl
    # tiny is homophonic to тайный, meaning secret in russian
    use v5.030;    # strictness implied
    use warnings;
    use Path::Tiny;
    use HTTP::Tiny;
    use JSON::MaybeXS;
    use utf8;

    my $file_in  = path("/home/fritz/Desktop/1.enchanto.txt");
    my $file_out = path('/home/fritz/Desktop/1.enc_trans.txt');
    my $lang     = 'es';

    # slurp file
    my $guts = $file_in->slurp_utf8;
    my @spl  = split( '\n', $guts );

    # get credentials
    my ( $url, $key ) = get_тайный();

    #say "$url, $key";
    my $ua = HTTP::Tiny->new( 'verify_SSL' => '1' );
    for my $para (@spl) {
      say $para;
      my $payload    = "text=$para&target_lang=$lang";
      my $payloadlen = length($payload);
      my $response   = $ua->request(
        'POST' => $url,
        {
          headers => {
            'Authorization'  => "DeepL-Auth-Key $key",
            'Content-Length' => $payloadlen,
            'Accept'         => '*/*',
            'Content-Type'   => 'application/x-www-form-urlencoded',
            'User-Agent'     => 'curl/7.55.1'
          },
          content => $payload,
        },
      );
      die "Failed!\n" unless $response->{success};
      print $response->{content} if length $response->{content};
      my $content = $response->{content};
      say "content is $content";
      my $data = decode_json($content);
      say "data is $data";
      my $text1 = $data->{'translations'}->[0]->{'text'};
      $file_out->append_utf8( $para, $text1 );

      sub get_тайный {
        use Config::Tiny;
        my $ini_path = qw( 1.тайный.txt );
        my $sub_hash = "deepl";
        my $Config   = Config::Tiny->new;
        $Config = Config::Tiny->read( $ini_path, 'utf8' );

        # -> is optional between brackets
        my $url = $Config->{$sub_hash}{'url'};
        my $key = $Config->{$sub_hash}{'key'};

        #dial up the server
        $DB::single = 1;
        return ( $url, $key );
      }
    }
    __END__

And, yay, this compiles and behaves. I always get something from the translations. I wouldn't have thought that anything I had written would sound like a gangster, but it kind of did:

    I have noticed that a sign went up in yard indicating that you were looking to leave Eagle Meadows. ...Me he dado cuenta de que se ha puesto un cartel en el jardín indicando que querías dejar Eagle Meadows.

    fritz@laptop:~/Documents$ trans "puesto un cartel"
    puesto un cartel
    put up a poster
    Translations of puesto un cartel
    [ Español -> English ]
    puesto un cartel
        put up a poster, put a sign
    fritz@laptop:~/Documents$

I don't want to be casually mentioning a cartel to my Hispanic neighbors, so I'll give the translator a better idea of a real-estate sign.

*They* have now linked your CC, your translations and your monk handle and thus your comments. Brrrr (but hey the danger is not with "They" but with evil dictators outside Western Democracies /sic/ /sarcasm-off)

I'm much less worried about my enemies, who don't know me, than my "friends" who get the raw feed from the all-slurping back door of google. "They" give that data to a "they" in London and Jerusalem. I would just as soon eat a cheeseburger without Benjamin Netanyahu knowing where that was happening in real time. Meanwhile, his thesis regarding Americans is that he can do whatever he wants to, as he has stated before, in his twisted version of "Philadelphia Freedom." Able to beat every rap..., but I can't change him, I've gotta change things on the me side of it.

Q1) Let's premise that we all live in some type of battlefield, that the battlefield is everywhere (Snowden). Being able to locate a telephone signal, literally or through digital means, is how a lot of people get got in this world. Is a VPN in conjunction with your phone a good idea if you don't want to be broadcasting your metadata to the usual listeners?

Definitely-OT Q2) Does one have any less exposure with an Apple phone than with an Android?

Let me get back to perl.

Where would I like to go with this script? I don't think it's finished mod one until the data is segregated from the code. Clearing things out of main. For example,

    my $file_in  = path("/home/fritz/Desktop/1.enchanto.txt");
    my $file_out = path('/home/fritz/Desktop/1.enc_trans.txt');
    my $lang     = 'es';

These are hard-coded in main. No bueno. Also, I want to isolate the http section as a subroutine. Why this? Because I don't understand it, so I'm trying to contrast it with something I know better, namely HTTP::Request and Data::Roundtrip.
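As a first step toward isolating the http section, here is a rough sketch of building the form body with HTTP::Tiny's own www_form_urlencode method rather than interpolating into a string. The sub name build_payload is made up for illustration. The point is only that a paragraph containing an ampersand or non-ASCII text gets escaped before it goes on the wire, and that the length is then counted on the encoded string.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# build_payload is a hypothetical helper name, not part of any module
sub build_payload {
    my ( $para, $lang ) = @_;
    my $ua = HTTP::Tiny->new;

    # An array ref keeps the fields in the order given
    return $ua->www_form_urlencode( [ text => $para, target_lang => $lang ] );
}

my $payload = build_payload( 'Scylla & Charybdis', 'EL' );
print "$payload\n";
print length($payload), " bytes\n";
```

HTTP::Tiny also has a post_form method that does this encoding and sets the Content-Type header itself, which might make the hand-built headers unnecessary.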

But I just can't put it all together. I get confused with hashes versus arrays, and while I aspire to put together that perfectly economical data structure like bliako does, I don't get there without hitting every branch of the tree. Here's what I have now:

    #!/usr/bin/perl
    # tiny is homophonic to тайный, meaning secret in russian
    use v5.030;    # strictness implied
    use warnings;
    use utf8;

    our $debug = 1;

    my $ref_events    = init_event();
    my $ref_from_http = http_tiny( $ref_events, $debug );

    sub http_tiny {
      my ( $ref_events, $debug ) = (@_);
      use Path::Tiny;
      use HTTP::Tiny;
      use JSON::MaybeXS;
      $debug //= 0;

      my %h        = %$ref_events;
      my $file_in  = $h{infile};
      my $file_out = $h{outfile};
      my $lang     = $h{lang};

      # slurp file
      my $guts = $file_in->slurp_utf8;
      my @spl  = split( '\n', $guts );

      # get credentials
      my ( $url, $key ) = get_тайный();

      #say "$url, $key";
      my $ua = HTTP::Tiny->new( 'verify_SSL' => '1' );
      for my $para (@spl) {
        say $para;
        my $payload    = "text=$para&target_lang=$lang";
        my $payloadlen = length($payload);
        my $response   = $ua->request(
          'POST' => $url,
          {
            headers => {
              'Authorization'  => "DeepL-Auth-Key $key",
              'Content-Length' => $payloadlen,
              'Accept'         => '*/*',
              'Content-Type'   => 'application/x-www-form-urlencoded',
              'User-Agent'     => 'curl/7.55.1'
            },
            content => $payload,
          },
        );
        die "Failed!\n" unless $response->{success};
        if ( $debug > 0 ) {
          print
            "$0 : $payload\n$0 : sending above payload, of $payloadlen bytes...";
        }

        #print $response->{content} if length $response->{content};
        my $content = $response->{content};

        #say "content is $content";
        my $data = decode_json($content);

        #say "data is $data";
        my $text1 = $data->{'translations'}->[0]->{'text'};
        $file_out->append_utf8( $para, $text1 );
      }

      sub get_тайный {
        use Config::Tiny;
        my $ini_path = qw( 1.тайный.txt );
        my $sub_hash = "deepl";
        my $Config   = Config::Tiny->new;
        $Config = Config::Tiny->read( $ini_path, 'utf8' );

        # -> is optional between brackets
        my $url = $Config->{$sub_hash}{'url'};
        my $key = $Config->{$sub_hash}{'key'};

        #dial up the server
        $DB::single = 1;
        return ( $url, $key );
      }
    }

    sub init_event {
      my $hr = {
        infile  => '/home/fritz/Desktop/1.enchanto.txt',
        outfile => '/home/fritz/Desktop/1.enc_trans.txt',
        lang    => 'es',
      };
      return $hr;
    }
    __END__

It compiles but doesn't behave, acting as if it can't find Path::Tiny in a strange place. I'm mystified by it, but since I'm rusty, I'm in the process of hitting every branch on the mistake tree. To wit:

    fritz@laptop:~/Documents$ ./5.trans.pl
    Name "DB::single" used only once: possible typo at ./5.trans.pl line 96.
    Can't locate object method "slurp_utf8" via package "/home/fritz/Desktop/1.enchanto.txt" (perhaps you forgot to load "/home/fritz/Desktop/1.enchanto.txt"?) at ./5.trans.pl line 35.
    fritz@laptop:~/Documents$

So what do I want to do with it? Well, get it working again, and thanks for comments in that regard, but then I'd also like to work up what this would look like instead using HTTP::Request and Data::Roundtrip.

una velada agradable para el monasterio,

Replies are listed 'Best First'.
Re^5: De-googleizing translation scripts
by bliako (Monsignor) on Nov 07, 2022 at 10:26 UTC

    *They* be dragons, so let's leave it at that.

    I think this calls for creating separate package(s) and a small script which will take user input from the command line like translate.pl --infile 'xyz' --outfile 'aaa' --verbose 1. I said maybe more packages because what I see above is some app-specific functions like http_tiny (I would call that fetch_from_server or something?) and also some more general-purpose functions like get_secrets which is general because it reads a config file and looks for some user-specified keys and, therefore, you can reuse that for other apps you will be creating in the future.

    More concretely, app-specific functions go to (say) Net::API::DeepL and general-purpose go to Aldebaran::Util. Now that's a first thought, other Monks may have some better suggestions. But the gist is to separate code in packages, and aim at re-using code (e.g. from your Aldebaran::Util) for any other scripts you produce in the future.

    Once you have these packages, then you create the simplest script to "drive" them and here useful will be Getopt::Long which makes it easy-peasy to parse CLI user input (the --infile xxx above).

    If you are still with me, then you need to start this properly:

    Module::Starter provides the CLI command module-starter which creates a skeleton project/app directory: module-starter --module='Net::API::DeepL' --builder='ExtUtils::MakeMaker' --author='Aldebaran' (optionally add --email=xyz@... if you want to publish this and get feedback, perhaps CPAN).

    Note, I have always used ExtUtils::MakeMaker, disclaimer: I never tried the alternatives, as this covers my needs just fine.

    Now you have a dir Net-API-DeepL, and in it your main file lib/Net/API/DeepL.pm. It will have a skeleton pod and be ready for inserting your subs (those related to DeepL, not the general ones). So add one sub in there (like http_tiny).

    Your immediate next step will be to create test(s) for the sub you just added. Well, the wise ones will say that your first step is creating the test and then creating your http_tiny()!

    Create file t/10-tiny_http.t which may contain (just a suggestion):

        #!/usr/bin/env perl
        use strict;
        use warnings;
        use utf8; # if you must

        our $VERSION = '0.01';

        use Test::More;
        use Test2::Plugin::UTF8; # rids of the Wide Character in TAP message!

        use Net::API::DeepL qw/http_tiny/; # import our new module

        my $results = http_tiny(...);
        # this is how a test looks like:
        ok(defined $results, "http_tiny() : called and got defined results");
        ok(ref($results) eq 'HASH', "http_tiny() : results is a HASHref");
        # etc etc etc

        done_testing(); # epilogue

    And you are ready to test your module:

        perl Makefile.PL
        make all
        make test

    You can create other test files for different subs, don't stuff everything into one test file. All test files will be run automatically with make test (and in alphabetical order, that's why we prepend them with numbers).

    Once your subs are tested, then create the driver script. This will be your main entry to Net::API::DeepL from the command line. E.g. translate.pl --infile ...

    Here is a skeleton which demonstrates the use of Getopt::Long to parse CLI parameters.

        #!/usr/bin/env perl
        use strict;
        use warnings;
        use utf8;

        use Getopt::Long;

        use Net::API::DeepL qw/http_tiny/; # import our new module

        my ($infile, $outfile, $verbose);
        $verbose = 0;
        if( ! Getopt::Long::GetOptions(
            'infile=s' => \$infile,
            'outfile=s' => \$outfile,
            'verbose=i' => \$verbose,
            'help' => sub { print STDERR "Usage : $0 --infile xx --outfile xx [--verbose N]\n"; exit(0) },
        ) ){ die "error, something wrong with the command-line parameters." }

        die "parameters needed!" unless $infile and $outfile;

        my $results = http_tiny($infile, $outfile, ...);
        # at this point consider adding all your parameters into a hash and
        # pass that to http_tiny($options) instead of passing a long list which
        # may contain optional parameters.
        die unless $results;

        print "$0 : done, success.\n";

    Well, that's something to get you started: create project dir, add module functionality, create tests, create driver script. I have omitted more details on file lib/Net/API/DeepL.pm like how to export http_tiny(). Will do that when you are ready.
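For reference, one common way to handle the export is the core Exporter module. The sketch below is only an assumption about how lib/Net/API/DeepL.pm might eventually look. To stay runnable as a single file, the package sits in a BEGIN block and the %INC line pretends the file was loaded from disk, and the body of http_tiny is a placeholder that talks to no server.

```perl
use strict;
use warnings;

# In a real project this block would live in lib/Net/API/DeepL.pm
BEGIN {
    package Net::API::DeepL;
    use Exporter 'import';
    our @EXPORT_OK = qw(http_tiny);    # exported only on request

    # Placeholder body: the real sub would talk to the server
    sub http_tiny {
        my ($options) = @_;
        return { received => $options };
    }

    # Mark the module as loaded, needed only for this one-file demo
    $INC{'Net/API/DeepL.pm'} = __FILE__;
}

use Net::API::DeepL qw(http_tiny);    # same line as in the test file

my $results = http_tiny( { lang => 'es' } );
print ref($results), "\n";
```

Putting the name in @EXPORT_OK instead of @EXPORT means nothing lands in the caller's namespace unless it is asked for by name.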

    Also, you may want to think about creating a database of past translations, which your module fills in as data is fetched from the server, so that you don't translate things twice. But with free-text translation this is not going to be worth it.
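If it ever did become worth it, the idea can be sketched with a plain in-memory hash. Everything here is hypothetical: fetch_from_server is a stand-in that makes no request, and a real store would more likely be a file or a small database.

```perl
use strict;
use warnings;

my %seen;           # in-memory store of past translations
my $fetches = 0;    # counts how often the "server" is actually asked

# Stand-in for the real request so the sketch runs offline
sub fetch_from_server {
    my ( $text, $lang ) = @_;
    $fetches++;
    return "[$lang] $text";
}

# Look the paragraph up first, and only fetch on a miss
sub translate_cached {
    my ( $text, $lang ) = @_;
    my $cache_key = join "\0", $lang, $text;
    $seen{$cache_key} = fetch_from_server( $text, $lang )
        unless exists $seen{$cache_key};
    return $seen{$cache_key};
}

my @out = map { translate_cached( $_, 'es' ) } ( 'hello', 'goodbye', 'hello' );
print "$fetches fetches for ", scalar(@out), " paragraphs\n";
```

The cache key includes the target language, so the same paragraph translated into two languages is stored twice.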

    Edit: Also, you may want to consider using an OO approach. This allows for storing some data into your translate object (of class Net::API::DeepL), e.g. your credentials. If you need to be doing multiple translations, this will be ideal:

        my $config = get_secrets();
        my $trans = Net::API::DeepL->new($config);
        my @results;
        for my $totranslate (@$translations){
          push @results, $trans->http_tiny($totranslate);
        }
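A bare-bones sketch of what such a class could look like, assuming the config is a hash ref with url and key entries. The values are made up, and http_tiny is again a placeholder that only shows the credentials being reachable from inside the object.

```perl
use strict;
use warnings;

package Net::API::DeepL;

# Constructor: keep the credentials inside the object
sub new {
    my ( $class, $config ) = @_;
    return undef unless $config->{url} and $config->{key};
    return bless { url => $config->{url}, key => $config->{key} }, $class;
}

# Placeholder for the real request; it only demonstrates that the
# stored credentials are available through $self
sub http_tiny {
    my ( $self, $text ) = @_;
    return { text => $text, sent_to => $self->{url} };
}

package main;

# Made-up values standing in for what get_secrets() would return
my $config = { url => 'https://example.invalid/translate', key => 'not-a-real-key' };
my $trans = Net::API::DeepL->new($config) or die "bad config";

my @results;
for my $totranslate ( 'one', 'two' ) {
    push @results, $trans->http_tiny($totranslate);
}
print scalar(@results), " results\n";
```

The constructor returns undef on an incomplete config, so the caller can check once instead of before every request.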

    Others may want to give their advice or make a comment for all I mentioned above as nothing is written on stone, please do.

    bw, bliako

      Thank you for the comments and helpful source. I felt like a ship lost at sea without a host because I had unmoored myself from American tech giants. I tried to sign up for gitlab and showed up instead at the server that haukex's employer uses, because I'm there so often to get the details on configuring an rpi. I was refused entrance, and I thought, "gosh, I've never been refused entrance to anything in Berlin ever. Is this a new balkanization of the net?" No. It's me showing up in the wrong place, disoriented by Charybdis, blinded by Scylla....

      So lots of changes with hosting situations, and I found it much better to use Getopt::Long and the command line to input the data rather than have it instantiated in a sub. I'm so rusty with the module-building parts that I couldn't even start on that. Now that I have a place to host source code again, maybe this is the type of thing I should work up again.

      For now though, I have a working script, and I wanted to post it as a result for the thread.

          fritz@laptop:~/Documents$ ./4.getopt.pl --infile /home/fritz/Desktop/1.tran.txt --outfile /home/fritz/Desktop/1.enc_trans.txt --lang EL --verbose 1
          /home/fritz/Desktop/1.tran.txt, /home/fritz/Desktop/1.enc_trans.txt, EL, 1
          ...
          ./4.getopt.pl : sending above payload, of 574 bytes...
          ...
          ./4.getopt.pl : text=Scylla and Charybdis are the horns of a dilemma.&target_lang=EL
          ./4.getopt.pl : sending above payload, of 68 bytes..../4.getopt.pl : done, success.
          fritz@laptop:~/Documents$

      Source:

          #!/usr/bin/env perl
          use v5.030;    # strictness implied
          use warnings;
          use utf8;

          use Getopt::Long;

          #use Net::API::DeepL qw/http_tiny/; # maybe someday...

          my ( $infile, $outfile, $lang, $verbose );
          $verbose = 0;
          if (
            !Getopt::Long::GetOptions(
              'infile=s'  => \$infile,
              'outfile=s' => \$outfile,
              'lang=s'    => \$lang,
              'verbose=i' => \$verbose,
              'help'      => sub {
                print STDERR
                  "Usage : $0 --infile xx --outfile xx --lang xx [--verbose N]\n";
                exit(0);
              },
            )
            )
          {
            die "error, something wrong with the command-line parameters.";
          }

          die "parameters needed!" unless $infile and $outfile and $lang;
          say "$infile, $outfile, $lang, $verbose";

          # at this point consider adding all your parameters into a hash and
          # pass that to http_tiny($options) instead of passing a long list which
          # may contain optional parameters.
          my %h;
          $h{infile}  = $infile;
          $h{outfile} = $outfile;
          $h{lang}    = $lang;
          $h{verbose} = $verbose;

          my $results = http_tiny( \%h );
          die unless $results;

          print "$0 : done, success.\n";

          sub http_tiny {
            my ($hr) = (shift);
            use Path::Tiny;
            use HTTP::Tiny;
            use JSON::MaybeXS;

            my %h        = %$hr;
            my $file_in  = path( $h{infile} );
            my $file_out = path( $h{outfile} );
            my $lang     = $h{lang};
            my $debug    = $h{verbose};

            # slurp file
            my $guts = $file_in->slurp_utf8;
            my @spl  = split( '\n', $guts );

            # get credentials
            my ( $url, $key ) = get_тайный();

            #say "$url, $key";
            my $ua = HTTP::Tiny->new( 'verify_SSL' => '1' );
            for my $para (@spl) {
              say $para;
              my $payload    = "text=$para&target_lang=$lang";
              my $payloadlen = length($payload);
              my $response   = $ua->request(
                'POST' => $url,
                {
                  headers => {
                    'Authorization'  => "DeepL-Auth-Key $key",
                    'Content-Length' => $payloadlen,
                    'Accept'         => '*/*',
                    'Content-Type'   => 'application/x-www-form-urlencoded',
                    'User-Agent'     => 'curl/7.55.1'
                  },
                  content => $payload,
                },
              );
              die "Failed!\n" unless $response->{success};
              if ( $debug > 0 ) {
                print
                  "$0 : $payload\n$0 : sending above payload, of $payloadlen bytes...";
              }

              #print $response->{content} if length $response->{content};
              my $content = $response->{content};

              #say "content is $content";
              my $data = decode_json($content);

              #say "data is $data";
              my $text1     = $data->{'translations'}->[0]->{'text'};
              my $outstring = $para . "\n" . $text1 . "\n";
              $file_out->append_utf8($outstring);
            }
            return $hr;
          }

          sub get_тайный {
            use Config::Tiny;
            my $ini_path = qw( 1.тайный.txt );
            my $sub_hash = "deepl";
            my $Config   = Config::Tiny->new;
            $Config = Config::Tiny->read( $ini_path, 'utf8' );

            # -> is optional between brackets
            my $url = $Config->{$sub_hash}{'url'};
            my $key = $Config->{$sub_hash}{'key'};
            return ( $url, $key );
          }
          __END__

      Output in pre tags:

      Σας ευχαριστώ για τα σχόλια και τη χρήσιμη πηγή. Ένιωθα σαν ένα πλοίο χαμένο στη θάλασσα χωρίς οικοδεσπότη, επειδή είχα ξεκολλήσει από τους αμερικανικούς τεχνολογικούς γίγαντες. Προσπάθησα να εγγραφώ στο gitlab και εμφανίστηκα αντ' αυτού στον διακομιστή που χρησιμοποιεί ο εργοδότης του haukex, επειδή βρίσκομαι εκεί τόσο συχνά για να πάρω τις λεπτομέρειες σχετικά με τη διαμόρφωση ενός rpi. Μου αρνήθηκαν την είσοδο και σκέφτηκα: "Θεέ μου, ποτέ δεν μου έχουν αρνηθεί την είσοδο σε κάτι στο Βερολίνο. Πρόκειται για μια νέα βαλκανοποίηση του δικτύου;" Όχι. Είναι ότι εμφανίστηκα σε λάθος μέρος, αποπροσανατολισμένος από τη Χάρυβδη, τυφλωμένος από τη Σκύλλα....
      
      

      It's interesting for me to see the proper nouns that are the protoliths of what we know in English.

      Σκύλλα
      never looked more imposing....

      Cheers,

        good start!

            # at this point consider adding all your parameters into a hash and
            # pass that to http_tiny($options) instead of passing a long list which
            # may contain optional parameters.
            my %h;
            $h{infile} = $infile;
            ...

        Note that you can pack all into a hash from the beginning avoiding myriads of loose variables:

            my %h; # or %params
            ...
            Getopt::Long::GetOptions(
              'infile=s' => \$h{infile},
              # or more flexible:
              'outfile=s' => sub { $h{$_[0]} = $_[1] },
              # the anonymous sub above will be called with 2 params
              # when --outfile is detected: the key (outfile) and its value
              ...
            )
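A runnable version of that idea, as a sketch. It uses GetOptionsFromArray with a made-up argument list standing in for @ARGV, so it can be tried without real command-line input. The file names are placeholders.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Stand-in for @ARGV so the sketch runs without real arguments
my @args = ( '--infile', 'in.txt', '--outfile', 'out.txt', '--verbose', '1' );

my %h = ( verbose => 0 );    # defaults can live in the hash too
GetOptionsFromArray(
    \@args,
    'infile=s'  => \$h{infile},                    # reference to a hash slot
    'outfile=s' => sub { $h{ $_[0] } = $_[1] },    # callback gets (name, value)
    'verbose=i' => \$h{verbose},
) or die "error, something wrong with the command-line parameters.";

print "$_ = $h{$_}\n" for sort keys %h;
```

Getopt::Long can also take a hash reference as the first argument to GetOptions and fill it directly, which is shorter still when no callbacks are needed.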

        Also, the sub get_... (secrets) is a good example of where error checking is important. The sub itself does not check whether it found the secrets pair in the config file or whether it managed to open the file. It returns a pair of possibly undefined values (if it does not bomb midway because of IO errors). And you do not check its return value: it will be a pair, but will it have defined values? Personally I prefer to pass hash/array refs into subs and return hash/array refs back. That way, if it returns undef, I know an error occurred.

            sub get_... {
              if( /error/ ){ print STDERR "errors"; return undef }
              return [$url, $key];
            }
            # use it
            my $ret = get_...();
            if( ! defined $ret ){ print STDERR "call to get_...() has failed"; exit(1) }
            my ($url, $key) = @$ret;
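The same pattern filled in, as a sketch only. Here get_secrets takes an already-parsed config hash (the shape Config::Tiny hands back) rather than reading a file, so it runs without a config file on disk. The section name and values are invented for the demonstration.

```perl
use strict;
use warnings;

# Returns an array ref [url, key] on success, undef on any problem
sub get_secrets {
    my ( $config, $section ) = @_;
    my $sec = $config->{$section};
    if ( !defined $sec ) {
        print STDERR "get_secrets() : no section '$section' in config\n";
        return undef;
    }
    for my $name (qw(url key)) {
        if ( !defined $sec->{$name} or !length $sec->{$name} ) {
            print STDERR "get_secrets() : '$name' is missing in [$section]\n";
            return undef;
        }
    }
    return [ $sec->{url}, $sec->{key} ];
}

# One complete config and one with the key missing (made-up values)
my $good = get_secrets( { deepl => { url => 'https://example.invalid', key => 'xyz' } }, 'deepl' );
my $bad = get_secrets( { deepl => { url => 'https://example.invalid' } }, 'deepl' );

if ( !defined $good ) { die "call to get_secrets() has failed" }
my ( $url, $key ) = @$good;
print "url=$url\n";
print "second call failed as expected\n" unless defined $bad;
```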

        Apropos the translation, it is impressively good. The *$&?^%$^* will sell the rope with which to be "hanged" but with the current logistics standstill, keep spinning those yarns on the ol'spindle, with Perl, hehehe

        bw, bliako