PerlMonks  

downloading a russian dictionary and getting matches with the arbitrary underpattern, a utility for crosswords

by Aldebaran (Curate)
on Dec 12, 2020 at 01:35 UTC ( [id://11125044]=perlquestion )

Aldebaran has asked for the wisdom of the Perl Monks concerning the following question:

For the second day of perlukkah, I would like to extend the script I post below. In the cleverer parts of it one will see tybalt87's style and idioms, many of which I struggle with as only an intermediate practitioner of perl. I won't post output, but what I want is this same capability in Russian. Campouts used to be my holy time for my Russian crosswords, but 2020 kind of mucked that up. I'm pleased to announce that rain is saturating Clackamas County in Oregon on what would otherwise be a hard day outside.

Anyway, let me stop babbling and get to the perl of it:

#!/usr/bin/perl
use strict;    # https://perlmonks.org/?node_id=11116556
use warnings;
use Path::Tiny;
use 5.016;

### adding grown-up perl logging
use Log::Log4perl;

# get rid of old one
my $file = '/home/hogan/Documents/hogan/logs/4.log4perl.txt';
unlink $file or warn "Could not unlink $file: $!";
my $log_conf4 = "/home/hogan/Documents/hogan/logs/conf_files/4.conf";
Log::Log4perl::init($log_conf4);

#info
my $logger = Log::Log4perl->get_logger();
$logger->info($0);

my $dict_path  = '/home/hogan/Documents/hogan/my_data/upwords.10';
my $dictionary = path($dict_path);

my @trials = ( ' f e ', 'w r e ', ' w r e', 'h og', );

for my $string (@trials) {
    $logger->info("============");
    say "$string";
    $logger->info($string);

    # draw seven random tiles
    my @tiles = map +( 'a' .. 'z' )[ rand 26 ], 1 .. 7;
    print "tiles: @tiles\n";
    $logger->info("tiles: @tiles\n");

    # sorted letters, each optional: matches any sorted subset of the letters
    my $pat     = join '', map "$_?", sort @tiles, $string =~ /\w/g;
    my $tilepat = join '', map "$_?", sort @tiles;
    my $letters = join '', @tiles, $string =~ /\w/g;
    print "pat: $pat\ntilepat: $tilepat\nletters: $letters\n";
    $logger->info("$pat\ntilepat: $tilepat\nletters: $letters\n");

    # dictionary words that can be built from the tiles plus the board letters
    my @matches = grep { ( join '', sort split // ) =~ /^$pat$/ }
      $dictionary->slurp =~ /^[$letters]{2,}$/gm;
    say "matches are @matches";

    # collect every candidate placement on the board row
    my @places;
    ( $string =~ tr/ /./r ) =~
      /(?<!\w).{2,}(?!\w)(?{ push @places, $& })(*FAIL)/;
    @places = grep /\w/, @places;
    use Data::Dump 'dd';
    dd \@places;

    # pass the arrays by reference so they are not flattened into one list
    my @matches1 = extension( 1, \@places, \@matches, $tilepat );
    print "@matches1\n\n";
    say "==============";
    my @matches2 = extension( 2, \@places, \@matches, $tilepat );
    print "@matches2\n\n";
    say "==============";
}

sub extension {
    my ( $id, $places, $matches, $tilepat ) = @_;
    my @found;
    for my $placepat ( $id == 1 ? @$places : map expand($_), @$places ) {
        for my $match (@$matches) {
            $logger->info("id: $id match: $match");
            $match =~ /^$placepat$/ or next;
            $logger->info("id: $id placepat: $placepat");
            $logger->info("matched: $match");

            # keep only the letters that land on empty ('.') squares
            my $newtiles = $match & ( $placepat =~ tr/.a-z/\xff\0/r );
            my ($hex) = unpack( 'H*', $newtiles );
            $logger->info("hex is: $hex");
            ( join '', sort $newtiles =~ /\w/g ) =~ /^$tilepat$/
              and push @found, $match;
        }
    }
    return @found;
}

sub expand {
    grep /\w/, glob join '', map { /\w/ ? "{$_,.}" : $_ } split //, shift;
}
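Since the matching idiom in the script is the part I find hardest to read, here is a minimal sketch of it in isolation. The helper name can_spell and the sample words are made up for illustration; only the "sort the letters, make each one optional" trick comes from the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the script above): true when $word can be
# spelled from the letters in @$tiles, using each tile at most once.
sub can_spell {
    my ( $word, $tiles ) = @_;

    # e.g. tiles t a c o s become the pattern 'a?c?o?s?t?'
    my $pat = join '', map { quotemeta($_) . '?' } sort @$tiles;

    # sort the word's letters so they line up with the sorted pattern
    my $sorted = join '', sort split //, $word;
    return $sorted =~ /^$pat$/ ? 1 : 0;
}

my @tiles = qw( t a c o s );
for my $word (qw( cat coast scoot )) {
    printf "%-6s %s\n", $word, can_spell( $word, \@tiles ) ? 'ok' : 'no';
}
```

Here "scoot" fails because it needs two o tiles and the rack has only one.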

Actually, I should post the .conf files.

$ ls
3.conf  4.conf
$ cat 3.conf
###############################################################################
#                              Log::Log4perl Conf                             #
###############################################################################
log4perl.rootLogger              = DEBUG, LOG1, SCREEN
log4perl.appender.SCREEN         = Log::Log4perl::Appender::Screen
log4perl.appender.SCREEN.stderr  = 0
log4perl.appender.SCREEN.layout  = Log::Log4perl::Layout::PatternLayout
log4perl.appender.SCREEN.layout.ConversionPattern = %m %n
log4perl.appender.LOG1           = Log::Log4perl::Appender::File
log4perl.appender.LOG1.filename  = /home/hogan/Documents/hogan/logs/3.log4perl.txt
log4perl.appender.LOG1.mode      = append
log4perl.appender.LOG1.layout    = Log::Log4perl::Layout::PatternLayout
log4perl.appender.LOG1.layout.ConversionPattern = %d %p %m %n
$ cat 4.conf
###############################################################################
#                              Log::Log4perl Conf                             #
###############################################################################
log4perl.rootLogger              = INFO, LOG1, SCREEN
log4perl.appender.SCREEN         = Log::Log4perl::Appender::Screen
log4perl.appender.SCREEN.stderr  = 0
log4perl.appender.SCREEN.layout  = Log::Log4perl::Layout::PatternLayout
log4perl.appender.SCREEN.layout.ConversionPattern = %m %n
log4perl.appender.LOG1           = Log::Log4perl::Appender::File
log4perl.appender.LOG1.filename  = /home/hogan/Documents/hogan/logs/4.log4perl.txt
log4perl.appender.LOG1.mode      = append
log4perl.appender.LOG1.layout    = Log::Log4perl::Layout::PatternLayout
log4perl.appender.LOG1.layout.ConversionPattern = %d %p %m %n
$

Let me interject here that I heard the maintainer of Log::Log4perl has stopped maintaining it. I think one aspect of such maintenance is that the rest of us talk about it and use it. I've used it on Windows and Ubuntu, and it's a lifesaver on the former and worthwhile stuff on the latter. The conf files above are straight from the documentation.

Q1) Does that role need to be filled?

Q2) Where do I find a link to download a modern Russian dictionary like the one that google provides for English? I have hundreds of failures I could discuss. Let's not! Let me instead talk about what I need: an official word list for gamers, of which there are zillions in Russia. There are millions of them here, and I love going into their shops, getting my tortes and the кроссворд. All words are in modern Cyrillic. I'm not aware that there's an argument about what that consists of, in particular with so many recent crossover words from English. I honestly don't know what they do with orthography anymore when they introduce Putin as Nash Leader. Also, I want what is current, as fresh as the confections I buy, although doing this entire exercise with Church Slavonic sounds like a fun task for Perlukkah 2021.

I think it'll be interesting to see what accommodation we have to make for the Cyrillic. If we can't find sources from within Russia, well, they've been exporting their language quite successfully, so I can well imagine that I could get qualified responses from eastern Europe in particular, where Russian instruction was obligatory for many years. Under normal circumstances, I might waltz into the local libraries with this question, since they have extensive Russian holdings, events and staff, but not in lockdown, which is where we are on the covid scale now. Hence the need for perlukkah 2020.
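To make the Cyrillic question concrete, here is a rough sketch of the two accommodations I expect to need; it is a guess at a starting point, not a tested port of the script. The range 'a' .. 'z' has to become an explicit alphabet, and the bitwise mask ($match & ...) should probably be replaced, because as far as I know the string bitwise operators are not meant for characters above 0xFF. The helper name new_tiles and the sample word are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                              # string literals in this file are UTF-8
binmode STDOUT, ':encoding(UTF-8)';    # encode characters on output

# а..я is contiguous in Unicode (U+0430..U+044F); ё (U+0451) sits outside it
my @alphabet = ( ( map { chr } ord('а') .. ord('я') ), 'ё' );
print scalar(@alphabet), " letters\n";

# Hypothetical helper: the letters of $word that land on empty ('.') squares
# of $placepat, collected by position instead of by a bitwise mask.
sub new_tiles {
    my ( $word, $placepat ) = @_;
    my @w = split //, $word;
    my @p = split //, $placepat;
    return join '', map { $w[$_] } grep { $p[$_] eq '.' } 0 .. $#p;
}

print new_tiles( 'слово', 'с.о.о' ), "\n";
```

A dictionary file would also need decoding on the way in, for example by opening it with a '<:encoding(UTF-8)' layer, so that \w and sort see characters rather than bytes.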

Now I've got to answer marto in yesterday's question. That write-up is harder.

"Yay, it's raining for perlukkah."


Replies are listed 'Best First'.
Re: downloading a russian dictionary and getting matches with the arbitrary underpattern, a utility for crosswords
by aitap (Curate) on Dec 12, 2020 at 08:19 UTC
    Where do I find a link to download a modern russian dictionary like the one that google provides for english?
    Do you mean a glossary (Ru-Ru) or a translation dictionary (En-Ru)? For the former, check out the resources at http://gramota.ru. (I'm afraid there might be no downloads, but it's a start.) For the latter, I'm not sure there are legally redistributable dictionaries except mueller7-dict, which is an English → Russian dictionary.
      check out the resources at http://gramota.ru.

      I always go down the rabbit hole and never come back with a word list, but this was at least a very interesting rabbit hole, which was on the far end of the tabs: link to games resources.

      I'm so blown away by this as a resource, and it would be a great way to check a given word, but I have yet to find what I need to start playing crosswords, where I can populate an array with the entire language. I'm amazed at how small that is for English.

      This is what you get as output when you input пр***. It's 76 k, so don't say I didn't warn you.
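Once a word list does turn up, the same kind of query could be run locally. This is only a sketch under assumptions: the helper name mask_to_regex is made up, '*' is taken to mean exactly one unknown letter, and the five words stand in for a real list read from a UTF-8 file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                              # string literals in this file are UTF-8
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: turn a crossword mask such as 'пр***' into a regex,
# treating each '*' as exactly one unknown letter.
sub mask_to_regex {
    my ($mask) = @_;
    my $body = join '',
      map { $_ eq '*' ? '\w' : quotemeta($_) } split //, $mask;
    return qr/^$body$/i;
}

# stand-in data; a real run would read one word per line from a file
my @words = qw( право проза привет пруд принц );
my $re    = mask_to_regex('пр***');
print join( ' ', grep { $_ =~ $re } @words ), "\n";
```

Only the five-letter words starting with пр survive the filter, so привет and пруд drop out.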

Node Type: perlquestion [id://11125044]
Approved by haukex