To read freqrnc2011.csv into a Perl data structure, one could use Text::CSV_XS, which is smart enough to auto-decode UTF-8 bytes into Perl wide characters by default:

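use Text::CSV_XS 'csv';
# With headers => "auto", csv() returns a reference to an array of hashes
# keyed by the header row; keep only the Lemma column.
# The file is tab-separated despite its .csv extension.
my @words = map { $_->{Lemma} } @{
 csv in => "freqrnc2011.csv", headers => "auto", sep_char => "\t"
};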

Make sure to set an :encoding(...) PerlIO layer on your STDOUT when you work with (and print) wide characters.
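A minimal sketch of what that could look like, assuming your terminal expects UTF-8 (the sample words are hypothetical stand-ins for the list read from the CSV):

```perl
use strict;
use warnings;
use utf8;    # the string literals below are written in UTF-8

# Encode wide characters as UTF-8 on output; without this layer Perl
# emits "Wide character in print" warnings.
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical sample data standing in for the words read from the file.
my @words = ('слово', 'кроссворд');
print "$_\n" for @words;
```

If you would rather set the layer once for STDIN, STDOUT and STDERR together, use open qw(:std :encoding(UTF-8)); should do the same job. On a console that expects a different encoding, substitute its name for UTF-8.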

I have to admit that so far I haven't been able to read the .var files from the latter source without using Starling for DOS from the same website. When read as CP-866, they seem to contain the actual words mixed with binary data. If you are interested in the dictionaries from there, we may have to contact the original author's son about the file format.

In reply to Re^5: downloading a russian dictionary and getting matches with the arbitrary underpattern, a utility for crosswords by aitap
in thread downloading a russian dictionary and getting matches with the arbitrary underpattern, a utility for crosswords by Aldebaran
