http://qs321.pair.com?node_id=370039

mattihe has asked for the wisdom of the Perl Monks concerning the following question:

Hello! I have a problem that I cannot solve. For indexing purposes I need a way to lowercase Cyrillic letters. Is there some universal lowercase command that I could use? Has anyone come across this? -NewBie-

Update: this worked best for me, thanks:
$s =~ tr/\xA8\xC0-\xDF/\xB8\xE0-\xFF/; # lc RusCyr Win1251

Re: Cyrillic problem...
by ccn (Vicar) on Jun 27, 2004 at 20:55 UTC
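    You can use a locale:
    use locale;
    use POSIX qw(locale_h);
    setlocale(LC_CTYPE, 'Russian_Russia.1251');  # Windows-style name; on Unix-like systems typically 'ru_RU.CP1251', if installed
    print lc('ПРИВЕТ');                           # the script itself must be saved in cp1251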
    or use the tr/// operator:
    $s = "ПРИВЕТ";
    $s =~ tr/\xA8\xC0-\xDF/\xB8\xE0-\xFF/; # lc RusCyr Win1251
    print $s;
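    A minimal, self-contained sketch of the tr/// approach, assuming the input is a byte string in Windows-1251 (cp1251). The helper name lc_cp1251 is hypothetical. The Cyrillic letters are written as \x escapes so the example does not depend on the encoding the script itself is saved in. As a small extension of the one-liner above, it also maps ASCII A-Z in the same pass, so mixed Latin/Cyrillic index terms are handled at once. Like the original, it covers only the Russian alphabet (А-Я at 0xC0-0xDF plus Ё at 0xA8).

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase a Windows-1251 (cp1251) byte string.
#   A-Z              -> a-z        (ASCII)
#   0xA8 (Ё)         -> 0xB8 (ё)
#   0xC0-0xDF (А-Я)  -> 0xE0-0xFF (а-я)
# Other cp1251 Cyrillic letters (Ukrainian, Serbian, ...) are not covered.
sub lc_cp1251 {
    (my $s = shift) =~ tr/A-Z\xA8\xC0-\xDF/a-z\xB8\xE0-\xFF/;
    return $s;
}

my $upper = "\xCF\xD0\xC8\xC2\xC5\xD2";   # "ПРИВЕТ" in cp1251, written as escapes
my $lower = lc_cp1251($upper);
printf "%vX\n", $lower;                   # prints EF.F0.E8.E2.E5.F2
```

    Copying into my $s keeps the caller's string unchanged, since tr/// modifies its target in place.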
Re: Cyrillic problem...
by graff (Chancellor) on Jun 28, 2004 at 02:45 UTC
    If you aren't using utf8 for your Cyrillic data, what encoding are you using? (ISO-8859-5? CP1251? KOI8?) Whatever it is, Perl 5.8 has built-in support for converting it to utf8. Once the text is in utf8 (or if you're using utf8 already), it's really simple:
    $_ = lc();
    Converting to utf8 (and back to whatever you started with) is also simple, using either a PerlIO layer or the Encode module. For example, if your input data is cp1251, you can open an input file like this:
    open( IN, "<:encoding(cp1251)", $filename ) or die $!;
    This way, the text you read in is automatically converted to perl's internal utf8 encoding, and all character-based operators and functions (lc, uc, length, substr, regexes, cmp, eq, and so on) will work the way you want them to, regardless of what language the text is in. Look through the Encode, PerlIO, perluniintro and perlunicode man pages for more details.
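    A minimal sketch of both routes described above, assuming cp1251 input. The helper name lc_via_unicode is hypothetical; decode and encode are the Encode module's own functions. The in-memory filehandle only stands in for a real cp1251 file, so the example runs without one.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: decode bytes in the given encoding to Perl
# characters, lowercase them with Unicode rules, and encode back.
sub lc_via_unicode {
    my ($bytes, $enc) = @_;
    my $chars = decode($enc, $bytes);   # bytes -> characters
    return encode($enc, lc $chars);     # lowercase, then characters -> bytes
}

# PerlIO route: an :encoding layer decodes while reading. The in-memory
# "file" stands in for a real cp1251 file on disk.
my $file_bytes = "\xCF\xD0\xC8\xC2\xC5\xD2\n";   # "ПРИВЕТ\n" in cp1251
open(my $in, '<:encoding(cp1251)', \$file_bytes) or die "open: $!";
chomp(my $line = <$in>);
close $in;

my $lower = lc $line;                 # character string "привет"
printf "%vX\n", $lower;               # prints 43F.440.438.432.435.442
```

    Because lc works on decoded characters, the same helper handles other encodings Encode supports, e.g. lc_via_unicode($bytes, 'koi8-r') for KOI8-R input.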
Re: Cyrillic problem...
by Joost (Canon) on Jun 27, 2004 at 20:47 UTC
Re: Cyrillic problem...
by PERLscienceman (Curate) on Jun 27, 2004 at 20:51 UTC
    Greetings, mattihe!
    Welcome to the world of Perl.
    Without further information, I would suggest visiting CPAN, searching for the keyword "cyrillic", and seeing what turns up. There may be a Perl module there that suits your needs.
    Along with this PerlMonks site, I have found CPAN to be an excellent source for all things Perl.