When shadox posted his JPEG Files ReSize with output in Spanish, and bladx replied with "Converting the Spanish text to English, (for easier readability for English-speaking persons. Or perhaps, the english equivilents maybe?) :)", that got me thinking.

After spending most of the weekend working on maintaining my computer, I finally got around to trying this.

Obviously, a Perl program can use the native multi-language features of the platform. On Win32 that means a "resource" file, and since a Perl script is not a PE (COFF) executable, the resources would have to live in an additional file, and you would need tools for building it. You could instead keep an external file listing all the strings for each language, but whatever format that file is in, you need to replace every string in the program with an ID value. That can make things less than readable, especially for simple scripts that have lots of on-the-fly output (as opposed to widget labels loaded from a database).

So I think it would help to supply all the translations in the program's text, near where each string is used. Ideally, they could go right where the string would have gone.

Here is my proof-of-concept:

    use strict;
    use warnings;
    use LangString;

    my $s1 = LangString->new({
        English => "Error opening directory",
        Spanish => "Error abriendo directorio",
    });
    print "My string is $s1\n";        # English, the default
    $LangString::Language = 'Spanish';
    print "My string is $s1\n";        # same object, now in Spanish

I wonder if anyone can further improve on the syntax and usability.

Here is the code:

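    package LangString;
    use strict;
    use warnings;

    our $Language = 'English';    # currently selected language

    # Stringifying a LangString returns the text for $Language.
    # fallback => 1 lets eq, . and friends reuse the "" conversion.
    use overload '""' => \&string, fallback => 1;

    sub new {
        my ($class, $self) = @_;    # $self: hash ref of Language => text
        return bless $self, $class;
    }

    sub string {
        my $self = shift;
        return $self->{$Language};
    }

    1;
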
It's simplistic, because it's just a concept. In a full implementation, the default language would be obtained automatically from the OS, missing strings would be detected, and an inheritance mechanism would fill the gaps (e.g. British and American specified only where they differ, with English covering both otherwise), along with other error checking.
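A rough sketch of that fuller version, under several assumptions: the default language is guessed from the POSIX LANG environment variable (Windows would need another source), the locale-to-language table and the %Parent fallback table are made up for illustration, and a missing translation produces a warning and falls back to English.

```perl
use strict;
use warnings;

package LangString;

# Hypothetical fallback table: a language with no text of its own
# inherits from its parent (British and American fall back to English).
our %Parent = (British => 'English', American => 'English');

# Hypothetical mapping from POSIX locale names to our language labels.
my %FromLocale = (en => 'English', en_GB => 'British',
                  en_US => 'American', es => 'Spanish');

# Assumption: guess the default from $ENV{LANG}; unknown -> English.
sub default_language {
    my $locale = $ENV{LANG} // '';
    my ($full, $short) = $locale =~ /^(([a-z]{2})(?:_[A-Z]{2})?)/
        or return 'English';
    return $FromLocale{$full} // $FromLocale{$short} // 'English';
}

our $Language = default_language();

use overload '""' => \&string, fallback => 1;

sub new {
    my ($class, $self) = @_;
    return bless $self, $class;
}

# Walk up the fallback chain until a translation turns up; warn if none does.
sub string {
    my $self = shift;
    for (my $lang = $Language; defined $lang; $lang = $Parent{$lang}) {
        return $self->{$lang} if exists $self->{$lang};
    }
    warn "LangString: no '$Language' text, using English\n";
    return $self->{English} // '';
}

package main;

my $colour = LangString->new({ English => 'color', British => 'colour' });
my $error  = LangString->new({ English => 'Error opening directory' });

$LangString::Language = 'British';
print "$colour / $error\n";    # colour / Error opening directory
$LangString::Language = 'American';
print "$colour\n";             # color (inherited from English)
```

Only the languages that actually differ need their own entry; everything else is found by walking up %Parent.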


Summary of commentary and additional thoughts:

Re (tilly) 1: Perl Programs That Support Multiple (Human) Languages
by tilly (Archbishop) on Aug 27, 2001 at 08:47 UTC
    While I can understand the impulse that leads to an attempted quick and dirty solution, keeping different versions of the text near where they are used will lead to an unhealthy intertwining of internationalization and everything else in your program, so adding a new language means auditing the whole thing each time. The people I know who have tackled internationalization are uniform in their belief that what you really want is a separation of presentation and content.

    The pattern that I have most often seen mentioned for doing this is called Model-View-Controller or MVC for short. You have a different view for each language you support.
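    To make the idea concrete, here is a minimal Perl sketch of the view side only, assuming one package per language (View::English and View::Spanish are hypothetical names). The model holds only data; each view owns its wording and word order, so adding a language means adding a view rather than auditing the program logic.

```perl
use strict;
use warnings;

# Model: the language-neutral result of the work.
my %result = (files => 17, dir => 'photos');

# Views: one hypothetical package per language. Each owns its wording and
# word order; the model never contains display text.
package View::English {
    sub report {
        my ($class, $r) = @_;
        return "Processed $r->{files} files in $r->{dir}.";
    }
}

package View::Spanish {
    sub report {
        my ($class, $r) = @_;
        return "Se procesaron $r->{files} archivos en $r->{dir}.";
    }
}

# Controller side: pick the view for the user's language and render.
for my $view (qw(View::English View::Spanish)) {
    my $text = $view->report(\%result);
    print "$text\n";
}
```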

    Does this take more work and thought up front to implement? Yes. But as with any programming problem, unless you have factored out what needs to change, changing it will be a very hard task. And if you are going to create an international application, you won't just be asked to do it in English and French. Rather, you will need to do it in many languages, and many of those will be added by people chosen more for their knowledge of those languages than for programming skill...

      I did it both ways. The dirty way is quick, but... dirty. When you face the possibility of selling your system (40MB of source code) to Italy, but you need to translate it first... you end up glad that they didn't buy it.

      And before that, another system was designed to be multilingual from the very beginning. It was a pain to start developing the application by creating a dictionary and adding every stupid message as a database entry. But translation was easy: a person who knew both accounting AND German translated our system into German in about two weeks. Compile it in the new language environment, resolve some screen issues (some German words are much longer), and you are done! Piece of cake!

      When thinking about a multi-language system, you also need to think about translating abbreviations for codes. E.g. "BK" is an excellent abbreviation for "Bankruptcy" in English, but not in Italian. So you need to translate easy mnemonic status codes into the other language, too. It's a long way before we settle on a Standard Intragalactic Language... ;-)

      To make errors is human. But to make million errors per second, you need a computer.

      Model-View-Controller helps generate and synchronize multiple views of an underlying data model, but even with MVC you still run into internationalization and localization issues. Unless your friends have come up with a similar pattern that's lighter weight, I suspect they're generating a lot of code in the interest of supporting multiple languages.

      I've worked inside of several internationalized applications. All have been able to get by with a combination of separate "externalized strings" files for each supported language, plus a few extras, such as the equivalent of a custom sprintf that knew how to dynamically alter the order (position) of substitutions.
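      For the reordering part, current versions of Perl's built-in sprintf accept an explicit argument index such as %2$s, which covers much of what such a custom sprintf did (plural forms still need something extra). A sketch with hypothetical format strings:

```perl
use strict;
use warnings;

# Hypothetical per-language format strings. The %N$s form names which
# argument goes where, so the Spanish text can put the folder first.
my %Fmt = (
    English => 'File %1$s is in folder %2$s',
    Spanish => 'La carpeta %2$s contiene el archivo %1$s',
);

for my $lang (sort keys %Fmt) {
    print sprintf($Fmt{$lang}, 'a.jpg', 'photos'), "\n";
}
```

      The calling code passes the arguments in one fixed order; only the format string, which belongs to the translator, decides where each one lands.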

        In the early '90s, for an embedded PC that controlled a scanning densitometer, I came up with a C++ class that fit our needs.

        An associative array was indexed by a string key, which could carry some mnemonic meaning for the code that used it, and which is easier to keep unique and maintain in a large program than the sequential ID numbers favored by Microsoft.

        The "hash" was populated from a translation file at program startup. The format of the file was the key followed by each available translation.
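        A Perl version of that loader might look like the following sketch. The file format here (key, then Language=text pairs separated by |) is an assumption standing in for the original one, and the data is read from an in-memory string so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical file format: one string per line, all translations together:
#   key | Language=text | Language=text ...
my $translation_file = <<'END';
# comments and blank lines are ignored
open_dir_err | English=Error opening directory | Spanish=Error abriendo directorio
files_done   | English=Files processed         | Spanish=Archivos procesados
END

# Build a hash of hashes: $strings{$key}{$language} = $text
sub load_translations {
    my ($fh) = @_;
    my %strings;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;                 # trim the whole line
        next if $line eq '' or $line =~ /^#/;    # skip blanks and comments
        my ($key, @fields) = split /\s*\|\s*/, $line;
        for my $field (@fields) {
            my ($lang, $text) = split /=/, $field, 2;
            $strings{$key}{$lang} = $text;
        }
    }
    return \%strings;
}

open my $fh, '<', \$translation_file or die "Cannot open in-memory file: $!";
my $Strings = load_translations($fh);
close $fh;

my $Language = 'Spanish';
print $Strings->{open_dir_err}{$Language}, "\n";   # Error abriendo directorio
```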

        I liked the idea of having all translations of one string together, rather than a different file for each language like some schemes. I think Win32's language stuff (using FormatMessage, not the string table resource) works that way too, but is not nearly as powerful. FormatMessage has the dynamic positioning of strings like you mention; I don't remember how my old program handled that.

        In a modern Perl solution, I think we can address the dynamic ordering easily enough:

        If you fetch the string with a method call such as $x->format($a, $b, $c), and the string contains the markers $1, $2, and $3, the format method can substitute each argument wherever its marker appears, in whatever order the translation needs. But function calls don't interpolate in Perl 5, so a tied hash might help as syntactic sugar.
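        A minimal sketch of both ideas, with hypothetical names throughout (a plain PosFormat::format_string function standing in for $x->format, the LangHash tie class, and the message table). The tied hash relies on Perl joining multiple hash subscripts with $; so that "$T{'copied', 'a.jpg', 'thumbs'}" interpolates and reaches FETCH as a single key.

```perl
use strict;
use warnings;

package PosFormat;

# Replace $1, $2, ... in a template with the matching argument, so each
# translation can place the arguments in whatever order its grammar needs.
# Markers with no matching argument are left untouched.
sub format_string {
    my ($template, @args) = @_;
    $template =~ s/\$(\d+)/$1 >= 1 && $1 <= @args ? $args[$1 - 1] : "\$$1"/ge;
    return $template;
}

package LangHash;

# Hypothetical tied-hash sugar: "$T{'key', @args}" interpolates in strings.
# Perl joins the subscripts with $; before calling FETCH; we split them here.
sub TIEHASH {
    my ($class, $table, $lang_ref) = @_;
    return bless { table => $table, lang => $lang_ref }, $class;
}

sub FETCH {
    my ($self, $joined) = @_;
    my $sep = quotemeta($;);
    my ($key, @args) = split /$sep/, $joined, -1;
    my $template = $self->{table}{$key}{ ${ $self->{lang} } };
    return PosFormat::format_string($template, @args);
}

package main;

my %table = (
    copied => {
        English => 'Copied $1 to $2',
        Spanish => 'Copiado a $2 desde $1',
    },
);

my $Language = 'English';
tie my %T, 'LangHash', \%table, \$Language;

print "$T{'copied', 'a.jpg', 'thumbs'}\n";    # Copied a.jpg to thumbs
$Language = 'Spanish';
print "$T{'copied', 'a.jpg', 'thumbs'}\n";    # Copiado a thumbs desde a.jpg
```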

        I'd also like to see the format function automatically handle changes based on the actual parameters, such as number and gender. E.g. $x->format(17) would cough up "There were 17 files processed." but $x->format(1) would produce "There was 1 file processed." Singular/plural changes are so common that an escape code could specify them. E.g. the translation string holds 'There $?n1(were,was,were) $1 file$?n1(,,s) processed.'. $ followed by a digit is a substitution. $? followed by a code/flags/etc. and then a digit is the fancy stuff: ?n for "number", followed by the forms for the zero, singular, and plural cases.
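        The escape codes could be implemented roughly as below. This is a hypothetical sketch of the notation described above, not code from the original program; the template uses (s,,s) rather than (,,s) for the second choice so that a count of zero reads "0 files".

```perl
use strict;
use warnings;

# Hypothetical implementation of the escape codes:
#   $N            -> the Nth argument
#   $?nN(a,b,c)   -> a if argument N is 0, b if it is 1, c otherwise
sub expand {
    my ($template, @args) = @_;

    # First resolve the number-dependent choices.
    $template =~ s{\$\?n(\d+)\(([^)]*)\)}{
        my ($idx, $list) = ($1, $2);
        my $n     = $args[$idx - 1];
        my @forms = split /,/, $list, -1;    # -1 keeps empty forms
        $n == 0 ? $forms[0] : $n == 1 ? $forms[1] : $forms[2];
    }ge;

    # Then substitute the plain positional markers.
    $template =~ s/\$(\d+)/$args[$1 - 1]/ge;
    return $template;
}

my $processed = 'There $?n1(were,was,were) $1 file$?n1(s,,s) processed.';
print expand($processed, 17), "\n";   # There were 17 files processed.
print expand($processed, 1),  "\n";   # There was 1 file processed.
print expand($processed, 0),  "\n";   # There were 0 files processed.
```

        Resolving the $? codes before the plain $N markers keeps the two passes from tripping over each other.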


      I agree that for a major application keeping it separate is a good idea. But a script like the one that inspired this has only a couple of strings, and it is easier to read and understand with them in-line than with constant references back to an ID.

      The work I do for Kodak supports English, Spanish, Italian, Brazilian Portuguese, German, French, and Japanese.

      In a script like Shadox's, it's pretty clear that only two languages may be standard for his shop. Likewise in Canada and Texas border towns. That is, the problem set is indeed fairly well known ahead of time.

      So my thoughts at this point are: Major software engineering project with a full-blown UI (possibly GUI) should keep the UI elements (including strings) separate from the program logic. But a simple script with a few strings for prompts, warnings, and short results might be more easily done in-line.

      Note that my scheme doesn't have to be in-line. All the strings can be grouped together at the top of the file, for example.
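      For example, a script could declare every message in one hash near the top and refer to it by name further down. The compact LangString package here is an inlined variant of the one above (with fallback => 1 added) so the sketch runs on its own; the message names are hypothetical.

```perl
use strict;
use warnings;

# Compact, inlined variant of the LangString idea so this runs standalone.
package LangString {
    our $Language = 'English';
    use overload '""' => sub { $_[0]{$Language} }, fallback => 1;
    sub new { my ($class, $self) = @_; return bless $self, $class }
}

package main;

# All user-visible strings grouped at the top of the script.
my %Msg = (
    open_err => LangString->new({
        English => 'Error opening directory',
        Spanish => 'Error abriendo directorio',
    }),
    done => LangString->new({
        English => 'Done.',
        Spanish => 'Listo.',
    }),
);

# ... program logic further down refers to the strings by name:
$LangString::Language = 'Spanish';
print "$Msg{open_err}\n";    # Error abriendo directorio
print "$Msg{done}\n";        # Listo.
```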


Re: Perl Programs That Support Multiple (Human) Languages
by cLive ;-) (Prior) on Aug 27, 2001 at 08:29 UTC
    Definitely, I think it's worth your while to download and study the Discus bulletin board system and look at how they've done it. Basically, each phrase used ANYWHERE is referenced through a variable that points into a language configuration file. They even cover cases where a phrase requires two tenses in one language but only one in another. I was very impressed.

    cLive ;-)

    PS - in fact, the whole of their source is worth looking at for style gems, if you have the time.

Re: Perl Programs That Support Multiple (Human) Languages
by arhuman (Vicar) on Aug 27, 2001 at 11:07 UTC
    You may also want to check Multunil to handle the documentation too...

    But IMHO, while we're diving into Unicode/internationalization, maybe we should think about handling (all of) this in Perl's core? (Any project like this in Perl 6?)

    "Only Bad Coders Code Badly In Perl" (OBC2BIP)

Re: Perl Programs That Support Multiple (Human) Languages
by stefp (Vicar) on Aug 28, 2001 at 05:20 UTC
    You only scratched the surface of the problem, because you eventually want your strings to have variable parts. In different languages these parts will be assembled in different orders, so printf is of no help here. The gettext library helps you handle that (as you mentioned), and the good news is that it seems to be supported by a Perl wrapper:


    -- stefp