http://qs321.pair.com?node_id=217753

PetaMem has asked for the wisdom of the Perl Monks concerning the following question:

Dearest Monks,

For a web project we need some nifty handling of language customisation for the users of a website. The site should make an attempt to guess the user's language from the IP address the request comes from. Alternatively, this can be augmented by a hostname lookup followed by an examination of the top-level domain.

However, since for some domains you cannot be sure which language they correspond to, the IP solution should give better results. Yes, the user may change his defaults etc.; it's just the first guess if the user has no cookie set yet.

I vaguely remember that there are some IP-chunk <-> country relation tables, but wonder whether someone has already used them or whether there is already a module for this.

bye,

PetaMem

Replies are listed 'Best First'.
DON'T Language Guess from IP
by dingus (Friar) on Dec 05, 2002 at 13:38 UTC
    Don't do it this way

    You will do better to check the ACCEPT_LANGUAGE HTTP header (Accept-Language), which virtually all browsers send as standard. Otherwise you will annoy people like me who are English but work in Germany, not to mention people working in multinational entities with an IP address range shared between multiple countries.

    Dingus


    Enter any 47-digit prime number to continue.

      <Adrian puts on his usability hat>

      I second that: don't do it via IP address. It only annoys people. As well as people who work in one country but come from another, you also have people who prefer to use a particular language that may not be their native one. People also make these decisions on a site-by-site basis, depending on the information the site provides and the strength of its support for other languages.

      In fact, I'd go further and say that you shouldn't automate it at all. At least, don't do it unless the user can override it on the web site.

      I've done user testing on a few international sites and every single time some users have wanted to change the default language.

      Use ACCEPT_LANGUAGE to set the default, but allow the user to override it, or I guarantee some people will send you nasty feedback and you'll have to do it later :-)
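A minimal sketch of the precedence described above: an explicit choice made on the site beats a remembered cookie, which beats the browser's header, which beats the site default. The helper name pick_language and its argument names are made up for illustration; header_pick is assumed to be whatever language your Accept-Language matching has already produced.

```perl
use strict;
use warnings;

# Hypothetical helper: choose a language using the precedence
#   explicit user choice > saved cookie > browser header > site default.
# Argument names (choice, cookie, header_pick, supported, default) are assumptions.
sub pick_language {
    my (%args) = @_;
    my %supported = map { lc($_) => 1 } @{ $args{supported} };

    for my $candidate ( $args{choice}, $args{cookie}, $args{header_pick} ) {
        next unless defined $candidate && length $candidate;
        return lc $candidate if $supported{ lc $candidate };
    }
    return $args{default};    # nothing usable: fall back to the site default
}
```

The point of the ordering is that anything the user has said explicitly always wins over anything the site has guessed.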

      <Adrian goes away and tries to remember where he put his perl hat>

        In fact, I'd go further and say that you shouldn't automate it at all.

        Amen, brother. That is one of the most annoying things I know; it actually bugs me more than popup ads do. I seem to recall that SourceForge and Debian are among the offenders, and thereby sites I'd rather avoid. Debian is especially bad, because even if I choose a new language (English, as it were), it forgets this as soon as I click a link.

        I want an opt-in solution for this kind of thing, and preferably English as the main language. I am Swedish, but I don't usually appreciate translations; even if they are good, lots of context is lost in translation. And technical stuff is rarely suitable for translation at all, IMHO. Since so much of the technical lingo is English or derived directly from it, "thinking" in one language at a time is enough, thank you very much.

        Yeah, this is only how I feel about it. But I do so strongly. :)

        And yes, I never buy translated books either, not for fiction, nor for fact.


        You have moved into a dark place.
        It is pitch black. You are likely to be eaten by a grue.

        Here's a simple recipe for Doing It Right: return the site in the default language, usually English. Put a link to change to the guessed language and one to go to a language menu on the page; or put a language select dropdown on the page, and default it to the guessed language.

        That way, the viewer's own language is just one click away if it's guessed correctly and they want it.
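The dropdown variant of this recipe might look roughly like the sketch below. The page itself stays in the default language; only the preselected option reflects the guess. The function name language_dropdown, the field name lang, and the language list are all assumptions for illustration, and the labels are assumed to be trusted constants, so no HTML escaping is done here.

```perl
use strict;
use warnings;

# Build a <select> of languages with the guessed one preselected.
# $languages is an arrayref of [code, label] pairs; $guessed may be undef.
sub language_dropdown {
    my ( $languages, $guessed ) = @_;
    my $html = qq{<select name="lang">\n};
    for my $entry (@$languages) {
        my ( $code, $label ) = @$entry;
        my $selected = ( defined $guessed && $code eq $guessed ) ? ' selected' : '';
        $html .= qq{  <option value="$code"$selected>$label</option>\n};
    }
    return $html . "</select>\n";
}
```

If the guess is wrong or missing, nothing is preselected beyond the browser's default first option, and the visitor still sees every choice.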

        Makeshifts last the longest.

Re: Language Guess from IP
by rob_au (Abbot) on Dec 05, 2002 at 13:41 UTC
    A better approach than attempting to determine language preference from geographical location would be to make use of the Accept-Language header field as defined for HTTP/1.1 in RFC 2616.

    The module I18N::AcceptLanguage performs this task by matching language preference to available languages defined in this HTTP/1.1 header field.

    From the documentation for this module:

    use I18N::AcceptLanguage;

    my $supportedLanguages = [( 'en-us', 'fr' )];

    my $acceptor = I18N::AcceptLanguage->new;
    my $language = $acceptor->accepts( $ENV{'HTTP_ACCEPT_LANGUAGE'}, $supportedLanguages );
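For anyone curious what such matching involves, here is a rough hand-rolled sketch of parsing the header by q-value as RFC 2616 describes it. It is not the module's actual algorithm; the function names parse_accept_language and best_language are invented for this example, and the wildcard range "*" is simply ignored.

```perl
use strict;
use warnings;

# Parse an Accept-Language value into lowercased tags, best preference first.
sub parse_accept_language {
    my ($header) = @_;
    return () unless defined $header;
    $header =~ s/^\s+|\s+$//g;    # trim surrounding whitespace

    my ( @ranges, $position );
    $position = 0;
    for my $item ( split /\s*,\s*/, $header ) {
        my ( $tag, @params ) = split /\s*;\s*/, $item;
        next unless defined $tag && $tag =~ /^[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*$/;

        my $q = 1;                # a missing q parameter means q=1
        for my $param (@params) {
            $q = $1 if $param =~ /^q\s*=\s*([01](?:\.\d{0,3})?)$/i;
        }
        next if $q == 0;          # q=0 means "not acceptable"
        push @ranges, { tag => lc $tag, q => $q, pos => $position++ };
    }

    # Highest q first; ties keep the order the browser sent them in.
    return map { $_->{tag} }
        sort { $b->{q} <=> $a->{q} || $a->{pos} <=> $b->{pos} } @ranges;
}

# First supported language matching a preference, by exact tag or primary subtag.
sub best_language {
    my ( $header, $supported, $default ) = @_;
    my %have = map { lc($_) => $_ } @$supported;
    for my $wanted ( parse_accept_language($header) ) {
        return $have{$wanted} if exists $have{$wanted};
        ( my $primary = $wanted ) =~ s/-.*//;    # "de-ch" falls back to "de"
        return $have{$primary} if exists $have{$primary};
    }
    return $default;
}
```

A call such as best_language( $ENV{HTTP_ACCEPT_LANGUAGE}, [ 'en', 'de' ], 'en' ) would then yield the site default whenever the header is absent or names nothing the site supports.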

     

    perl -le 'print+unpack("N",pack("B32","00000000000000000000000111110100"))'

Re: Language Guess from IP
by gjb (Vicar) on Dec 05, 2002 at 13:31 UTC

    Have a look at Apache::GeoIP or IP::Country. That leaves you with the nice task of deciding which language to pick for which country. It's especially nice for mine, where you can choose between Dutch and French; the wrong guess won't be appreciated ;-)

    Hope this helps, -gjb-

    Update: I forgot about our German speaking community, so I'm typing this hanging from a tree (I hope you get my point ;-)
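The country-to-language problem can be made concrete with a small sketch. The table below is illustrative only, and guess_from_country is a made-up name; the two-letter country code is assumed to come from an IP lookup such as the modules mentioned above. The sketch deliberately refuses to guess when a country has more than one candidate language.

```perl
use strict;
use warnings;

# Illustrative table only: ISO 3166 country code => candidate languages.
my %LANGUAGES_FOR = (
    BE => [qw(nl fr de)],       # Belgium: Dutch, French, German
    CH => [qw(de fr it rm)],    # Switzerland: German, French, Italian, Romansh
    DE => [qw(de)],
    SE => [qw(sv)],
);

# Return a language only when the country maps to exactly one candidate;
# return nothing for unknown or multilingual countries.
sub guess_from_country {
    my ($country_code) = @_;
    return unless defined $country_code;
    my $candidates = $LANGUAGES_FOR{ uc $country_code } or return;
    return @$candidates == 1 ? $candidates->[0] : undef;
}
```

Even a single-candidate result is still only a guess about the visitor, which is the objection raised elsewhere in this thread.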

      Especially nice for mine where you can choose between Dutch and French, the wrong guess won't be appreciated ;-)
      Indeed. I was in Switzerland this summer, and Google kept giving me www.google.ch, in German. I was connecting from the French-speaking part of the country, but that didn't have any effect, of course. Very annoying.

      Guessing may be OK for some things, but it's better if you let the user select...

      Michael

        Actually, Google goes by the Accept-Language header; for me it seems to do so, anyway. My browser is set to ask for English pages, and consequently I stay on google.com, even though I'm in Germany, where you're usually redirected to google.de.

        Makeshifts last the longest.

Re: Language Guess from IP
by FamousLongAgo (Friar) on Dec 05, 2002 at 13:33 UTC
    This sounds like a recipe for trouble. You are building in a lot of assumptions - that IP blocks do not cover more than one country, that each country has only one language, and that every user in that country is a native who speaks that language ( rather than a traveling businessman from abroad on dialup, for example ). Why not just provide alternate-language links at the top of the web page and forget the auto-detection?
Re: Language Guess from IP
by mirod (Canon) on Dec 05, 2002 at 13:31 UTC

    You should look at CPAN... a search on Geographical returns Geo::IP which looks quite a good fit for your problem.

Re: Language Guess from IP
by richardX (Pilgrim) on Dec 05, 2002 at 18:06 UTC
    AOL will throw your numbers off

    AOL uses proxy servers in Vienna and McLean, Virginia, so all your AOL users will appear to be from Virginia. Depending on the type of website that you are running (e.g. consumer products, corporate, or others), this traffic could skew your analysis by anywhere from 5% to as much as 40%. So even if your AOL user is in another country, they will still show up as Virginia, USA. This is a big gotcha that many people forget about.

    Richard

    There are three types of people in this world, those that can count and those that cannot. Anon

Re: Language Guess from IP
by mooseboy (Pilgrim) on Dec 05, 2002 at 15:41 UTC

    Just to support what some of the other respondents have already said -- please don't do this. My native language is English, but I live in a German-speaking country, and it is infuriating when a website guesses (incorrectly) on the basis of my IP address that I would prefer information in German. If you try to guess, and you get it wrong, you will a) annoy people and b) possibly lose customers. Always let your visitors make their own choices.

    HTH, mooseboy

    Update: There's a short but useful article from AG Consult on the subject.

Re: Language Guess from IP
by robot_tourist (Hermit) on Dec 05, 2002 at 17:38 UTC

    Front-door pages can be annoying, but this is where they become useful. A user can select the language initially, then you can set the cookie to remember this for next time, or the user can bookmark the page with the language they want.

    Don't get me wrong, primary language guess might be useful, but do make it easy to switch. An easier way might even be to have different domains e.g. de.foo.com or uk.foo.com or even easier: use www.foo.com/uk or www.foo.com/de.

    How can you feel when you're made of steel? I am made of steel. I am the Robot Tourist. Robot Tourist, by Ten Benson

      <usability hat again...>

      Front-door pages can be annoying, but this is where they become useful. A user can select the language initially, then you can set the cookie to remember this for next time, or the user can bookmark the page with the language they want.

      This is certainly one way of doing it, but make sure that you have some sort of 'change language' link on every page. Users can (and do) change their minds during a session, and you can find your way to URLs from many places :-)

      Also, make sure that the options you supply are languages (French, English, etc.), rather than countries (France, UK, USA, etc) or flags (Tri-colour, Union Jack, Stars and Stripes, etc.) - since there isn't a one-to-one mapping between languages and countries/flags.

      <quick change to IA hat...>

      As robot_tourist says, different URLs are a must.

      If you have the same URL serve up different languages depending on the user's preferences, it can play merry hell with caches, proxies & search engines.
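One way to keep each language on its own URL is a path prefix, along the lines of the www.foo.com/de idea above. This is only a sketch: split_language_prefix is an invented name, and two-letter lowercase language codes are an assumption made for the example.

```perl
use strict;
use warnings;

# Split a request path such as "/de/products" into (language, remaining path).
# Returns (undef, $path) when the path has no supported language prefix.
sub split_language_prefix {
    my ( $path, $supported ) = @_;
    my %supported = map { $_ => 1 } @$supported;

    if ( $path =~ m{^/([a-z]{2})(/.*)?$} && $supported{$1} ) {
        return ( $1, defined $2 ? $2 : '/' );
    }
    return ( undef, $path );
}
```

Because the language is part of the URL, a cache or search engine sees each translation as a distinct page, and a bookmark keeps the language the visitor chose.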

      <off to find my chef's hat - time for dinner! >