PerlMonks  

(OT) Marking up alternatives

by UnderMine (Friar)
on May 29, 2006 at 15:54 UTC ( [id://552330]=perlquestion )

UnderMine has asked for the wisdom of the Perl Monks concerning the following question:

Is it possible to mark up alternative translations effectively in HTML/XML?

Further thoughts in the Extracting appropriate language text from HTML data thread highlighted the fact that different sections of a document may have different translations available (thanks, john_oshea). The following example markup works fine if you only request English, but requesting either French or Italian goes badly wrong.

<locale lang="en">English Part 1</locale>
<locale lang="fr">French Part 1</locale>
<locale lang="en">English Part 2</locale>
<locale lang="it">Italian Part 2</locale>

Extracting English returned :-
<locale lang="en">English Part 1</locale>
<locale lang="en">English Part 2</locale>

However, extracting French returned :-
<locale lang="fr">French Part 1</locale>

And extracting Italian returned :-
<locale lang="it">Italian Part 2</locale>

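Both of these are very wrong: each result silently drops a whole part of the document instead of falling back to another language. Are there any standard ways of solving this problem?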
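The failure described above can be reproduced with a short sketch. This is not the extraction code used in the original thread; it is a minimal Python illustration, and the `<doc>` wrapper element and the `extract` helper are assumptions added only so that the fragment parses as a single XML document.

```python
import xml.etree.ElementTree as ET

# Hypothetical <doc> wrapper: the original fragment has no single root element.
SOURCE = """<doc>
<locale lang="en">English Part 1</locale>
<locale lang="fr">French Part 1</locale>
<locale lang="en">English Part 2</locale>
<locale lang="it">Italian Part 2</locale>
</doc>"""


def extract(xml_text, lang):
    """Naive filter: keep the text of every <locale> whose lang matches."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("locale") if el.get("lang") == lang]


print(extract(SOURCE, "en"))  # ['English Part 1', 'English Part 2']
print(extract(SOURCE, "fr"))  # ['French Part 1']
print(extract(SOURCE, "it"))  # ['Italian Part 2']
```

Because nothing in the flat markup says which blocks are alternatives for the same part, the filter cannot know that Part 2 is missing from the French result.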

A logical extension of my markup might be to wrap each section in a wrapper tag.

<locale>
  <locale lang="en">English Part 1</locale>
  <locale lang="fr">French Part 1</locale>
</locale>
<locale>
  <locale lang="en">English Part 2</locale>
  <locale lang="it">Italian Part 2</locale>
</locale>

Alternatively, giving each locale block an id would have a similar effect.

<locale lang="en" id="part1">English Part 1</locale>
<locale lang="fr" id="part1">French Part 1</locale>
<locale lang="en" id="part2">English Part 2</locale>
<locale lang="it" id="part2">Italian Part 2</locale>

Both of these would work for graphic designers and people who are used to editing HTML, but what would be the most intuitive solution for a normal person?

Thanks
UnderMine

Edited to mark Off Topic and fix links. UnderMine

2006-06-03 Retitled by g0n, as per Monastery guidelines
Original title: 'OT Marking up alternatives'

2006-06-03 Retitled by g0n, as per Monastery guidelines
Original title: 'Marking up alternatives'

Replies are listed 'Best First'.
Re: (OT) Marking up alternatives
by dragonchild (Archbishop) on May 30, 2006 at 02:59 UTC
    Part of your issue is that you have an incomplete document. It should really read:
    <locale lang="en">English Part 1</locale>
    <locale lang="fr">French Part 1</locale>
    <locale lang="it"></locale>
    <locale lang="en">English Part 2</locale>
    <locale lang="fr"></locale>
    <locale lang="it">Italian Part 2</locale>
    That way, you have a copy for each language in each section. Alternately, you can provide additional semantic meaning by explicating the sections, as you mentioned as your first option. However, I wouldn't make it locale-specific. I'd do something like:
    <section id="part1">
      <locale lang="en">English Part 1</locale>
      <locale lang="fr">French Part 1</locale>
    </section>
    <section id="part2">
      <locale lang="en">English Part 2</locale>
      <locale lang="it">Italian Part 2</locale>
    </section>

    Now, you have a place to hang other stuff, like formatting. Don't get tunnel-vision.
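With explicit sections, an extractor can choose a language per section and fall back when the preferred one is missing. The following is a minimal Python sketch of that idea, not code from the thread: the `<doc>` wrapper, the `extract_with_fallback` helper, and the preference-list parameter are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical <doc> wrapper so the two <section> elements share one root.
SECTIONS = """<doc>
<section id="part1">
  <locale lang="en">English Part 1</locale>
  <locale lang="fr">French Part 1</locale>
</section>
<section id="part2">
  <locale lang="en">English Part 2</locale>
  <locale lang="it">Italian Part 2</locale>
</section>
</doc>"""


def extract_with_fallback(xml_text, preferred):
    """For each <section>, pick the first language in `preferred` that exists."""
    root = ET.fromstring(xml_text)
    result = []
    for section in root.findall("section"):
        by_lang = {el.get("lang"): el.text for el in section.findall("locale")}
        # None means no language in the preference list is available here.
        chosen = next((by_lang[lang] for lang in preferred if lang in by_lang), None)
        result.append((section.get("id"), chosen))
    return result


print(extract_with_fallback(SECTIONS, ["fr", "en"]))
# [('part1', 'French Part 1'), ('part2', 'English Part 2')]
```

Requesting French now yields a complete document, with Part 2 taken from English rather than dropped.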


    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      I agree that we are talking about incomplete documentation, and this is one of the biggest problems with translated documents. People tend to have only the bits they need translated, which leaves you with partial translations. At the moment we use backout (fallback) languages to find the most appropriate language, and that is what raised the original question.

      One of the things the system has to be able to do is report on translation coverage (i.e. how many documents are available in all/some/one language). Your first markup makes it possible to run coverage reports that would show that some translations are missing. However, the second is a lot clearer and would allow easy identification of exactly which part of the translation is missing.
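A per-section coverage report along these lines could be sketched as follows. This is an illustrative Python example rather than the system described here; the `<doc>` wrapper, the `coverage` helper, and the list of expected languages are assumptions. Empty placeholder elements such as `<locale lang="it"></locale>` are counted as missing, so the same function would handle either markup style once sections are wrapped.

```python
import xml.etree.ElementTree as ET

# Hypothetical <doc> wrapper so the two <section> elements share one root.
SECTIONS = """<doc>
<section id="part1">
  <locale lang="en">English Part 1</locale>
  <locale lang="fr">French Part 1</locale>
</section>
<section id="part2">
  <locale lang="en">English Part 2</locale>
  <locale lang="it">Italian Part 2</locale>
</section>
</doc>"""


def coverage(xml_text, languages):
    """Map each section id to the sorted list of languages it lacks."""
    root = ET.fromstring(xml_text)
    report = {}
    for section in root.findall("section"):
        # A <locale> with no text (an empty placeholder) does not count as present.
        present = {
            el.get("lang")
            for el in section.findall("locale")
            if (el.text or "").strip()
        }
        report[section.get("id")] = sorted(set(languages) - present)
    return report


print(coverage(SECTIONS, ["en", "fr", "it"]))
# {'part1': ['it'], 'part2': ['fr']}
```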

      Thanks for the feedback
      UnderMine

Re: (OT) Marking up alternatives
by rhesa (Vicar) on May 29, 2006 at 20:57 UTC
    I would definitely vote for the extra wrapper tag. Otherwise your extracting code has no way of telling which tag group it should use to fall back on in case of a missing language.

    Having multiple tags with the same ID seems counter-intuitive to me, and would probably confuse parsers as well. IDs are supposed to be unique, I think.
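In HTML, an `id` value must be unique within a document, and in XML an attribute declared with type ID in a DTD has the same constraint, so the id-based markup would fail validation. A non-validating parser such as Python's `xml.etree.ElementTree` will not complain, so a check has to be explicit. The sketch below is illustrative only; the `<doc>` wrapper and the `duplicate_ids` helper are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical <doc> wrapper around the id-based markup from the question.
ID_MARKUP = """<doc>
<locale lang="en" id="part1">English Part 1</locale>
<locale lang="fr" id="part1">French Part 1</locale>
<locale lang="en" id="part2">English Part 2</locale>
<locale lang="it" id="part2">Italian Part 2</locale>
</doc>"""


def duplicate_ids(xml_text):
    """Return the sorted id values that appear on more than one element."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get("id") for el in root.iter() if el.get("id") is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


print(duplicate_ids(ID_MARKUP))  # ['part1', 'part2']
```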
