
Re^4: Extracting appropriate language text from HTML data

by UnderMine (Friar)
on May 28, 2006 at 21:55 UTC


in reply to Re^3: Extracting appropriate language text from HTML data
in thread Extracting appropriate language text from HTML data

Thanks for that.

I am currently treating each paragraph separately, using panic_languages to fall back where no direct translation is available.
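For anyone following along, the per-paragraph approach can be sketched roughly like this. It is a minimal Python illustration, not my actual code: `pick_text`, the `FALLBACKS` table, and the sample paragraph are all hypothetical, and the table is only a stand-in for whatever panic_languages would suggest.

```python
# Hypothetical fallback table: a stand-in for the suggestions that
# panic_languages would supply. These entries are made up for illustration.
FALLBACKS = {"pt": ["es", "fr"], "de": ["nl", "en"]}


def pick_text(versions, preferred, last_resort="en"):
    """Choose one version of a single paragraph.

    versions  -- dict mapping language tag to that paragraph's text
    preferred -- the reader's languages, most wanted first
    Returns (lang, text), or None if no candidate language is available.
    """
    # Direct matches come first, then the fallbacks, then the last resort.
    candidates = list(preferred)
    for lang in preferred:
        candidates.extend(FALLBACKS.get(lang, []))
    candidates.append(last_resort)

    for lang in candidates:
        if lang in versions:
            return lang, versions[lang]
    return None


paragraph = {"es": "Hola", "en": "Hello"}
choice = pick_text(paragraph, ["pt"])  # no "pt" text, so a fallback is used
```

Each paragraph is decided on its own here, which is exactly why neighbouring paragraphs can end up in different languages.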

You have raised an interesting point: should there be some overall scheme that balances paragraph readability against document readability? To do that, though, there has to be a known relationship between the alternate parts of the text.

The current markup does not show how alternate parts relate, only which language each chunk is in. A better markup would identify alternate parts and group them together.
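To make the idea of grouping concrete, here is a rough Python sketch. It assumes each chunk carries a group identifier as well as its language; that identifier is hypothetical (the current markup has nothing like it), as are the `group_alternates` name and the sample data.

```python
def group_alternates(chunks):
    """Collect alternate-language chunks into groups.

    chunks -- iterable of (group_id, lang, text) tuples in document order,
              where group_id is a hypothetical marker tying alternates together
    Returns a list of dicts (lang -> text), one per group, in the order
    each group first appears in the document.
    """
    groups = {}  # dicts keep insertion order, so document order is preserved
    for group_id, lang, text in chunks:
        groups.setdefault(group_id, {})[lang] = text
    return list(groups.values())


chunks = [
    ("p1", "en", "Hello"),
    ("p1", "fr", "Bonjour"),
    ("p2", "en", "Goodbye"),
]
groups = group_alternates(chunks)
```

With the alternates grouped, it becomes possible to see which languages cover every group, which is the information a document-level choice would need.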

Thanks
UnderMine

Re^5: Extracting appropriate language text from HTML data
by john_oshea (Priest) on May 29, 2006 at 12:15 UTC

    Given your database constraints, I'm not sure that you're going to come up with a 'better' solution. Given that not every chunk is available in all languages, you're (effectively) going to have to decide at each chunk what's going to be the 'best' piece of text to return at that point, and I can't at the moment see a more elegant way of doing that...
