Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
Despite the title of this reply i'm still interested in any more guidance on how to do this properly. Since jeffa asked, this problem involves giving framechat the ability to translate the chatterbox in real time using babelfish.

I couldn't figure out how to implement an HTML::TreeBuilder solution so i went with DamnDirtyApe's code because it works and resembles the way i had tried and failed to solve the problem. His example works as posted but had problems when tested in the wild. So it evolved into the following working example. Two problems: i had to use two global variables, and the links break sometimes when the translation mixes up the order of the html placeholders. Testing with the live chatterbox today revealed that the second problem is relatively rare, but i'm still thinking about how to prevent it.

This test script translates a small sentence of English into a random European language while preserving the HTML:

#!/usr/bin/perl -w use strict; use WWW::Babelfish; use CGI 'header'; use Data::Dumper; my %html = (); my $C = 1; my @langs = qw(German French Spanish Italian Portuguese); shuffle(\@langs); my $str = qq~This <b>contains</b> both text and <a href="http://www.w3c.org"> +html</a>.~; $str = translate($str); print "$langs[0]: ", $str, '<pre>', Dumper(\%html); sub translate { my ($txt) = @_; $txt =~ s|<([^>]+)>|savetags($1)|eg; # weak parsing :-/ my $obj = new WWW::Babelfish( 'agent' => 'DeBabelizer' ); return $_[0] unless defined($obj); my $ttxt = $obj->translate( 'source' => 'English', 'destination' => $langs[0], 'text' => $txt ); return $_[0] unless defined($ttxt); $ttxt = encode($ttxt); # replace placeholders with corresponding html # need ;? cause the fish sometimes screws up that colon $ttxt =~ s|\&lt\;(\d+)\&gt\;?|<$html{$1}>|g; return $ttxt } sub savetags { # replace html tag content with placeholders my $htm = pop; $html{$C} = $htm; # ack, global $_ = '&lt;'.$C.'&gt;'; $C++; # global return $_ } ## and two third-party subs that add to the fun sub shuffle { # Perl Cookbook recipe 4.17 my $array = shift; for(my$i = @$array; --$i;){ my$j = int rand ($i+1); next if $i == $j; @$array[$i,$j] = @$array[$j,$i] } } sub encode { # UTF-8 to latin1 regex from XML::TiePYX (thanks to mirod) my($text) = @_; $text =~ s{([\xc0-\xc3])(.)}{ my $hi = ord($1); my $lo = ord($2); chr((($hi & 0x03) <<6) | ($lo & 0x3F)) }ge; return $text; }
Thanks to everyone who replied to my original query!

--
Check out my Perlmonks Related Scripts like framechat, reputer, and xNN.


In reply to Temporarily strip HTML (solved!) by epoptai
in thread Temporarily strip HTML by epoptai

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others goofing around in the Monastery: (9)
As of 2024-04-18 13:17 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found