http://qs321.pair.com?node_id=183517


in reply to Temporarily strip HTML

Despite the title of this reply i'm still interested in any more guidance on how to do this properly. Since jeffa asked, this problem involves giving framechat the ability to translate the chatterbox in real time using babelfish.

I couldn't figure out how to implement an HTML::TreeBuilder solution so i went with DamnDirtyApe's code because it works and resembles the way i had tried and failed to solve the problem. His example works as posted but had problems when tested in the wild. So it evolved into the following working example. Two problems: i had to use two global variables, and the links break sometimes when the translation mixes up the order of the html placeholders. Testing with the live chatterbox today revealed that the second problem is relatively rare, but i'm still thinking about how to prevent it.

This test script translates a small sentence of English into a random European language while preserving the HTML:

#!/usr/bin/perl -w use strict; use WWW::Babelfish; use CGI 'header'; use Data::Dumper; my %html = (); my $C = 1; my @langs = qw(German French Spanish Italian Portuguese); shuffle(\@langs); my $str = qq~This <b>contains</b> both text and <a href="http://www.w3c.org"> +html</a>.~; $str = translate($str); print "$langs[0]: ", $str, '<pre>', Dumper(\%html); sub translate { my ($txt) = @_; $txt =~ s|<([^>]+)>|savetags($1)|eg; # weak parsing :-/ my $obj = new WWW::Babelfish( 'agent' => 'DeBabelizer' ); return $_[0] unless defined($obj); my $ttxt = $obj->translate( 'source' => 'English', 'destination' => $langs[0], 'text' => $txt ); return $_[0] unless defined($ttxt); $ttxt = encode($ttxt); # replace placeholders with corresponding html # need ;? cause the fish sometimes screws up that colon $ttxt =~ s|\&lt\;(\d+)\&gt\;?|<$html{$1}>|g; return $ttxt } sub savetags { # replace html tag content with placeholders my $htm = pop; $html{$C} = $htm; # ack, global $_ = '&lt;'.$C.'&gt;'; $C++; # global return $_ } ## and two third-party subs that add to the fun sub shuffle { # Perl Cookbook recipe 4.17 my $array = shift; for(my$i = @$array; --$i;){ my$j = int rand ($i+1); next if $i == $j; @$array[$i,$j] = @$array[$j,$i] } } sub encode { # UTF-8 to latin1 regex from XML::TiePYX (thanks to mirod) my($text) = @_; $text =~ s{([\xc0-\xc3])(.)}{ my $hi = ord($1); my $lo = ord($2); chr((($hi & 0x03) <<6) | ($lo & 0x3F)) }ge; return $text; }
Thanks to everyone who replied to my original query!

--
Check out my Perlmonks Related Scripts like framechat, reputer, and xNN.