Hi all. I have some tweets in Russla/Ukrainian/Bulgarian in a database on another machine. I have extracted ones I want to translate using google cloud translate, which is installed on another machine. I loaded them into an array of hashes, dumped them into a Storable file, and moved that to the other machine. I am now trying to load these into a MySQL database and an getting an error I can't figure out on the other machine. Here is the code (I set the die on $sth->execute to print the error and offending values):
use DBI;
use Storable qw(retrieve);
$| = 1;
my $j = retrieve("/mnt/c/temp/to-translate.sto");
my $dbh = connectdb('****','****','****','****','****');
foreach my $i (0..@$j-1) {
my $r;
$r->{tid} = $j->[$i]->{id};
$r->{orig} = $j->[$i]->{text};
#print "$r->{tid}\t$r->{orig}\n";
insertsql($dbh,'translate',$r);
}
sub connectdb { # connects to mysql or PgPP
my ($database,$user,$password,$driver,$server) = @_;
unless ($driver) {
$driver = "mysql";
}
unless ($server) {
$server = "hdshcresearch.asu.edu";
}
my $url = "DBI:$driver:$database:$server";
unless ($user) {
$user = "root";
$password = "research.HDSHC.mysql";
}
my $dbh = DBI->connect( $url, $user, $password ) or die "connectdb
+ can't connect to mysql: $!\n";
return $dbh;
}
sub insertsql {
my ($dbh,$table,$data,$ignore) = @_;
my @qm;
my @keys;
my @values;
my $i = -1;
foreach my $k (keys %$data) {
if (defined($data->{$k})) {
$i++;
$keys[$i] = $k;
$values[$i] = $data->{$k};
$qm[$i] = '?';
}
}
my $keylist = join(",",@keys);
my $qlist = join(",",@qm);
my $sqlstatement = "insert into $table ($keylist) values ($qlist)"
+;
if ($ignore) {
my $sqlstatement = "insert ignore into $table ($keylist) value
+s ($qlist)";
}
my $sth = $dbh->prepare($sqlstatement);
#$sth->execute(@values) || die "putsql could not execute MySQL sta
+tement: $sqlstatement $sth->errstr";
$sth->execute(@values) || die $sth->errstr. " ".join(" ",@values);
$sth->finish();
return $dbh->{'mysql_insertid'};
}
The encoding on the original db is utf8, and so are the table and columns on the target db. When it gets to one particular item it croaks:
Incorrect string value: '\xF0\x9F\x98\x84 "...' for column 'orig' at row 1 530248086468063232 Нужно срочно брать на роботу. Цель для него есть 😄 "Рассекречена личность морпеха застрелившего бин Ладена" at /home/steve/load-tweets.pl line 61.
Is the problem the emoticon? If so, how can I filter these out?