Re^10: Text::CSV encoding parse()

Re^11: Text::CSV encoding parse() by Tux (Canon) on Aug 21, 2019 at 06:48 UTC
So here it gets interesting. Is it possible to put that list of URLs online somewhere so I/we could test on them? If not, would it be possible to install Data::Peek and show me/us the output of

    foreach my $row (@sorted_urls) {
        DPeek ($row);
        $csv->parse ($row);
        my @csv = $csv->fields;         # using Text::CSV
        my @row = split m/\|/ => $row;  # using split on same $row
        DPeek "CSV: $csv[0]";
        DPeek "SPLIT: $row[0]";
        }

I also have no idea what influence `$q->p ()` has on the output, and I guess that `%q->p` is a typo.

Enjoy, Have FUN! H.Merijn
Re^12: Text::CSV encoding parse() by slugger415 (Monk) on Aug 21, 2019 at 17:56 UTC
I can't really give you the whole shebang but here are a couple of URLs, including the first one, which has the Spanish characters.

    https://www.ibm.com/support/knowledgecenter/es/search/¿Cuales son las partes de una cadena de conexión??scope=SSGU8G_12.1.0|https://www.ibm.com/support/knowledgecenter/es/SSGU8G_12.1.0/com.ibm.jdbc_pg.doc/ids_jdbc_011.htm|0|1|1|0
    https://www.ibm.com/support/knowledgecenter/search/onsmsync?scope=SSGU8G_12.1.0|https://www.ibm.com/support/knowledgecenter/SSGU8G_12.1.0/com.ibm.sec.doc/ids_lb_002.htm|1|1|1|1

Thanks!
Re^13: Text::CSV encoding parse() by Tux (Canon) on Aug 22, 2019 at 08:48 UTC
The problem with pasting the data here inside code tags is that it does not reflect the actual bytes of your data. If I download this snippet, the code works fine:

    $ cat test.csv
    https://www.ibm.com/support/knowledgecenter/es/search/�Cuales son las partes de una cadena de conexi�n??scope=SSGU8G_12.1.0|https://www.ibm.com/support/knowledgecenter/es/SSGU8G_12.1.0/com.ibm.jdbc_pg.doc/ids_jdbc_011.htm|0|1|1|0
    https://www.ibm.com/support/knowledgecenter/search/onsmsync?scope=SSGU8G_12.1.0|https://www.ibm.com/support/knowledgecenter/SSGU8G_12.1.0/com.ibm.sec.doc/ids_lb_002.htm|1|1|1|1
    $ perl -CEO -MData::Peek -MText::CSV_XS -wE'my$c=Text::CSV_XS->new({sep_char=>"|",auto_diag=>1,binary=>1});while(<>){$c->parse($_);DPeek for$c->fields}' test.csv
    PV("https://www.ibm.com/support/knowledgecenter/es/search/\277Cuales son las partes de una cadena de conexi\363n??scope=SSGU8G_12.1"...\0)
    PV("https://www.ibm.com/support/knowledgecenter/es/SSGU8G_12.1.0/com.ibm.jdbc_pg.doc/ids_jdbc_011.htm"\0)
    PV("0"\0)
    PV("1"\0)
    PV("1"\0)
    PV("0"\0)
    PV("https://www.ibm.com/support/knowledgecenter/search/onsmsync?scope=SSGU8G_12.1.0"\0)
    PV("https://www.ibm.com/support/knowledgecenter/SSGU8G_12.1.0/com.ibm.sec.doc/ids_lb_002.htm"\0)
    PV("1"\0)
    PV("1"\0)
    PV("1"\0)
    PV("1"\0)

The output is, as you can see, iso-8859-1 (latin1) instead of your expected utf-8, because the source data is iso-8859-1 (or a variant thereof) and does not require an upgrade to utf-8.
You can however make the data utf-8 by decoding your source data:

    $ perl -CEO -MEncode=decode -MData::Peek -MText::CSV_XS -wE'my$c=Text::CSV_XS->new({sep_char=>"|",auto_diag=>1,binary=>1});while(<>){$c->parse(decode("utf-8",$_));DPeek for$c->fields}' test.csv
    PV("https://www.ibm.com/support/knowledgecenter/es/search/\357\277\275Cuales son las partes de una cadena de conexi\357\277\275n??s"...\0) [UTF8 "https://www.ibm.com/support/knowledgecenter/es/search/\x{fffd}Cuales son las partes de una cadena de conexi\x{fffd}n??scope=SSGU8G_12.1.0"]
    PV("https://www.ibm.com/support/knowledgecenter/es/SSGU8G_12.1.0/com.ibm.jdbc_pg.doc/ids_jdbc_011.htm"\0)
    PV("0"\0)
    PV("1"\0)
    PV("1"\0)
    PV("0"\0)
    PV("https://www.ibm.com/support/knowledgecenter/search/onsmsync?scope=SSGU8G_12.1.0"\0)
    PV("https://www.ibm.com/support/knowledgecenter/SSGU8G_12.1.0/com.ibm.sec.doc/ids_lb_002.htm"\0)
    PV("1"\0)
    PV("1"\0)
    PV("1"\0)
    PV("1"\0)

Note that Text::CSV_XS only decodes to utf-8 if it needs to or is explicitly told to: it needs to be able to deal with pure binary data.

Enjoy, Have FUN! H.Merijn
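The `\x{fffd}` characters in the dump above are what Encode produces when bytes that are not valid UTF-8 are decoded as UTF-8. The following is only a minimal sketch of that effect, not code from the thread. It assumes the source bytes really are ISO-8859-1 (the thread only says "or a variant thereof"), and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw bytes as read from an ISO-8859-1 file: 0xBF is "¿", 0xF3 is "ó".
# (Assumption: the real data uses exactly this encoding.)
my $bytes = "\xBFCuales son las partes de una cadena de conexi\xF3n?";

# Wrong guess: these bytes are not valid UTF-8, so with Encode's default
# (lenient) checking the malformed sequences are replaced by U+FFFD.
my $wrong = decode('UTF-8', $bytes);

# Matching encoding: each byte maps to the character with the same number.
my $right = decode('ISO-8859-1', $bytes);

# Encode once, at the output boundary, to get real UTF-8 bytes.
my $utf8 = encode('UTF-8', $right);

printf "wrong guess contains U+FFFD: %s\n", ($wrong =~ /\x{FFFD}/ ? 'yes' : 'no');
printf "first character after latin1 decode: U+%04X\n", ord($right);
printf "byte length as UTF-8: %d (latin1: %d)\n", length($utf8), length($bytes);
```

If the file turns out to be in some other single-byte encoding, only the name passed to `decode` would change. The idea of decoding on input with the real source encoding and encoding on output stays the same.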
Re^11: Text::CSV encoding parse() by haukex (Archbishop) on Aug 20, 2019 at 21:52 UTC
"Sadly problem still exists, yes. To summarize:"

That's strange. Could you please show a complete example of the code that doesn't work, i.e. the full script along with its output? Also, to play it safe, try upgrading your installations of Text::CSV and Text::CSV_XS.
Re^12: Text::CSV encoding parse() by slugger415 (Monk) on Aug 21, 2019 at 18:12 UTC
Hi @haukex, sorry, I can't give you the whole script due to security and privacy concerns, but I can give you the salient parts of it.

    # execute the query using Aginity Workbench; output saved to flat file
    my($res) = system($cmd);

    ### read the output
    my(@urls);
    my($header);
    open my $fh, "<:encoding(utf8)", "$resultsFile" || die("cannot open results file $resultsFile for reading.\n cmd: $cmd");
    my($c)=0; # just here for counting
    my($d)=0; # just here for counting
    while(<$fh>){
        $c++;
        if($c == 1) { # get header row
            $header = $_;
        }
        if ($_ =~ /\/search\//){
            push(@urls, $_);
        }
        else{
            $d++;
        }
    }
    close($fh);

    # sort @urls based on the search string
    # e.g. https://www.ibm.com/support/knowledgecenter/es/search/¿Cuales son las partes de una cadena de conexión??scope=SSGU8G_12.1.0|https://www.ibm.com/support/knowledgecenter/es/SSGU8G_12.1.0/com.ibm.jdbc_pg.doc/ids_jdbc_011.htm|0|1|1|0
    my @sorted_urls = map { $_->[0] }
        sort { $a->[1] cmp $b->[1] }
        map { m|/search/\s*([^\?]+)\?|; [$_, $1] }
        @urls;

    # parse and print
    print $q->header(-charset => 'utf-8');
    print $q->start_html( -title => 'SearchME', -style=>{'src'=>$stylesheet});
    print $q->start_table();
    foreach my $row (@sorted_urls){
        # print TEMP $row;
        $csv->parse($row);
        print "<tr>";
        $count++;
        my @els = $csv->fields;
        my(@splits) = split('\|',$row);
        $els[0] =~ /\/search\/(.+)\?scope=/i;
        my($term) = $1;
        my($link) = $els[0];
        print "<td>";
        # print $link;
        print $q->a({-href=>$link,-target=>'_blank'},$term);
        print "</td>";
        # print other @fields here inside <td></td>
    }
    print $q->end_table, $q->end_html;

Oh, and I reinstalled both modules.
Re^13: Text::CSV encoding parse() by haukex (Archbishop) on Aug 21, 2019 at 20:12 UTC
"Hi @haukex sorry I can't give you the whole script due to security and privacy concerns, but I can give you the salient parts of it."

I understand, but please understand that we do need to be able to reproduce the issue you're having on our end. That doesn't require you to disclose any secrets, but it does require you to give us something representative that is runnable as-is (standalone). For example, in what you've posted here, I don't see whether you've changed `STDOUT` to UTF-8, I don't see any of the Data::Dumper output that I provided in my example code (which is essential to debugging encoding issues), you don't show the output this script is producing on your end, and so on. If you take the time to read and understand "Short, Self-Contained, Correct Example" and "I know what I mean. Why don't you?", we might be able to help you further, but I'm sorry, as it stands there simply isn't enough coherent information to help you.
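For readers wondering what such a standalone test case could look like, here is a rough sketch. It is not the example code referred to above, which is not shown in this part of the thread. It uses only core modules, hard-codes one row from the thread with the accented characters written as escapes, and uses `split` where the real script would call `$csv->parse`. Setting `$Data::Dumper::Useqq` is my assumption about how to make non-ASCII characters visible in the dump.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Useqq = 1;            # print non-ASCII characters as escapes
binmode STDOUT, ':encoding(UTF-8)';  # encode characters to UTF-8 on output

# One representative row, already decoded to characters.
# \x{BF} is the inverted question mark, \x{F3} is o with acute accent.
my $row = "https://www.ibm.com/support/knowledgecenter/es/search/"
        . "\x{BF}Cuales son las partes de una cadena de conexi\x{F3}n??scope=SSGU8G_12.1.0"
        . "|https://www.ibm.com/support/knowledgecenter/es/SSGU8G_12.1.0/com.ibm.jdbc_pg.doc/ids_jdbc_011.htm"
        . "|0|1|1|0";

# Stand-in for the parser under test; swap in the Text::CSV call here.
my @fields = split /\|/, $row;

# The dump shows exactly which characters each field holds,
# independent of how the terminal or browser renders them.
my $dump = Dumper(\@fields);
print $dump;

# The same field printed through the UTF-8 output layer.
print "$fields[0]\n";
```

Posting both the dump and the printed line makes it possible to tell whether the data is already wrong after parsing or only goes wrong on output.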
Re^14: Text::CSV encoding parse() by slugger415 (Monk) on Aug 21, 2019 at 22:22 UTC
Re^15: Text::CSV encoding parse() by hippo (Bishop) on Aug 22, 2019 at 08:49 UTC