Re^2: Text::CSV encoding parse()

Re^3: Text::CSV encoding parse() by haukex (Archbishop) on Aug 13, 2019 at 19:44 UTC
"Hi, yes I'm using the CGI module and have it properly set: `print $q->header(-charset => 'utf-8');` And as mentioned if I don't use Text::CSV the characters display correctly."

OK, but I'm sorry, there still isn't enough information to answer your question. Have another look at my reply above, plus the links therein.
Re^4: Text::CSV encoding parse() by slugger415 (Monk) on Aug 14, 2019 at 17:43 UTC
Hello, OK, here's as short and succinct a sample as I can create.

```perl
use Text::CSV;
use CGI;

my($row) = "search/¿Cuales son las partes de una cadena de conexión??scope|ids_jdbc_011.htm|0|1|1|0";

my $csv = Text::CSV->new ({ binary => 1, sep_char => "|" });
my $q = new CGI;

# print the HTML header and start html
print $q->header;
print $q->start_html;

# first, print $row as is
print $q->p("ROW: $row");

# next, parse with $csv
$csv->parse($row);
my @els = $csv->fields;

# print the first field
# this displays the black diamond ? symbol for ¿ and ó
print $q->p("CSV Parse, field 0:",$els[0]);

# split instead
my(@splits) = split('\|',$row);

# print the first element in @splits.
# As noted, this one displays properly in the browser.
print $q->p("split 0:", $splits[0]);

print $q->end_html;
exit;
```

Thanks.

Umm, update: when I actually ran the above in my HTTP server I got the opposite results, but with weird errors.

```
ROW: search/Â¿Cuales son las partes de una cadena de conexiÃ³n??scope|ids_jdbc_011.htm|0|1|1|0
CSV Parse, field 0: search/¿Cuales son las partes de una cadena de conexión??scope
split 0: search/Â¿Cuales son las partes de una cadena de conexiÃ³n??scope
```

Paint me confused. In the real script, $row is coming from a @sorted_array from an SQL query. This is getting confusing so maybe I should withdraw my question.
Re^5: Text::CSV encoding parse() by haukex (Archbishop) on Aug 14, 2019 at 19:55 UTC
I don't see any mention of any encoding in this code, which is not good. And earlier you said: "I'm using the CGI module and have it properly set: `print $q->header(-charset => 'utf-8');`" so I doubt this code is representative. You need to:

- Use a Perl version >= 5.12 and say `use feature 'unicode_strings';` or `use 5.012;` (or higher).
- If you have any non-ASCII characters in your Perl script, save it as UTF-8 and add the `use utf8;` directive at the top.
- Make sure your data is coming from the database properly encoded. As I linked to above, you can check this via Devel::Peek. If you need that output to go to the browser, see this.
- Make sure you are doing `binmode STDOUT, ':encoding(UTF-8)';` or `use open qw/:std :utf8/;`.
- Make sure you are telling your browser what encoding you are sending it.

Text::CSV is not the problem:

```perl
use warnings;
use strict;
use Devel::Peek;
use Text::CSV;

my $str = "\N{U+20AC}|\N{U+20AC}";
Dump($str);  # ... UTF8 "\x{20ac}|\x{20ac}" ...

my ($s1,$s2) = split /\|/, $str;
Dump($s1);   # ... UTF8 "\x{20ac}" ...
Dump($s2);   # ... UTF8 "\x{20ac}" ...

my $csv = Text::CSV->new ({ binary => 1, sep_char => "|" });
$csv->parse($str);
my ($c1,$c2) = $csv->fields;
Dump($c1);   # ... UTF8 "\x{20ac}" ...
Dump($c2);   # ... UTF8 "\x{20ac}" ...
```
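One way to read the checklist above is "decode once on the way in, encode once on the way out". The following is a minimal sketch of that idea using only the core Encode module. The byte string is a made-up stand-in for data arriving undecoded from a database, not the original poster's actual data. It also shows how encoding a second time turns the two bytes of "¿" into four, the pattern a browser would render as "Â¿", similar to the output reported earlier in the thread.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the UTF-8 bytes for "¿" (c2 bf), as they might
# arrive from a source that does not decode them for you.
my $bytes = "\xC2\xBF";

# Decode once at the input boundary: 2 bytes become 1 character.
my $chars = decode('UTF-8', $bytes);

# Encode once at the output boundary: back to the same 2 bytes.
my $good = encode('UTF-8', $chars);

# Encoding the still-undecoded bytes treats each byte as a Latin-1
# character, which yields 4 bytes (c3 82 c2 bf) instead of 2.
my $double = encode('UTF-8', $bytes);

printf "chars=%d good=%d double=%d\n",
    length($chars), length($good), length($double);
# prints: chars=1 good=2 double=4
```

Whether the real script decodes too little or encodes too often cannot be told from this sketch; it only illustrates what each case looks like at the byte level.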
Re^6: Text::CSV encoding parse() by slugger415 (Monk) on Aug 14, 2019 at 21:41 UTC
Re^7: Text::CSV encoding parse() by haukex (Archbishop) on Aug 15, 2019 at 06:43 UTC
Re^7: Text::CSV encoding parse() by jcb (Parson) on Aug 14, 2019 at 23:16 UTC
Re^5: Text::CSV encoding parse() by choroba (Cardinal) on Aug 14, 2019 at 19:35 UTC
In what encoding have you saved the source code? The recommended practice is to use UTF-8 and tell Perl that your source code contains non-ASCII UTF-8 characters (i.e. `use utf8;`).

`map{substr$_->[0],$_->[1]||0,1}[\||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^ARGV,3]`
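To illustrate what `use utf8;` changes, here is a small sketch. It assumes the file itself is saved as UTF-8; the variable names are only for illustration. With the pragma, a string literal is read as characters; without it, the same literal is read as raw bytes, so the accented letter counts twice.

```perl
use strict;
use warnings;

# Assumption: this source file is saved as UTF-8.
my ($with, $without);
{
    use utf8;    # literals in this block are decoded into characters
    $with = length "conexión";
}
{
    no utf8;     # literals in this block stay as raw bytes
    $without = length "conexión";
}

print "with utf8: $with, without: $without\n";
# prints: with utf8: 8, without: 9
```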
Re^6: Text::CSV encoding parse() by slugger415 (Monk) on Aug 14, 2019 at 20:50 UTC
Re^3: Text::CSV encoding parse() by jcb (Parson) on Aug 14, 2019 at 03:24 UTC
That means that you are declaring to the browser that your output is UTF-8. Is it *actually* UTF-8?
Re^4: Text::CSV encoding parse() by slugger415 (Monk) on Aug 14, 2019 at 17:48 UTC
Hi jcb, not sure, but please see my sample: https://www.perlmonks.org/?node_id=11104415 I'm getting this string returned from a database via SQL.
Re^5: Text::CSV encoding parse() by afoken (Chancellor) on Aug 14, 2019 at 23:03 UTC
"not sure"

Get information. If you have no better idea, use the `dumpstr()` function in t/UChelp.pm from DBD::ODBC. Just copy the few lines into your code and print its result for each string that should be UTF-8.

Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re^5: Text::CSV encoding parse() by jcb (Parson) on Aug 14, 2019 at 23:55 UTC
Seconding afoken here: if you do not know, find out! Try the `hexdump` function from this sample (lightly tested, hopefully correct).

hexdump-test.pl:

```perl
#!/usr/bin/perl

use strict;
use warnings;

# given: string of bytes
# return: hexdump of argument
sub hexdump ($) {
  use bytes;

  my @bytes = map {[$_, ord]} split //, shift;

  return '['.join(' ', map {sprintf('%02x', $_->[1])} @bytes).']'
    .'|'.join('', map { ($_->[1] >= 0x20 && $_->[1] < 0x7F) ? $_->[0] : '.' } @bytes).'|'
}

use utf8;

my $text = q[search/¿Cuales son las partes de una cadena de conexión??scope];

print hexdump($text), "\n";
__END__
```

Sample output:

```
[73 65 61 72 63 68 2f c2 bf 43 75 61 6c 65 73 20 73 6f 6e 20 6c 61 73 20 70 61 72 74 65 73 20 64 65 20 75 6e 61 20 63 61 64 65 6e 61 20 64 65 20 63 6f 6e 65 78 69 c3 b3 6e 3f 3f 73 63 6f 70 65]|search/..Cuales son las partes de una cadena de conexi..n??scope|
```

I am fairly sure that if `hexdump` dies, the string you gave it was definitely not UTF-8. :-)
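For a stricter yes/no answer to "is it actually UTF-8?", a byte string can be run through the core Encode module with `FB_CROAK`, which makes `decode` die on malformed input. This is only a sketch: `is_valid_utf8` is a hypothetical helper name, not part of any module, and it expects undecoded bytes (for example, straight from a database handle that does no decoding).

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns 1 if the given byte string is
# well-formed UTF-8, 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;    # work on a copy; decode may alter its argument
    # With FB_CROAK, decode dies on malformed input; eval traps that.
    return eval { decode('UTF-8', $bytes, Encode::FB_CROAK); 1 } ? 1 : 0;
}

# "conexión" as UTF-8 bytes (ó = c3 b3)
print is_valid_utf8("conexi\xC3\xB3n") ? "valid\n" : "invalid\n";
# "conexión" as Latin-1 bytes (ó = f3)
print is_valid_utf8("conexi\xF3n") ? "valid\n" : "invalid\n";
# prints: valid, then invalid
```

A "valid" result does not prove the data was meant to be UTF-8, since pure ASCII and some other byte sequences also pass, but an "invalid" result rules it out.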
Re^6: Text::CSV encoding parse() by afoken (Chancellor) on Aug 15, 2019 at 17:38 UTC
Re^7: Text::CSV encoding parse() by jcb (Parson) on Aug 15, 2019 at 22:33 UTC


	PerlMonks