in reply to Re^5: Perl custom sort for Portuguese Lanaguage (updated x2) in thread Perl custom sort for Portuguese Lanaguage
I used Text::CSV to read the data file, but AFAIK it doesn't support ignoring comment lines.
This works for me:
csv (in => 'quux.csv', filter => {1 => sub { !/^#/ }});
Re^7: Perl custom sort for Portuguese Lanaguage
by haukex (Archbishop) on Jul 08, 2020 at 21:06 UTC
This works for me: csv (in => 'quux.csv', filter => {1 => sub { !/^#/ }});
Unfortunately that also filters lines whose first field is "#foo" (with the quotes). I remember Tux recently saying filtering before parsing wasn't supported, though I'm having trouble finding the reference at the moment (it could have been in the chatterbox too*). It may be a bit tricky because this is valid CSV too:
abc,"d
#e
f",ghi
(That's one row, ["abc", "d\n#e\nf", "ghi"].)
* Update: I looked again and I think it must have been in the chatterbox; I do distinctly remember someone having a similar question recently...
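To illustrate why dropping "#" lines before parsing is risky, here is a minimal sketch in plain Perl (no CSV module involved). The variable names are only for illustration; it applies a naive line-based pre-filter to the three-line record shown above.

```perl
use strict;
use warnings;

# The record from above: one row, three fields, with embedded newlines
# inside the quoted second field.
my $raw = qq{abc,"d\n#e\nf",ghi\n};

# Naive pre-filter: drop every physical line that starts with "#".
# split /^/ keeps the trailing newline on each line.
my $filtered = join "", grep { !/^#/ } split /^/, $raw;

print $filtered;
# prints:
# abc,"d
# f",ghi
```

The result is still valid CSV, but the second field has silently changed from "d\n#e\nf" to "d\nf", which is why the comment check has to happen after parsing.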
The meta info knows whether the field was quoted or not.
#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV_XS;
my $csv = 'Text::CSV_XS'->new ({ binary => 1,
auto_diag => 1,
keep_meta_info => 1 });
open my $in, '<:encoding(utf8)', shift or die $!;
while (my $row = $csv->getline($in)) {
next if $row->[0] =~ m/^#/ && ! $csv->is_quoted(0);
$csv->say(*STDOUT, $row);
}
Tested with
#x,y,z skip
abc,"d
#e
f",ghi keep
#comment skip
a,b,c,#xyz keep
"#foo",x,y,z keep
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
use warnings;
use strict;
use Data::Dump;
use Text::CSV;
my $csv = Text::CSV->new({ binary=>1, auto_diag=>2,
keep_meta_info=>1, escape_char=>"\\" });
while ( my $row = $csv->getline(*DATA) ) {
dd $row, $csv->meta_info;
}
$csv->eof or $csv->error_diag;
__DATA__
foo,bar
"#foo","bar"
#foo,bar
\#foo,bar
In this special case it looks like there won't be Portuguese words starting with a "#", so the filter would work for the OP, as long as they are aware of that limitation.
Re^7: Perl custom sort for Portuguese Lanaguage
by Tux (Canon) on Jul 09, 2020 at 13:54 UTC
If you only want the first lines starting with # to be filtered, that is indeed what filter is for:
use Data::Peek;
use Text::CSV_XS qw( csv );
my $r = 0;
my $aoa = csv (in => *DATA, filter => sub { $_[1][0] =~ m/^\s*#/ ? $r : ++$r; });
DDumper $aoa;
__END__
# This is comment
# and so is this
# and this
a,b,c
#but,not,this
1,2,3
-->
[
[ 'a',
'b',
'c'
],
[ '#but',
'not',
'this'
],
[ '1',
'2',
'3'
]
]
Enjoy, Have FUN! H.Merijn
Thanks! I see you're filtering lines beginning with # when they occur at the beginning of the file; the way I understood the OP's sample data is that the comments can occur anywhere. My worry was that, even though this is probably not the case in the OP's data, filter-based solutions will remove lines that may not actually be comments, and I wasn't sure whether there was an easy solution for this.
use warnings;
use strict;
use Data::Peek;
use Text::CSV_XS qw/csv/;
DDumper csv( in=>*DATA, escape_char=>"\\",
filter => sub { $_[1][0] !~ m/^\s*#/ });
__DATA__
# This is a comment
a,b,c
# Also a comment
x,y,z
"#not",a,comment
\#also,not,"a comment"
Output:
[
[ 'a',
'b',
'c'
],
[ ''
],
[ 'x',
'y',
'z'
],
[ ''
]
]
DDumper csv (
in => *DATA,
sep => "|",
filter => sub {
$_[1][0] =~ m/^\s*#/ && @{$_[1]} == 1 ? 0 : 1;
},
);
Which would not even need a ternary if slightly rewritten.
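One possible rewrite without the ternary, as a sketch: negate the condition so the sub returns a true or false value directly. The name `$keep` and the sample rows are only for illustration. The sub is called by hand here with the same `($csv, $row)` argument order the snippet above relies on, so the example does not depend on any CSV module.

```perl
use strict;
use warnings;

# Keep a row unless it is a single field that starts with "#".
# $_[1] is the row (an array ref), as in the filter above.
my $keep = sub {
    !( $_[1][0] =~ m/^\s*#/ && @{$_[1]} == 1 );
};

# Hypothetical already-parsed rows, for illustration only.
my @rows = (
    [ "# This is a comment" ],
    [ "a", "b", "c" ],
    [ "#not", "a", "comment" ],
);

for my $row (@rows) {
    printf "%-22s %s\n", join ("|", @$row),
        $keep->(undef, $row) ? "keep" : "skip";
}
```

The same sub could then be passed as `filter => $keep` in the call shown above.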
Enjoy, Have FUN! H.Merijn