Re^7: Perl custom sort for Portuguese Language

If you only want the leading lines that start with `#` to be filtered, that is indeed what `filter` is for:

use Data::Peek;
use Text::CSV_XS qw( csv );

my $r = 0;
my $aoa = csv (in => *DATA, filter => sub { $_[1][0] =~ m/^\s*#/ ? $r : ++$r; });
DDumper $aoa;
__END__
# This is comment
# and so is this
# and this
a,b,c
#but,not,this
1,2,3

-->

[
    [   'a',
        'b',
        'c'
        ],
    [   '#but',
        'not',
        'this'
        ],
    [   '1',
        '2',
        '3'
        ]
    ]

Enjoy, Have FUN! H.Merijn

Re^8: Perl custom sort for Portuguese Language by haukex (Archbishop) on Jul 09, 2020 at 14:08 UTC
Thanks! I see you're filtering lines beginning with `#` when they occur at the beginning of the file; the way I understood the OP's sample data is that the comments can occur anywhere. My worry was that, even though this is probably not the case in the OP's data, `filter`-based solutions will remove lines that may not actually be comments, and I wasn't sure whether there is an easy solution for this?

use warnings;
use strict;
use Data::Peek;
use Text::CSV_XS qw/csv/;

DDumper csv( in=>*DATA, escape_char=>"\\",
    filter => sub { $_[1][0] !~ m/^\s*#/ });
__DATA__
# This is a comment
a,b,c

# Also a comment
x,y,z

"#not",a,comment
\#also,not,"a comment"

Output:

[
    [   'a',
        'b',
        'c'
        ],
    [   ''
        ],
    [   'x',
        'y',
        'z'
        ],
    [   ''
        ]
    ]
Re^9: Perl custom sort for Portuguese Language by Tux (Canon) on Jul 09, 2020 at 14:14 UTC
So more or less like this?

DDumper csv (
    in     => *DATA,
    sep    => "|",
    filter => sub { $_[1][0] =~ m/^\s*#/ && @{$_[1]} == 1 ? 0 : 1; },
    );

Which would not even need a ternary if slightly rewritten.

Enjoy, Have FUN! H.Merijn
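The ternary-free rewrite mentioned above might look something like the following sketch. It pulls the predicate out into a named sub so it can be tried without Text::CSV_XS installed. The name `keep_row` is hypothetical, and plain array refs stand in for parsed rows. The calling convention is the one used by the filter callbacks in this thread: the second argument is the row as an array ref, and a true return value keeps the row.

```perl
use strict;
use warnings;

# Hypothetical helper name, same contract as the filter callbacks above:
# $_[0] is the CSV object (unused here), $_[1] is the row as an array ref.
# A row is dropped only if it is a single field whose text starts with '#'.
sub keep_row {
    my (undef, $row) = @_;
    return !(@$row == 1 && ($row->[0] // '') =~ m/^\s*#/);
}

# Plain array refs stand in for rows that the parser would hand over.
for my $row (["# a comment"], ["#but", "not", "this"], ["a", "b", "c"]) {
    printf "%-4s %s\n", keep_row(undef, $row) ? "keep" : "drop", join ",", @$row;
}
```

Assuming the callback form shown above, this could presumably be passed as `filter => \&keep_row`. It shares the limitation of the original: the test runs on parsed fields, so it cannot tell `"#foo"` from an unquoted `#foo`.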
Re^10: Perl custom sort for Portuguese Language (updated x2) by haukex (Archbishop) on Jul 09, 2020 at 14:20 UTC
"So more or less like this?"

Closer, but then this no longer filters comments that contain the `sep` character (e.g. add "`# This is a comment, too`" to my example above)...

Update: I realize this is less likely when `sep=>'|'`, but my question is basically whether there's a "generic" way to filter lines. For example, I could load the file into memory and do `s/^\s*#.*(?:\n|\z)//mg`, but that would break any CSV data that contains embedded newlines that happen to match this pattern. In other words, with Text::CSV_XS, `filter` is only applied after fields like `"#foo"` or `\#foo` have been parsed to `#foo`, and I'm wondering if there's a hook into the parser before that takes place?

Update 2: In the CB, you suggested `in => \do { local $/; <DATA> =~ s/^\s*#.*(?:\n|\z)//mgr }`, which gets closer as well, though it breaks the embedded-newline test case. Just for completeness, here are all the test cases so far combined into one data set:

# This is a comment
not,a,comment
# This is a comment, too
not,a,comment
"#not",a,comment
\#also,not,"a comment"
foo,"bar
# Not a comment, either!
quz",baz
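One way to approach the "hook before parsing" question, sketched here without relying on any Text::CSV_XS feature, is to strip comment lines from the raw text while tracking whether the previous line left a quoted field open. `strip_comment_lines` is a hypothetical helper, not something the module provides. The quote counting is deliberately naive: it assumes `"` as the quote character and `\` as the escape character, as in the examples above, and it would miscount a quote that directly follows an escaped backslash.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::CSV_XS: remove comment lines from
# raw CSV text, but only when the line begins a new record.
# Assumes '"' is the quote character and '\' is the escape character.
sub strip_comment_lines {
    my ($text) = @_;
    my $in_quotes = 0;    # true while a quoted field spans a line break
    my @kept;
    for my $line (split /^/, $text) {    # one element per line, newline kept
        # Only a line that starts a record can be a comment.
        next if !$in_quotes && $line =~ m/^\s*#/;
        push @kept, $line;
        # An odd number of unescaped quotes on this line flips the state.
        # Naive: a quote preceded by an escaped backslash is miscounted.
        my $quotes = () = $line =~ m/(?<!\\)"/g;
        $in_quotes = !$in_quotes if $quotes % 2;
    }
    return join '', @kept;
}

# The combined test cases from this thread.
my $raw = <<'EOT';
# This is a comment
not,a,comment
# This is a comment, too
not,a,comment
"#not",a,comment
\#also,not,"a comment"
foo,"bar
# Not a comment, either!
quz",baz
EOT

print strip_comment_lines($raw);
```

The cleaned text could then be handed to `csv` as a scalar reference, in the same way as the `in => \do { ... }` suggestion above. This is only a pre-filter under the stated quoting assumptions and does not replace the parser's own handling of quotes and escapes.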

