Re^7: Perl custom sort for Portuguese Lanaguage

by Tux (Canon)
on Jul 09, 2020 at 13:54 UTC


in reply to Re^6: Perl custom sort for Portuguese Lanaguage
in thread Perl custom sort for Portuguese Lanaguage

If you only want the first lines starting with # to be filtered, that is indeed what filter is for:

use Data::Peek;
use Text::CSV_XS qw( csv );

my $r = 0;
my $aoa = csv (in => *DATA, filter => sub {
    $_[1][0] =~ m/^\s*#/ ? $r : ++$r;
    });
DDumper $aoa;
__END__
# This is comment
# and so is this
# and this
a,b,c
#but,not,this
1,2,3

-->

[ [ 'a', 'b', 'c' ], [ '#but', 'not', 'this' ], [ '1', '2', '3' ] ]
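The filter above behaves like a one-way latch: $r stays 0 (false) while only comment lines have been seen, and once the first non-comment row bumps it, every later row returns a true value. A minimal sketch of just that logic, using core Perl only. The rows are hand-written stand-ins for what the parser would pass as $_[1], and $keep is a hypothetical name.

```perl
use strict;
use warnings;

# Hand-written stand-ins for parsed rows; with Text::CSV_XS the filter
# callback would receive each row as $_[1].
my @rows = (
    [ "# This is comment" ],
    [ "# and so is this" ],
    [ "# and this" ],
    [ "a", "b", "c" ],
    [ "#but", "not", "this" ],
    [ "1", "2", "3" ],
    );

my $r = 0;    # stays 0 (false) until the first non-comment row is seen

# $keep is a hypothetical name for the filter sub from the post.
my $keep = sub { $_[1][0] =~ m/^\s*#/ ? $r : ++$r };

my @kept = grep { $keep->(undef, $_) } @rows;
print join (",", @$_), "\n" for @kept;
```

This prints the same three rows as the output above, including the "#but" row, because by then $r is already 1.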

Enjoy, Have FUN! H.Merijn

Replies are listed 'Best First'.
Re^8: Perl custom sort for Portuguese Lanaguage
by haukex (Archbishop) on Jul 09, 2020 at 14:08 UTC

    Thanks! I see you're filtering lines beginning with # when they occur at the beginning of the file; the way I understood the OP's sample data is that the comments can occur anywhere. And my worry was that, even though in the OP's data this is probably not the case, filter-based solutions will remove lines that may actually not be comments, and I wasn't sure if there was an easy solution for this.

    use warnings;
    use strict;
    use Data::Peek;
    use Text::CSV_XS qw/csv/;

    DDumper csv( in=>*DATA, escape_char=>"\\",
        filter => sub { $_[1][0] !~ m/^\s*#/ });
    __DATA__
    # This is a comment
    a,b,c
    # Also a comment
    x,y,z
    "#not",a,comment
    \#also,not,"a comment"

    Output:

    [ [ 'a', 'b', 'c' ], [ '' ], [ 'x', 'y', 'z' ], [ '' ] ]

      So more or less like this?

      DDumper csv (
          in     => *DATA,
          sep    => "|",
          filter => sub { $_[1][0] =~ m/^\s*#/ && @{$_[1]} == 1 ? 0 : 1; },
          );

      Which would not even need a ternary if slightly rewritten.
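One possible reading of that rewrite, sketched with core Perl only: negate the whole condition instead of mapping it to 0 and 1. The name $keep and the sample rows are assumptions for illustration; the sub is meant to be passed as the filter argument in a call like the one above.

```perl
use strict;
use warnings;

# Hypothetical name. Drops a row only when it is a single field that
# starts with optional whitespace followed by '#'.
my $keep = sub { !( @{$_[1]} == 1 && $_[1][0] =~ m/^\s*#/ ) };

# Hand-written stand-ins for rows parsed with sep => "|".
my @rows = (
    [ "# This is a comment" ],     # one field, looks like a comment
    [ "a", "b", "c" ],
    [ "#not", "a", "comment" ],    # starts with '#', but has three fields
    );

my @kept = grep { $keep->(undef, $_) } @rows;
print scalar (@kept), " of ", scalar (@rows), " rows kept\n";
```

Only the single-field row is dropped; the "#not" row survives because it has more than one field.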


      Enjoy, Have FUN! H.Merijn
        So more or less like this?

        Closer, but then this no longer filters comments that contain the sep character (e.g. add "# This is a comment, too" to my example above)...
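To make that concrete, here is a rough sketch with core Perl only. A plain split stands in for the CSV parser, which is a simplification that ignores quoting, and $keep is a hypothetical name for the single-field filter.

```perl
use strict;
use warnings;

# Hypothetical name for the single-field comment filter.
my $keep = sub { $_[1][0] =~ m/^\s*#/ && @{$_[1]} == 1 ? 0 : 1 };

# split stands in for the parser here (assumption: no quoting involved).
my @plain = split /,/, "# This is a comment";
my @comma = split /,/, "# This is a comment, too";

for my $row (\@plain, \@comma) {
    my $verdict = $keep->(undef, $row) ? "kept" : "filtered";
    printf "%d field(s): %s\n", scalar @$row, $verdict;
    }
```

The second comment splits into two fields, so the single-field test no longer recognizes it and the line is kept as data.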

        Update: I realize this is less likely when sep=>'|', but my question is basically whether there's a "generic" way to filter lines. For example, I could load the file into memory and do s/^\s*#.*(?:\n|\z)//mg, but that would break any CSV data that contains embedded newlines that happen to match this pattern. In other words, with Text::CSV_XS, filter is only applied after parsing fields like "#foo" or \#foo to #foo, and I'm wondering if there's a hook into the parser before that takes place?

        Update 2: In the CB, you suggested in => \do { local $/; <DATA> =~ s/^\s*#.*(?:\n|\z)//mgr }, which gets closer as well, though it breaks this test case. Just for completeness, here are all the test cases so far combined into one data set:

        # This is a comment
        not,a,comment
        # This is a comment, too
        not,a,comment
        "#not",a,comment
        \#also,not,"a comment"
        foo,"bar
        # Not a comment, either!
        quz",baz
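A quick way to see the breakage, sketched with core Perl only (the variable names are assumptions): apply the substitution to the combined data before any CSV parsing and look at what is left.

```perl
use strict;
use warnings;

# The combined test data as a single string; the last record has a
# quoted field with embedded newlines.
my $data = <<'EOD';
# This is a comment
not,a,comment
# This is a comment, too
not,a,comment
"#not",a,comment
\#also,not,"a comment"
foo,"bar
# Not a comment, either!
quz",baz
EOD

# Strip everything that looks like a comment line before parsing.
(my $stripped = $data) =~ s/^\s*#.*(?:\n|\z)//mg;

print $stripped;
```

The two real comments are gone and the "#not" and \#also records survive, but the quoted field in the last record loses its middle line, because the regex cannot tell that the line sits inside quotes.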
