PerlMonks  

Re^6: Perl custom sort for Portuguese Lanaguage

by hippo (Bishop)
on Jul 08, 2020 at 20:04 UTC


in reply to Re^5: Perl custom sort for Portuguese Lanaguage (updated x2)
in thread Perl custom sort for Portuguese Lanaguage

I used Text::CSV to read the data file, but AFAIK it doesn't support ignoring comment lines.

This works for me:

csv (in => 'quux.csv', filter => {1 => sub { !/^#/ }});

Re^7: Perl custom sort for Portuguese Lanaguage
by haukex (Archbishop) on Jul 08, 2020 at 21:06 UTC
    This works for me: csv (in => 'quux.csv', filter => {1 => sub { !/^#/ }});

    Unfortunately that also filters lines whose first field is "#foo" (with the quotes). I remember Tux recently saying filtering before parsing wasn't supported, though I'm having trouble finding the reference at the moment (it could have been in the chatterbox too*). It may be a bit tricky because this is valid CSV too:

    abc,"d
    #e
    f",ghi

    (That's one row, ["abc", "d\n#e\nf", "ghi"].)
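To illustrate why stripping comment lines from the raw text before parsing is risky, here is a minimal core-Perl sketch. It assumes a naive pre-filter that drops every physical line starting with `#`; the variable names are made up for the illustration and no CSV module is involved.

```perl
use strict;
use warnings;

# One CSV record whose quoted second field spans three physical lines.
my $raw = qq{abc,"d\n#e\nf",ghi\n};

# Naive pre-filter: split into physical lines (keeping the newlines)
# and drop every line that starts with '#'.
my $filtered = join '', grep { !/^#/ } split /^/m, $raw;

# The middle line of the quoted field is gone, so the record is corrupted:
# the field that was "d\n#e\nf" would now parse as "d\nf".
print $filtered;
```

The pre-filter cannot tell that the `#e` line sits inside a quoted field, which is why the decision is better made after the parser has seen the whole record.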

    * Update: I looked again and I think it must have been in the chatterbox; I do distinctly remember someone having a similar question recently...

      The meta info knows whether the field was quoted or not.
      #!/usr/bin/perl
      use warnings;
      use strict;

      use Text::CSV_XS;

      my $csv = 'Text::CSV_XS'->new ({ binary => 1, auto_diag => 1, keep_meta_info => 1 });
      open my $in, '<:encoding(utf8)', shift or die $!;
      while (my $row = $csv->getline($in)) {
          next if $row->[0] =~ m/^#/ && ! $csv->is_quoted(0);
          $csv->say(*STDOUT, $row);
      }

      Tested with

      #x,y,z           skip
      abc,"d
      #e
      f",ghi           keep
      #comment         skip
      a,b,c,#xyz       keep
      "#foo",x,y,z     keep
      map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
        The meta info knows whether the field was quoted or not.

        True, though AFAICT the meta_info doesn't seem to keep track of escaped characters:

        use warnings;
        use strict;
        use Data::Dump;
        use Text::CSV;

        my $csv = Text::CSV->new({ binary=>1, auto_diag=>2, keep_meta_info=>1, escape_char=>"\\" });
        while ( my $row = $csv->getline(*DATA) ) {
            dd $row, $csv->meta_info;
        }
        $csv->eof or $csv->error_diag;
        __DATA__
        foo,bar
        "#foo","bar"
        #foo,bar
        \#foo,bar
      In this special case it looks like there won't be Portuguese words starting with a "#", so it would work for the OP, as long as he is aware of it.
        In this special case it looks like there won't be Portuguese words starting with a "#", so it would work for the OP, as long as he is aware of it.

        True as well :-) (I guess this is more about the generic case of filtering comments from CSV files.)

Re^7: Perl custom sort for Portuguese Lanaguage
by Tux (Canon) on Jul 09, 2020 at 13:54 UTC

    If you only want the leading lines that start with # to be filtered, that is indeed what filter is for:

    use Data::Peek;
    use Text::CSV_XS qw( csv );

    # $r counts the data rows seen so far; a '#' row is kept only once $r > 0
    my $r = 0;
    my $aoa = csv (in => *DATA, filter => sub { $_[1][0] =~ m/^\s*#/ ? $r : ++$r; });
    DDumper $aoa;
    __END__
    # This is comment
    # and so is this
    # and this
    a,b,c
    #but,not,this
    1,2,3

    -->

    [ [ 'a', 'b', 'c' ], [ '#but', 'not', 'this' ], [ '1', '2', '3' ] ]
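The counter trick in that filter can be exercised without any CSV module. The sketch below is only an illustration: it assumes the rows have already been parsed into array refs (using the sample data above) and applies the same expression through `grep` instead of the `filter` attribute.

```perl
use strict;
use warnings;

# Rows as they would look after parsing the sample data above.
my @rows = (
    ['# This is comment'], ['# and so is this'], ['# and this'],
    ['a', 'b', 'c'], ['#but', 'not', 'this'], ['1', '2', '3'],
);

# $r counts the non-comment rows seen so far.
# A '#' row yields $r: false (0) while no data row has been seen, true afterwards.
# Any other row increments the counter, which is always true, so it is kept.
my $r = 0;
my @kept = grep { $_->[0] =~ m/^\s*#/ ? $r : ++$r } @rows;

print join(',', @$_), "\n" for @kept;
```

Only the leading block of `#` rows is dropped; a `#` row that appears after the first data row passes through.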

    Enjoy, Have FUN! H.Merijn

      Thanks! I see you're filtering lines beginning with # when they occur at the beginning of the file; the way I understood the OP's sample data is that the comments can occur anywhere. And my worry was that, even though this is probably not the case in the OP's data, filter-based solutions will remove lines that may not actually be comments, and I wasn't sure whether there was an easy solution for this.

      use warnings;
      use strict;
      use Data::Peek;
      use Text::CSV_XS qw/csv/;

      DDumper csv( in=>*DATA, escape_char=>"\\",
          filter => sub { $_[1][0] !~ m/^\s*#/ });
      __DATA__
      # This is a comment
      a,b,c
      # Also a comment
      x,y,z
      "#not",a,comment
      \#also,not,"a comment"

      Output:

      [ [ 'a', 'b', 'c' ], [ '' ], [ 'x', 'y', 'z' ], [ '' ] ]

        So more or less like this?

        DDumper csv (
            in     => *DATA,
            sep    => "|",
            filter => sub { $_[1][0] =~ m/^\s*#/ && @{$_[1]} == 1 ? 0 : 1; },
        );

        Which would not even need a ternary if slightly rewritten
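One way the ternary-free version might look, as a sketch: the condition is already a boolean, so negating it is enough. The helper name `keep_row` is hypothetical and exists only so the logic can be tried on plain array refs; inside the `csv (...)` call the same expression would sit in the `filter` sub with `$_[1]` as the row.

```perl
use strict;
use warnings;

# keep_row is a hypothetical name, not part of Text::CSV_XS.
# Drop a row only if it consists of a single field that starts with '#'
# (optionally preceded by whitespace); keep everything else.
sub keep_row {
    my ($row) = @_;
    return !( @$row == 1 && $row->[0] =~ m/^\s*#/ );
}

# Intended use, assuming the same call as in the post above:
#   csv (in => *DATA, sep => "|", filter => sub { keep_row ($_[1]) });

for my $row ( ['# a comment'], ['#not', 'a', 'comment'], ['a', 'b', 'c'] ) {
    printf "%-20s %s\n", join('|', @$row), keep_row($row) ? 'keep' : 'skip';
}
```

With `sep => "|"`, a comment line that contains commas still parses as a single field, so the field-count check keeps working for such lines.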


        Enjoy, Have FUN! H.Merijn
