Re^3: how to split a file.txt in multiple text files

by Tux (Canon)
on Feb 14, 2019 at 16:24 UTC (id://1229913)


in reply to Re^2: how to split a file.txt in multiple text files
in thread how to split a file.txt in multiple text files

  • What is your OS?
  • What is your perl version? (perl -v)
  • Did you invoke the script with the required -CS command-line option?
    $ perl -CS split2.pl < inputfile

My example was run on UTF-8 encoded files that contained quite a few characters outside the ISO-8859-1 range, so I would have seen the same warnings if my example were seriously flawed.

Is your data secret, or can it be shared? If it can, some of us might want to download it (in a zip) to check.

As you converted my command-line example to a script, maybe it would be a good idea to show what the script looks like. You might have missed a crucial issue. It might look a bit like this:

    use strict;
    use warnings;
    use autodie;

    local $/ = \3000;    # read fixed-size records instead of lines
    my $i = "0000";      # string counter, keeps its leading zeros on ++
    while (<>) {
        my $fn = "zz" . $i++;
        open my $fh, ">:encoding(utf-8)", $fn or die "$fn: $!";
        print $fh $_;
        close $fh;
    }
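The script relies on setting $/ to a reference to an integer, which switches readline from line mode to fixed-size record mode. A minimal sketch of that behaviour, using an in-memory filehandle and a record size of 4 (the helper name read_records and the sample data are assumptions made for illustration, not part of the script above):

```perl
use strict;
use warnings;

# Hypothetical helper: read every fixed-size record from a filehandle.
sub read_records {
    my ($fh, $size) = @_;
    local $/ = \$size;      # reference to an integer selects record mode
    my @records = <$fh>;    # list context: all remaining records
    return @records;
}

# Plain ASCII sample data, read through an in-memory filehandle.
my $data = "abcdefghij";
open my $in, "<", \$data or die "in-memory open: $!";
my @chunks = read_records($in, 4);
close $in;

print scalar(@chunks), " chunks: @chunks\n";
```

The last record is simply shorter when the input length is not a multiple of the record size.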

Enjoy, Have FUN! H.Merijn

Re^4: how to split a file.txt in multiple text files
by saulnier (Initiate) on Feb 14, 2019 at 21:18 UTC
    OS: Windows 10 Home
    perl 5, version 14, subversion 2 (v5.14.2) built for MSWin32-x86-multi-thread

    This is my script split2.pl
    use strict;
    use warnings;
    use autodie;
    $/ = \3000;
    my $i = "000";
    while (<>) {
        open my $fh, ">:encoding(utf-8)", "input" . $i++ . ".txt";
        print $fh $_;
        close $fh;
    }
    If I invoke the script in this way:  perl -CS split2.pl <input.txt
    I obtain this message
    utf8 "\xE1" does not map to Unicode at split2.pl line 11, <> chunk 2. Close with partial character at (eval 21) line 67, <> chunk 2.
    and only the first fragment is created "input000.txt"

    If I run the script without -CS, there is no warning message and all the files are created. But they contain unintelligible characters instead of my split Greek text.

    I can share my Greek text (346 kB), but I do not know exactly how to do that from here.
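A likely explanation for the unintelligible output without -CS is double encoding: the input bytes are never decoded, so each byte is treated as a separate character and then encoded to UTF-8 again on output. A small sketch of the effect, assuming a single Greek letter as sample input (the sample is an assumption, not taken from the poster's file):

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample: the two UTF-8 bytes of GREEK SMALL LETTER ALPHA (U+03B1).
my $raw = "\xCE\xB1";

# Without an input layer Perl sees two characters, U+00CE and U+00B1.
# Encoding those on output yields four bytes: the text is encoded twice.
my $twice = encode("UTF-8", $raw);

# Decoding first yields one character, which encodes back to the original bytes.
my $char = decode("UTF-8", $raw);
my $once = encode("UTF-8", $char);

# Format a byte string as space-separated hex pairs.
sub hex_bytes { join " ", map { sprintf "%02X", ord } split //, $_[0] }

print "double-encoded: ", hex_bytes($twice), "\n";
print "round trip:     ", hex_bytes($once), "\n";
```

The double-encoded bytes are what a viewer would show as two accented Latin-1 style characters instead of one Greek letter.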
      -CS uses UTF-8 for standard input and output, but the diamond operator uses the ARGV handle, not STDIN.

      Use -Ci to make UTF-8 the default for all input streams, or use -CD to make it the default for both input and output; with -CD you can even drop the encoding from the open line.
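The -C letters map to bit values documented in perlrun, and the resulting mask is visible inside a script as ${^UNICODE}. The following sketch decodes such a mask so you can see what a given switch enabled; the helper name describe_unicode_flags and the wording of the descriptions are assumptions made for this example:

```perl
use strict;
use warnings;

# Bit values of the individual -C letters, as listed in perlrun.
my @FLAGS = (
    [ 1  => "I: STDIN is UTF-8" ],
    [ 2  => "O: STDOUT is UTF-8" ],
    [ 4  => "E: STDERR is UTF-8" ],
    [ 8  => "i: UTF-8 is the default layer for input streams" ],
    [ 16 => "o: UTF-8 is the default layer for output streams" ],
);

# Hypothetical helper: list the I/O settings encoded in a -C bit mask.
sub describe_unicode_flags {
    my ($bits) = @_;
    return map { $_->[1] } grep { $bits & $_->[0] } @FLAGS;
}

# ${^UNICODE} holds the mask perl was started with
# (0 when -C was not given and PERL_UNICODE is unset).
print "$_\n" for describe_unicode_flags(${^UNICODE});
```

Run as perl -CS script.pl it should list the three standard handles (S is I + O + E, mask 7); run with -CD it should list the two default-layer entries (i + o, mask 24).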

      Update: Using redirection works for me (Linux). Are you sure the input is encoded in UTF-8?

      Update2: Verified with the file you linked to.

      perl -CS split2.pl < input.txt
      works correctly on Linux.
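To answer the question of whether the input really is UTF-8, one option is to read the file as raw bytes and let Encode decode it strictly. This is only a sketch; the helper name is_valid_utf8 is an assumption, and the file name is taken from the command line:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: true when the byte string decodes as strict UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;    # work on a copy; decode may modify its argument
    return eval { Encode::decode("UTF-8", $bytes, Encode::FB_CROAK); 1 } ? 1 : 0;
}

# Check the file named on the command line, if one was given.
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, "<:raw", $file or die "$file: $!";
    my $bytes = do { local $/; <$fh> };    # slurp the whole file as bytes
    close $fh;
    print "$file: ", (is_valid_utf8($bytes) ? "valid" : "NOT valid"), " UTF-8\n";
}
```

Usage would be along the lines of perl check.pl input.txt, where check.pl is whatever name you save the sketch under.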

      map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      This is my input file

      https://ufile.io/v0g1c

        It is a perl thing. 5.14.x has the error, so has 5.16.3. 5.18.0 and up are fine.

        Digging deeper, it was "fixed" between 5.17.6 and 5.17.7 (hence 5.18.0 is OK)

        As both 5.17.6 and 5.17.7 were released with the same version of Encode, it must be a core issue.

        I suspect that it was fixed somewhere in the IO layer, but that is not my area of expertise.

        Are you able to update your perl version to 5.18.0 or higher?


        Enjoy, Have FUN! H.Merijn
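If upgrading is not an option, one possible workaround (a sketch, not something tested in this thread) is to avoid fixed-size record reads on an encoded handle altogether: read the whole file as bytes, decode it once, and cut the decoded text by characters. The helper name split_chars is an assumption; the output file names follow the "input000.txt" pattern from the script above, and the file is assumed small enough to hold in memory:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: cut decoded text into pieces of at most $size characters.
sub split_chars {
    my ($text, $size) = @_;
    my @parts;
    for (my $pos = 0; $pos < length $text; $pos += $size) {
        push @parts, substr $text, $pos, $size;
    }
    return @parts;
}

# Split the file named on the command line, if one was given.
if (my $file = shift @ARGV) {
    open my $in, "<:raw", $file or die "$file: $!";
    my $bytes = do { local $/; <$in> };    # slurp raw bytes
    close $in;

    # Decode once, so no multi-byte character can be cut in half later.
    my $text = decode("UTF-8", $bytes, Encode::FB_CROAK);

    my $i = "000";    # string counter, keeps its leading zeros on ++
    for my $part (split_chars($text, 3000)) {
        my $fn = "input" . $i++ . ".txt";
        open my $out, ">:raw", $fn or die "$fn: $!";
        print {$out} encode("UTF-8", $part);
        close $out or die "$fn: $!";
    }
}
```

Because the pieces are cut from already decoded text, each output file holds whole characters regardless of how the running perl handles $/ = \N on layered handles.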
