PerlMonks  

Re^4: how to split a file.txt in multiple text files

by saulnier (Initiate)
on Feb 14, 2019 at 21:18 UTC ( [id://1229921] )


in reply to Re^3: how to split a file.txt in multiple text files
in thread how to split a file.txt in multiple text files

OS: Windows 10 Home
perl 5, version 14, subversion 2 (v5.14.2) built for MSWin32-x86-multi-thread

This is my script, split2.pl:

    use strict;
    use warnings;
    use autodie;

    $/ = \3000;
    my $i = "000";
    while (<>) {
        open my $fh, ">:encoding(utf-8)", "input" . $i++ . ".txt";
        print $fh $_;
        close $fh;
    }

If I invoke the script this way:
    perl -CS split2.pl <input.txt
I get this message:
utf8 "\xE1" does not map to Unicode at split2.pl line 11, <> chunk 2. Close with partial character at (eval 21) line 67, <> chunk 2.
and only the first fragment, "input000.txt", is created.

If I run the script without -CS, there is no warning and all the files are created. But they contain unintelligible characters instead of my split Greek text.

I can share my Greek text (346 kB), but I do not know exactly how to do that from here.

Re^5: how to split a file.txt in multiple text files
by choroba (Cardinal) on Feb 14, 2019 at 21:52 UTC
    -CS uses UTF-8 for standard input and output, but the diamond operator uses the ARGV handle, not STDIN.

    Use -Ci to set UTF-8 encoding of all input; or use -CD to use utf-8 for all input and output, and you can even drop the encoding from the open line.

    Update: Using redirection works for me (Linux). Are you sure the input is encoded in UTF-8?

    Update2: Verified with the file you linked to.

    perl -CS split2.pl < input.txt
    works correctly on Linux.
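
As an alternative to the -C switches discussed above, the decoding layer can be set inside the script, so the result does not depend on how perl was invoked. The following is only a sketch: the helper name slurp_utf8 is made up for illustration, and the demo reads from an in-memory handle rather than from the real input file. For redirected input, the same helper would be called with \*STDIN.

```perl
use strict;
use warnings;

# Hypothetical helper: put a UTF-8 decoding layer on an already-open
# handle, then read everything from it as characters.
sub slurp_utf8 {
    my ($fh) = @_;
    binmode($fh, ':encoding(UTF-8)') or die "binmode failed: $!";
    local $/;                      # slurp mode: read to end of input
    return scalar <$fh>;
}

# Demo on an in-memory handle holding the UTF-8 bytes of two Greek letters.
my $bytes = "\xce\xb1\xce\xb2";
open(my $fh, '<', \$bytes) or die "open failed: $!";
my $text = slurp_utf8($fh);
close($fh);

printf "%d bytes in, %d characters out\n", length($bytes), length($text);
```

The point of the demo is that the four input bytes come back as two characters once the layer is in place.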

Re^5: how to split a file.txt in multiple text files
by saulnier (Initiate) on Feb 15, 2019 at 07:34 UTC
    This is my input file

    https://ufile.io/v0g1c

      It is a perl thing. 5.14.x has the error, so has 5.16.3. 5.18.0 and up are fine.

      Digging deeper, it was "fixed" between 5.17.6 and 5.17.7 (hence 5.18.0 is OK)

      As both 5.17.6 and 5.17.7 were released with the same version of Encode, it must be a core issue.

      I suspect that it was fixed somewhere in the IO layer, but that is not my area of expertise.

      Are you able to update your perl version to 5.18.0 or higher?


      Enjoy, Have FUN! H.Merijn
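
If upgrading is not an option, one possible workaround is to avoid fixed-length record reads through an encoding layer altogether: read the file as raw bytes, decode it once, and cut the decoded string by character count. This is a sketch under stated assumptions, not a tested fix for the reported perl versions. It assumes the input really is UTF-8 and fits in memory (the 346 kB file mentioned above would), and the names chunk_text and split_file are made up for illustration. The output naming follows the original script.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Split a decoded (character) string into pieces of at most $size characters.
sub chunk_text {
    my ($text, $size) = @_;
    my @chunks;
    for (my $pos = 0; $pos < length($text); $pos += $size) {
        push @chunks, substr($text, $pos, $size);
    }
    return @chunks;
}

# Read $path as raw bytes, decode once, and write input000.txt,
# input001.txt, ... with $size characters each (the last may be shorter).
sub split_file {
    my ($path, $size) = @_;
    open(my $in, '<:raw', $path) or die "Cannot read $path: $!";
    my $bytes = do { local $/; <$in> };
    close($in);

    my $i = "000";
    for my $chunk (chunk_text(decode('UTF-8', $bytes), $size)) {
        my $name = "input" . $i++ . ".txt";
        open(my $out, '>:encoding(UTF-8)', $name) or die "Cannot write $name: $!";
        print {$out} $chunk;
        close($out) or die "Cannot close $name: $!";
    }
    return;
}
```

Calling split_file("input.txt", 3000) would then mirror the original script. Because the cut happens on the decoded string, a multi-byte character can never be split across two output files.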
        OK. Should I try with berrybrew?
