http://qs321.pair.com?node_id=1111413

zalezny has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perl Gurus, does anybody know how to split big files (for example: 10GB) into multiple small ones? For example, I would like to take each big file in my backup folder and split it into small pieces. For example, a 10GB file backup.dat needs to be split into:

backup.dat.aa

backup.dat.ab

backup.dat.ac

Is there any library in Perl for splitting files based on size? Or maybe some compression parameter to split files automatically if they are bigger than size XX? Thanks in advance for your support! Zalezny

Replies are listed 'Best First'.
Re: How to split big files with Perl ?
by GotToBTru (Prior) on Dec 26, 2014 at 17:50 UTC
    man split
    1 Peter 4:10
Re: How to split big files with Perl ?
by pme (Monsignor) on Dec 26, 2014 at 17:55 UTC
    Hi zalezny,

    Have you tried the 'split' command that ships with Fedora? It can be called from Perl like this:

    `split filename`;
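    If you need a specific chunk size and the aa/ab/ac suffixes shown above, split's -b option can do that. A minimal sketch, assuming GNU coreutils split (which understands size suffixes like 1G; older splits may want something like -b 1024m instead); the 1G chunk size and the backup.dat name are only examples:

    use strict;
    use warnings;

    # Split backup.dat into 1 GB pieces named backup.dat.aa, backup.dat.ab, ...
    my $file = 'backup.dat';
    system( 'split', '-b', '1G', $file, "$file." ) == 0
        or die "split failed: $?";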
    Regards
Re: How to split big files with Perl ?
by herveus (Prior) on Dec 26, 2014 at 17:28 UTC
    Howdy!

    Have you tried looking on CPAN for something like, say, "split"?

    yours,
    Michael
      Not really, I asked only uncle Google, but he didn't provide me any sensible answer. Unfortunately, I'm not using CPAN on that server, only packages from the Fedora repository (ughggg...). Would be perfect if you could send me some hint ;).
        Howdy!

        Um...I did provide a pretty broad hint.

        yours,
        Michael
Re: How to split big files with Perl ?
by james28909 (Deacon) on Dec 26, 2014 at 18:28 UTC
    Get the length of the file, divide that by how many times you want to split it, then read it into a buffer and write it to a file :)
    use strict;
    use warnings;

    open my $fh, '<', 'filename.dat';
    binmode($fh);
    my $len          = -s $fh;
    my $split_length = $length / 5;      # would split 10gb into 2gb chunks
    my $split_fh     = $fh . 'split';    # creates 'filename.split'
    my $num          = '1';
    for ( 1 .. 5 ) {
        read $fh, $buf, $split_length;
        open my $out_file, '>', $split_fh . $num;    # creates '$filename.split000', 001, 002 etc.
        binmode($out_file);
        print $out_file, $buf;
        close($out_file);
        $num++;
    }
    close($fh);
    I am sure there are other ways to do it. It is completely untested code.

      I'm sorry but this is really not good.

      Aside from the fact that it doesn't compile, what is my $split_fh = "$fh" . 'split'; supposed to do? print $buf $outfile; or opening $out_file in read mode are pretty obvious errors. No error handling on open or read is also not great.

      Do you think that reading 2GB of the input file into memory at a time is a very efficient way to go about it?

      What happens when the size of the file is not exactly divisible by 5?

        Well, honestly, it was like I said: it was purely untested code and was just meant as an example. I did not intend it to be a copy-and-paste example. All this does is read the file and then make another file on the fly, appending 001++ to the name, that's all. I will, however, revise it and make corrections so a user can copy and paste it.
        I stand corrected; it will take more than what I posted to be able to split it up. What I was considering was taking a 10gb file and splitting it into exactly 4gb chunks. I think that would require reading 1 byte at a time and writing the buffer to the output file until a counter reaches the 4gb limit. That way it is not filling memory with all this data at once and would work smoothly. I'll see what I can cook up.

      This works much better :)

      This splits the file into 2gb chunks. I have tested it on about 25-30 ISOs I have stored on my PC and it works great, though sometimes writing performance is a little bit slow. You can also change how many GB you want to split it into by changing the value the iterator is compared against.
      use strict;
      use warnings;

      files();

      sub files {
          foreach (@ARGV) {
              print "processing $_\n";
              open my $fh, '<', $_ || die "cannot open $_ $!";
              binmode($fh);
              my $num      = '000';
              my $iterator = 0;
              split_file( $fh, $num, $_, $iterator );
          }
      }

      sub split_file {
          my ( $fh, $num, $name, $iterator ) = @_;
          my $split_fh = "$name" . '.split';
          open( my $out_file, '>', $split_fh . $num )
              || die "cannot open $split_fh$num $!";
          binmode($out_file);
          while (1) {
              $iterator++;
              my $buf;
              read( $fh, $buf, 32 );
              print( $out_file $buf );
              my $len = length $buf;
              if ( $iterator == 67108864 ) {    # split into 2gb chunks
                  $iterator = 0;
                  $num++;
                  split_file( $fh, $num, $name );
              }
              elsif ( $len !~ "32" ) {
                  last;
              }
          }
      }
      Works pretty quickly! It split almost 5gb in 4.4333 mins. I do see a decrease in performance sometimes, though other times it writes very quickly. Go ahead and test it on one of your ISOs. What would be the most efficient read/write buffer size?

        The most efficient block size will depend on lots of things, but the memory page size of your OS will likely be the most significant. 32 bytes is way too small; I'd start with 4k or 8k and go up from there. Why not try several different multiples of 4K and see which one works best for you?

        Also, read returns the number of bytes actually read so there's really no need to use length.

        my $len = read($in,$buf,4*1024); ...

        And $len is an integer so it would be better to use the numeric not equal '!=' rather than the pattern match operator.
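
        Putting those two points together, the read/write loop might look something like this rough sketch. The file names and the 4K block size are just placeholders, and it only copies to a single output file; switching between chunk files is a separate concern:

        use strict;
        use warnings;

        open my $in, '<', 'backup.dat' or die "cannot open backup.dat: $!";
        binmode($in);
        open my $out, '>', 'backup.dat.aa' or die "cannot open backup.dat.aa: $!";
        binmode($out);

        while (1) {
            my $buf;
            my $len = read( $in, $buf, 4 * 1024 );      # read() returns the number of bytes read
            die "read failed: $!" unless defined $len;  # undef means an error occurred
            last if $len == 0;                          # numeric compare - 0 means end of file
            print {$out} $buf or die "write failed: $!";
        }
        close($out);
        close($in);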

        Thanks for taking the time to update. Some points to review:

        • Calling split_file recursively means that your stack will fill up as the number of chunks goes up. You've got one buffer per sub call, so that's probably the source of the memory usage and slowdown you reported.
        • Your algorithm/logic, even though it works, is confusing and can actually go wrong: right after you read from the file, you use $iterator to determine whether to call split_file again - I think you need to look at $len first. Keeping a running count of the bytes written to the current chunk and comparing it to the desired chunk size might be better (a rough, non-recursive sketch of that approach follows at the end of this reply). Also, inside the while(1) loop, you don't seem to consider what happens after the call to split_file - the loop keeps going! In fact, if the file being split is exactly divisible by the chunk size, you create one final .splitNNN file that is empty.
        • This is not correct: open my $fh, '<', $_ || die "cannot open $_ $!";, since it gets parsed as open(my $fh, '<', ($_ || die("cannot open $_ $!"))); (you can see this by running perl -MO=Deparse,-p -e 'open my $fh, "<", $_ || die "cannot open $_ $!";'). Either write open my $fh, '<', $_ or die "cannot open $_ $!"; (or has lower precedence) or write open( my $fh, '<', $_ ) || die "cannot open $_ $!";
        • You're still not checking the return value of read, which is undef on error.
        • The code could also use a bit of cleanup. Just a couple of examples: The name $split_fh is a bit confusing, and you could append $num to it right away. In split_file you set $iterator = 0; but then don't use it in the recursive call to split_file.

        I think this might be one of those situations where it would make sense to take a step back and try to work the best approach out without a computer - how would you solve this problem on paper?

        But anyway, I am glad you took the time to work on and test your code! Tested code is important for a good post.
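
        For what it's worth, here is roughly how the non-recursive, running-byte-count approach could look. This is only a sketch along the lines described above, not a polished tool: the 2 GB chunk size and 1 MB buffer are arbitrary, and error handling is minimal.

        use strict;
        use warnings;

        my $chunk_size = 2 * 1024 * 1024 * 1024;    # target bytes per output chunk (2 GB here)
        my $buf_size   = 1024 * 1024;               # read up to 1 MB at a time

        for my $name (@ARGV) {
            open my $in, '<', $name or die "cannot open $name: $!";
            binmode($in);

            my $num     = '000';    # string increment gives 000, 001, 002, ...
            my $written = 0;        # bytes written to the current chunk so far
            my $out;

            while (1) {
                my $want = $chunk_size - $written;      # never read past the chunk boundary
                $want = $buf_size if $want > $buf_size;

                my $buf;
                my $len = read( $in, $buf, $want );
                die "read failed on $name: $!" unless defined $len;
                last if $len == 0;                      # end of input

                if ( !defined $out ) {                  # open a chunk only when there is data for it
                    open $out, '>', "$name.split$num"
                        or die "cannot open $name.split$num: $!";
                    binmode($out);
                }
                print {$out} $buf or die "write failed: $!";
                $written += $len;

                if ( $written >= $chunk_size ) {        # this chunk is full, move on to the next one
                    close($out);
                    undef $out;
                    $num++;
                    $written = 0;
                }
            }

            close($out) if defined $out;
            close($in);
        }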

Re: How to split big files with Perl ?
by sundialsvc4 (Abbot) on Dec 28, 2014 at 19:58 UTC

    Given that there is a split command (at least, on most Unixes), I would be very strongly inclined to try to use it, hoping that its implementation is most efficient. (In any case, it is an existing implementation of a classic Thing That Is Already Done.™)

    “Splitting a file” is never a thing that should call for recursion: all that you’re really doing is reading from one file and switching from one output-file to the next one at specified intervals. I’ve seen lots of ways to do that, probably the most-elaborate ones (not in Perl ...) using memory-mapped files to actually exploit the virtual-memory subsystem’s I/O capabilities as the means of reading from the target and getting the data (in one step) where it needs to go.

    Still, in my mind, it all comes back to the same thing: this is A Classic Thing That Has Already Been Done.™ Search for an existing tool that you can rely upon, and use it, to avoid having to write-and-debug “yet another” piece of software to do such a trivial task. Surely you can find one that will meet your project’s performance expectations.

      "...in my mind, it all comes back to the same thing ..."

      Holy shit, nothing but the truth. You got it!

      «The Crux of the Biscuit is the Apostrophe»