Re: split of files

If you mean splitting a file the way Unix split does, but on pattern-matched boundaries instead of byte counts, I rather imagine something like the code below.

(Update: some versions of split accept -p to split on a regexp. BSD split has it; with the GNU tools on Linux, csplit does the same job. The Perl script would then only have to shell out to that and clean up afterwards: glob for the per-sequence files, cat them together 1000 at a time under a second naming convention, and remove each batch of 1000 as it goes. On second thoughts, I prefer what follows after all!)

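use strict;
use warnings;

my $suffix   = 'z';     # pre-incremented before first use, so the first file is .aa
my $sequence = 0;       # zero-based index of the current sequence within the current output file
my $maxseq   = 1000;    # sequences per output file

my $input = shift @ARGV or die "usage: $0 inputfile\n";
open my $ifh, '<', $input or die "$!: $input\n";
my $ofh;
while ( <$ifh> ) {
    # A line starting with "INPUT SEQUENCE" marks the start of a new sequence.
    /\AINPUT\sSEQUENCE/
      and SwitchFile( $input, \$ofh, \$suffix, \$sequence, $maxseq );
    $ofh or die "Unexpected prelude: $_";
    print $ofh $_;
}
close $ifh;
close $ofh if $ofh;

# Open the next output file once the current one holds $max sequences.
sub SwitchFile {
    my ( $input, $oref, $sref, $qref, $max ) = @_;
    if ( defined( $$oref ) ) {
        ( ++$$qref < $max ) and return;
        $$qref = 0;
        close $$oref;
    }

    my $newfile = "$input." . ++$$sref;
    open my $ofh, '>', $newfile or die "$!: $newfile\n";
    $$oref = $ofh;
}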

This would create the 270 files with suffixes .aa through .kj.
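
In case the suffix trick is not obvious: the script leans on Perl's string auto-increment, so starting from 'z' the first ++ yields 'aa', then 'ab', and so on. Here is a minimal sketch to check the naming for any file count. The helper name suffixes is made up for this illustration and is not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: list the first $count suffixes obtained by
# pre-incrementing a string that starts at 'z', as the script does.
sub suffixes {
    my ($count) = @_;
    my $suffix = 'z';
    my @out;
    for ( 1 .. $count ) {
        my $next = ++$suffix;    # string increment: z -> aa -> ab ...
        push @out, $next;
    }
    return @out;
}

my @s = suffixes(270);
print "first: $s[0]  26th: $s[25]  27th: $s[26]  270th: $s[-1]\n";
```

Change the argument to suffixes() to match however many output files you expect.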

__________________________________________________________________________________

Free your mind!
