I've made the following script to generate a large set of text files. The generated files look like real text files and are compressible, but not too much (about 50%). It should work on any Unix-like system (or on Windows, with an additional dictionary file as the source of words). Feel free to test and adapt.
#!/usr/bin/perl
use strict;
use warnings;
use Carp;
sub loaddict {
    my $dict = shift;
    # three-argument open, so the path is never interpreted as a mode
    open my $fh, '<', $dict or croak "can't open $dict: $!";
    my @words = <$fh>;
    close $fh;
    chomp @words;
    return \@words;
}
#######################
# main
my $usage = "usage : $0 <test folder> <number of files> [seed]\n";
my $testdir = $ARGV[0]
    or die $usage;
my $filecount = $ARGV[1]
    or die $usage;
# the optional third argument varies the pseudo-random sequence
my $seed = 0;
$seed = $ARGV[2] if defined $ARGV[2];
# force number
$filecount += 0;
if ( not -d $testdir ) {
    mkdir $testdir or die "can't mkdir $testdir: $!";
}
my $wordlist = loaddict("/usr/share/dict/words");
srand(42 + $seed );
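for ( 1 .. $filecount ) {
    open my $file, '>', "$testdir/$_" or croak "can't open file : $!";
    # each file holds between 5000 and 14999 words
    my $filesize = int( rand(10000) ) + 5000;
    for ( 1 .. $filesize ) {
        # rand(N) returns a value in [0, N), so pass the element count;
        # passing the last index would never pick the last word
        my $dice = int( rand( scalar @{$wordlist} ) );
        print $file $wordlist->[$dice] . " ";
        # line break after every 12th word
        if ( $_ % 12 == 0 ) {
            print $file "\n";
        }
    }
    close $file or croak "can't close file : $!";
}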
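
To check the compressibility figure on your own system, something like the following might help. It is only a rough sketch and not part of the script above: the names gzip_size and dir_ratio are made up for this example, it assumes the core module IO::Compress::Gzip is installed, and it measures gzip only, so other compressors will give different numbers.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical helper: gzip a string in memory and return the
# compressed length in bytes.
sub gzip_size {
    my ($data) = @_;
    my $compressed;
    gzip( \$data => \$compressed )
        or die "gzip failed: $GzipError";
    return length $compressed;
}

# Hypothetical helper: total compressed size divided by total raw size
# for all regular, non-empty files directly inside $dir.
sub dir_ratio {
    my ($dir) = @_;
    my ( $raw, $packed ) = ( 0, 0 );
    opendir my $dh, $dir or die "can't opendir $dir: $!";
    for my $name ( readdir $dh ) {
        my $path = "$dir/$name";
        next if !-f $path;
        open my $fh, '<', $path or die "can't open $path: $!";
        binmode $fh;
        my $data = do { local $/; <$fh> };    # slurp the whole file
        close $fh;
        next if !defined $data || !length $data;
        $raw    += length $data;
        $packed += gzip_size($data);
    }
    closedir $dh;
    return $raw ? $packed / $raw : 0;
}

# Only report when a directory is given on the command line.
if (@ARGV) {
    printf "%s: compressed size is %.1f%% of the original\n",
        $ARGV[0], 100 * dir_ratio( $ARGV[0] );
}
```

Run it with the test folder as its only argument after generating the files. Each file is compressed on its own, so the result reflects per-file compression rather than what an archive of the whole folder would achieve.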