Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

This small script, apart from being mostly useless, generates credible sentences from a source text.

The idea is to keep track of all the possible words following a given group of words. Once this hash of arrays has been constructed, we pick a random starting point and construct the new text by looking for words that may follow the preceding group.

The input text should be large enough to have a broad vocabulary and sentences range. You can find good samples on the Project Gutenberg site.
You can also try to create interesting mixes of texts, by feeding two novels in the input file!

Have fun!

#!/usr/bin/perl -w # # transcramble - random text generator # author: Briac Pilpré # # v.0.1 : initial release. # v.0.2 : Added Storable support and Getopt::Mixed # for a nicer interface. # # idea 'borrowed' from : use strict; use locale; use Storable; use Getopt::Mixed; use vars qw($VERSION $opt_g $opt_w $opt_s $opt_i $opt_h $opt_v ); $VERSION = 0.2; Getopt::Mixed::getOptions( 'g=i group>g w=i words>w s=s store>s i=s input h help>h v verbose> +v' ); $|++; # usage: transcramble group words stored_file > output if ($opt_h){ print <<"_HELP_"; transcrambler v$VERSION usage: transcramble -g 3 -w 500 -i stored_chains > output usage: transcramble -g 2 -w 400 -s chains_to_store < input > outpu +t -g --group : How to group words (1, 2 or 3) -w --words : Number of words to generate -i --input : Input file generated by the -s switch -s --store : Name of the file where the chains generated are to be stored -h --help : Display this help message and exit _HELP_ exit 0; } # How to group words in the text # 1: Group words one by one, makes nonsenses # 2: List every word following each preceding couple of words # 3: List every word following each preceding triplet of words # (make the text really like the original) my $group = $opt_g || 2; # Number of words to spew my $words = $opt_w || 1000; # Stored chains file my $in_file = $opt_i; # File to store the generated chains (can't be used with -i though) my $out_file = $opt_i ? undef : $opt_s; # Frequency list creation my ( %tuple, @remains ); if ($in_file){ print STDERR "Retrievieng the chains from $in_file:" if $opt_v; %tuple = %{ retrieve($in_file) }; print STDERR " OK\n" if $opt_v; } else { print STDERR "Creating the chains. This may take a while\n" if $op +t_v; while (<>) { # Split the current line for every 'word', including punctuati +on my @w = split ( /[^\w\n,.!"?;’'-]+/, $_ ); # If there was some words left of the previous line, add them @w = ( @remains, @w ); while (@w) { # If we have enough words to fill the group if ( scalar @w >= $group + 1 ) { # Shove the next word in the word array push @{ $tuple{ join ( ' ', @w[ 0 .. $group - 1 ] ) } +}, $w[$group]; # And move to the next word shift (@w); } else { # At the end of the line, we keep the remaining words. @remains = @w; undef @w; } } } print STDERR "Saving chains to $out_file:" if $opt_v; store \%tuple, $out_file; print STDERR " OK\n" if $opt_v; } # Used to pick a random key in the hash my $lh = scalar keys %tuple; print STDERR "Done constucting the tuples. $lh tuples\n"; # Select a random key to begin my $key = ( keys %tuple )[ int rand($lh) ]; while ( $words-- ) { my @first = split (/ /, $key ); shift @first; # Pick a word at random in the possible following words my $last = @{ $tuple{$key} }[ int rand( @{ $tuple{$key} } ) ]; print " $last"; # And see if we can continue with the sentence or if ( defined $tuple{ join ( ' ', @first, $last ) } ) { $key = join ( ' ', @first, $last ); } # Start a new one by picking a new starting group. else { $key = ( keys %tuple )[ int rand($lh) ]; } } print STDERR "\nTranscrambling done.\n\n" if $opt_v;
my $OeufMayo = new PerlMonger::Paris({http => ''});</kbd>

In reply to Transcramble - Random text generator by OeufMayo

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others making s'mores by the fire in the courtyard of the Monastery: (3)
    As of 2020-10-29 05:47 GMT
    Find Nodes?
      Voting Booth?
      My favourite web site is:

      Results (269 votes). Check out past polls.