Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

The following is a version of the code above reworked to be a little more Perlish and to allow generation by minimum word count and minimum sentence count. Examine the while loop to see how this works. Note too the commented out line following the comment "Translate spaces to newlines" - guess what happens if you uncomment that!

use strict; use warnings; # based on original code written by Ron Starr in May 2000 my $MAXLETTERS = 1000; my $GRAIN = 3; my $LETTERS_LINE = 70; my @textletters; while(<DATA>) { # pull in the text, break it into letters, put in letter array chomp; my $text = $_ . " "; # regularize whitespace in order to split text into letters $text =~ s/^\s+//g; # remove leading blanks $text =~ s/\s+/ /g; # convert any whitespace to blanks $text =~ s/ +/ /g; # eliminate any multiple blanks... push @textletters, split (//, $text); # split text into letters } # generate the frequency table my $loopmax = $#textletters - ($GRAIN - 2) - 1; # calculate outer loop + limits my %frequency_table; for my $j (0 .. $loopmax - 1) { # go through all lists of $GRAIN letters in the text my $key_string = ""; # build the key string (GRAIN - 1) letters $key_string .= $textletters[$_] for $j .. $j + $GRAIN - 1; $frequency_table{$key_string} .= $textletters[$j + $GRAIN]; } # generate the travesty my @lastletters; my $wordCount = 0; my $sentenceCount = 0; # start with a seed of the first $GRAIN letters from the text push @lastletters, @textletters[0 .. $GRAIN - 1]; my $line = join '', @textletters[0 .. $GRAIN - 1]; # now, do the actual generation while ($wordCount < 20 or $sentenceCount < 2) { # see if the current last letters are in the table my $key_string = ""; # construct the key string from the lastlette +rs $key_string .= join '', @lastletters[0 .. $GRAIN - 1]; if ( exists $frequency_table{$key_string} ) { # we have possible letters # split the list of letters that follow the key string my @possible = split "", $frequency_table{$key_string}; # select the next letter my $nextletter = $possible[rand @possible]; # adjust the lastletters array @lastletters[0 .. $GRAIN - 2] = @lastletters[1 .. $GRAIN - 1]; $lastletters[$GRAIN - 1] = $nextletter; ++$wordCount if $nextletter eq ' '; # Translate spaces to newlines #$nextletter = "\n" if $nextletter = ' '; # add letter to buffer and dump buffer if ready for output $line .= $nextletter; ++$sentenceCount if $line =~ /\.[\n ]$/; if(length ($line) > $LETTERS_LINE && $line =~ /[ .]$/) { print "$line\n"; $line = ''; } } else { # we drew a blank # re-seed the generation with the first $GRAIN letters from th +e text @lastletters = @textletters[0 .. $GRAIN - 1]; $line = join '', @textletters[0 .. $GRAIN - 1]; } } print "$line\n" if length $line; __DATA__ After much searching I was able to locate a character-level Markov tex +t generator: http://www.eskimo.com/~rstarr/poormfa/travesty2.html Howeve +r, this program terminates once it has reached a certain number of characters. + My knowledge of Perl is very limited (the concept of hashes escapes me co +mpletely), but I think what I need changed is relatively simple. How would this p +rogram be modified so that each word generated is printed on its own line (ie. w +ords are separated by \n instead of ' ') and it terminates after generating a c +ertain number of words, rather than characters?

Prints some variant on:

After generated is relatively), but I need (the com/~rstarr/poormfa/tr +avesty2. html Howevery line (ie. How words able to located a characterminates m +e com/~rstarr/poormfa/travesty2.html How word generating a charate a cer +tain number that I need once its on its on instead of Perl is program be mo +dified on its own line (ie.

DWIM is Perl's answer to Gödel

In reply to Re^3: Markov generator terminate condition by GrandFather
in thread Markov generator terminate condition by Philmac

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chilling in the Monastery: (7)
As of 2024-04-23 17:01 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found