Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Re^3: Markov generator terminate condition

by GrandFather (Saint)
on Apr 21, 2007 at 00:36 UTC ( [id://611242]=note: print w/replies, xml ) Need Help??


in reply to Re^2: Markov generator terminate condition
in thread Markov generator terminate condition

The following is a version of the code above reworked to be a little more Perlish and to allow generation by minimum word count and minimum sentence count. Examine the while loop to see how this works. Note too the commented out line following the comment "Translate spaces to newlines" - guess what happens if you uncomment that!

use strict; use warnings; # based on original code written by Ron Starr in May 2000 my $MAXLETTERS = 1000; my $GRAIN = 3; my $LETTERS_LINE = 70; my @textletters; while(<DATA>) { # pull in the text, break it into letters, put in letter array chomp; my $text = $_ . " "; # regularize whitespace in order to split text into letters $text =~ s/^\s+//g; # remove leading blanks $text =~ s/\s+/ /g; # convert any whitespace to blanks $text =~ s/ +/ /g; # eliminate any multiple blanks... push @textletters, split (//, $text); # split text into letters } # generate the frequency table my $loopmax = $#textletters - ($GRAIN - 2) - 1; # calculate outer loop + limits my %frequency_table; for my $j (0 .. $loopmax - 1) { # go through all lists of $GRAIN letters in the text my $key_string = ""; # build the key string (GRAIN - 1) letters $key_string .= $textletters[$_] for $j .. $j + $GRAIN - 1; $frequency_table{$key_string} .= $textletters[$j + $GRAIN]; } # generate the travesty my @lastletters; my $wordCount = 0; my $sentenceCount = 0; # start with a seed of the first $GRAIN letters from the text push @lastletters, @textletters[0 .. $GRAIN - 1]; my $line = join '', @textletters[0 .. $GRAIN - 1]; # now, do the actual generation while ($wordCount < 20 or $sentenceCount < 2) { # see if the current last letters are in the table my $key_string = ""; # construct the key string from the lastlette +rs $key_string .= join '', @lastletters[0 .. $GRAIN - 1]; if ( exists $frequency_table{$key_string} ) { # we have possible letters # split the list of letters that follow the key string my @possible = split "", $frequency_table{$key_string}; # select the next letter my $nextletter = $possible[rand @possible]; # adjust the lastletters array @lastletters[0 .. $GRAIN - 2] = @lastletters[1 .. $GRAIN - 1]; $lastletters[$GRAIN - 1] = $nextletter; ++$wordCount if $nextletter eq ' '; # Translate spaces to newlines #$nextletter = "\n" if $nextletter = ' '; # add letter to buffer and dump buffer if ready for output $line .= $nextletter; ++$sentenceCount if $line =~ /\.[\n ]$/; if(length ($line) > $LETTERS_LINE && $line =~ /[ .]$/) { print "$line\n"; $line = ''; } } else { # we drew a blank # re-seed the generation with the first $GRAIN letters from th +e text @lastletters = @textletters[0 .. $GRAIN - 1]; $line = join '', @textletters[0 .. $GRAIN - 1]; } } print "$line\n" if length $line; __DATA__ After much searching I was able to locate a character-level Markov tex +t generator: http://www.eskimo.com/~rstarr/poormfa/travesty2.html Howeve +r, this program terminates once it has reached a certain number of characters. + My knowledge of Perl is very limited (the concept of hashes escapes me co +mpletely), but I think what I need changed is relatively simple. How would this p +rogram be modified so that each word generated is printed on its own line (ie. w +ords are separated by \n instead of ' ') and it terminates after generating a c +ertain number of words, rather than characters?

Prints some variant on:

After generated is relatively), but I need (the com/~rstarr/poormfa/tr +avesty2. html Howevery line (ie. How words able to located a characterminates m +e com/~rstarr/poormfa/travesty2.html How word generating a charate a cer +tain number that I need once its on its on instead of Perl is program be mo +dified on its own line (ie.

DWIM is Perl's answer to Gödel

Replies are listed 'Best First'.
Re^4: Markov generator terminate condition
by Philmac (Novice) on Apr 23, 2007 at 20:14 UTC
    This program (which I saved as clvl.pl) produces errors:
    C:\Documents and Settings\t12121uhn\Desktop>clvl.pl intro.pl "my" variable $key_string masks earlier declaration in same scope at C +:\Documen s and Settings\t12121uhn\Desktop\clvl.pl line 54. syntax error at C:\Documents and Settings\t12121uhn\Desktop\clvl.pl li +ne 30, ne r "limits my " Global symbol "%frequency_table" requires explicit package name at C:\ +Documents and Settings\t12121uhn\Desktop\clvl.pl line 30. Global symbol "%frequency_table" requires explicit package name at C:\ +Documents and Settings\t12121uhn\Desktop\clvl.pl line 39. Global symbol "%frequency_table" requires explicit package name at C:\ +Documents and Settings\t12121uhn\Desktop\clvl.pl line 59. Global symbol "%frequency_table" requires explicit package name at C:\ +Documents and Settings\t12121uhn\Desktop\clvl.pl line 62. Execution of C:\Documents and Settings\t12121uhn\Desktop\clvl.pl abort +ed due to compilation errors.
    What do I need to do to correct this?

      The line numbers in your error messages do not agree with the line numbers in the source code which suggests either copy errors or editing has taken place. So copy it accurately and run it without editing it.

      The code runs as expected when I copy and paste the code from the node above.

      Note that if I remove the my from the %frequency_table declaration and add a my to the $key_string .= assignment I get the same pattern of errors (with slightly different line numbers) that you report.


      DWIM is Perl's answer to Gödel

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://611242]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chilling in the Monastery: (7)
As of 2024-04-25 15:00 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found