
For any comparative linguists, here is what it looks like in Python (it even works):

import re
import sys
from collections import defaultdict

wordsInOrder = []
for line in sys.stdin:
    wordsInOrder.extend( re.findall(r'\w+', line.lower()) )

single = defaultdict(int) 
double = defaultdict(int) 
triple = defaultdict(int)

for i, word in enumerate(wordsInOrder):
    single[word] += 1
    try:
        next_word = wordsInOrder[i+1]
        double[word + ' ' + next_word] += 1

        next_next_word = wordsInOrder[i+2]
        triple[word + ' ' + next_word + ' ' + next_next_word] += 1
    except IndexError:
        pass  # ran off the end of the word list

def sort_by_frequency(d):
    # keys of d, most frequent first
    return sorted(d, key=d.get, reverse=True)

for singlet in sort_by_frequency(single):
    print(singlet)
    for doublet in sort_by_frequency(double):
        # only doublets that begin with this word
        if not doublet.startswith(singlet + ' '):
            continue
        print("\t" + doublet)
        for triplet in sort_by_frequency(triple):
            # only triplets that extend this doublet
            if not triplet.startswith(doublet + ' '):
                continue
            print("\t\t" + triplet)
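
For what it's worth, the counting step could also be sketched with collections.Counter and zip, which sidesteps the try/except entirely. This is only a rough sketch under my own assumptions, not a drop-in replacement: the helper name count_ngrams and the sample sentence are made up for illustration.

```python
import re
from collections import Counter

def count_ngrams(text, n):
    """Count runs of n consecutive words in text (hypothetical helper)."""
    words = re.findall(r'\w+', text.lower())
    # zip over n staggered views of the word list; zip stops at the
    # shortest view, so no index ever runs off the end
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(' '.join(gram) for gram in grams)

# made-up sample text, just to exercise the helper
sample = "I vow to vow and I vow to keep it"
doubles = count_ngrams(sample, 2)
print(doubles.most_common(2))
```

Counter.most_common() gives the frequency ordering for free, so a separate sort helper would not be needed for this part.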

I needed amusement ;-)

In reply to Re: Vow Triptych by Arunbear
in thread Vow Triptych by hashED
