It's nice how you put up the *warning siren!!* on your assumptions ... Although, taken in isolation, some might criticize the assumptions as overly simplistic (even the OP??), I bet something like this could actually work as the beginnings of a very flexible tool. It would be a matter of building up a 'catalogue' of such assumptions, making them user-configurable (e.g. applying only a certain subset based on the input text specimen), and giving the user the opportunity to add custom assumptions. Moreover, this kind of model is relatively straightforward to understand, with a low barrier to entry and a gentle learning curve. ... this one got the wheels turning hmmm ...
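
For what it's worth, here is a rough sketch of what I mean by a 'catalogue'. Everything in it is my own assumption, not anything from the answer: the `AssumptionCatalogue` class, the idea that an assumption is just a named yes/no check on the text, and the three example rules are all hypothetical placeholders.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical shape: an "assumption" is a named yes/no check on the input text.
Assumption = Callable[[str], bool]


class AssumptionCatalogue:
    """Holds named assumptions; users can add their own and pick a subset."""

    def __init__(self) -> None:
        self._rules: Dict[str, Assumption] = {}

    def register(self, name: str, rule: Assumption) -> None:
        # Built-in and user-defined assumptions are registered the same way.
        self._rules[name] = rule

    def check(self, text: str, enabled: Optional[Iterable[str]] = None) -> Dict[str, bool]:
        # With no subset given, every registered assumption is applied.
        names = list(self._rules) if enabled is None else list(enabled)
        return {name: self._rules[name](text) for name in names}


catalogue = AssumptionCatalogue()
# Placeholder rules, purely for illustration.
catalogue.register("ascii_only", lambda t: t.isascii())
catalogue.register("no_tabs", lambda t: "\t" not in t)
catalogue.register("short_lines", lambda t: all(len(line) <= 80 for line in t.splitlines()))

# Apply only the subset that suits this particular text specimen.
report = catalogue.check("hello\tworld", enabled=["ascii_only", "no_tabs"])
print(report)
```

The point is only that the catalogue stays a plain name-to-rule mapping, so picking a subset or adding a custom assumption is a one-liner for the user.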