The Specimen
Before ...
With a population of more than 10.2 million, Seoul, the capital of South Korea, is the world’s largest city in terms of population. Sao Paulo(Brazil), the world’s second-largest city, has a population of just over ten million. Three other cities, Bombay(India), Jakarta(Indonesia) and Karachi(Pakistan), have grown to more than nine million people.
After ...
Seoul has a population of more than 10.2 million.
Seoul is the capital of South Korea.
Seoul is the world’s largest city in terms of population.
Sao Paulo(Brazil) is the world’s second-largest city.
Sao Paulo(Brazil) has a population of just over ten million.
Bombay(India) has grown to more than nine million people.
Jakarta(Indonesia) has grown to more than nine million people.
Karachi(Pakistan) has grown to more than nine million people.
The Question
How can I use perl to automatically (or at least partially) generate AFTER from BEFORE?
The Background
There is a guy who wants to do this sort of thing, with the following disclaimers:
- The guy is not a linguistics professor
- The guy wants to spit out questions for a 'flashcard' type thingy
- The guy prefers practical nuts and bolts examples to pie-in-the-sky visions of 'AI'
- The guy wishes to avoid esoteric concepts beyond the grasp of a moderately competent college student who knows some perl.
- The guy admits this is the stuff of decades of research, multitudinous PhD theses, and towering artifices of herculean intellectual endeavor, but still wants a simple solution from perlmonks.org.
The Speculations
The guy has toyed with the following speculations:
- Build a 'corpus' of domain-compatible 'trigger words' and use 'split' with those as delimiters (e.g. 'has a', 'is a', 'having a', 'has grown', etc.)
- Simply split the BEFORE text on punctuation, call those 'the building blocks' and randomly generate different structures based on those building blocks, discarding (by hand) all but those which make sense.
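The first speculation can be roughed out as follows. This is only a minimal sketch, not a general solution: the trigger list, the %SINGULAR map and the sub name simplify are all made up for illustration, and the rules (a leading "With a ..." phrase becomes a "has a ..." fact, a comma-delimited phrase after the subject is read as an appositive, a plural verb means the subject is a list) are assumptions tuned to the specimen. The sample text uses a plain ASCII apostrophe to sidestep encoding issues.

```perl
use strict;
use warnings;

# Trigger verbs, longest alternatives first so "has grown" wins over "has".
# This list is a hand-made assumption; extend it for your own domain.
my $TRIGGER = qr/\b(?:has grown|have grown|has|have|is|are)\b/;

# Plural triggers mapped to singular, used when a subject list is split up.
my %SINGULAR = ('have grown' => 'has grown', have => 'has', are => 'is');

sub simplify {
    my ($text) = @_;
    my @facts;
    for my $sentence (split /(?<=[.!?])\s+/, $text) {
        $sentence =~ s/[.!?]+\s*$//;    # drop final punctuation

        # A leading "With a ..., " phrase becomes a "has a ..." fact.
        my $with = $sentence =~ s/^With (an? [^,]+),\s*// ? "has $1" : undef;

        # Split on the first trigger verb: subject / verb / remainder.
        my ($subject, $verb, $rest) =
            $sentence =~ /^(.*?)\s*($TRIGGER)\s+(.*)$/
            or next;                    # no trigger found: skip the sentence

        $subject =~ s/,\s*$//;
        my ($head, @extra) = split /,\s*/, $subject;

        if (exists $SINGULAR{$verb} && @extra) {
            # Plural verb: the parts after the head noun are separate subjects.
            my @names = map { split(/\s+and\s+/, $_) } @extra;
            push @facts, "$_ $SINGULAR{$verb} $rest." for @names;
            next;
        }

        push @facts, "$head $with." if defined $with;
        # Anything between commas after the head is read as an appositive.
        push @facts, "$head is $_." for @extra;
        push @facts, "$head $verb $rest.";
    }
    return @facts;
}

my $before = q{With a population of more than 10.2 million, Seoul, the capital of South Korea, is the world's largest city in terms of population. Sao Paulo(Brazil), the world's second-largest city, has a population of just over ten million. Three other cities, Bombay(India), Jakarta(Indonesia) and Karachi(Pakistan), have grown to more than nine million people.};

print "$_\n" for simplify($before);
```

On the specimen this prints the eight AFTER sentences in the order shown above. Sentences that do not fit these patterns will come out mangled or be skipped, so the by-hand discarding step from the second speculation still applies.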
The Disclaimer
Yes, the guy has seen the nodes on NLP and searched around a bit, but answers always seem shrouded in a funk of elaborately ornate statistical contrivances that seem overly complicated for the task at hand. The guy was reluctant to ask this question, but WTH, someone might be able to help with a breakthrough.