PerlMonks  

The Specimen

Before ...

With a population of more than 10.2 million, Seoul, the capital of South Korea, is the world’s largest city in terms of population. Sao Paulo(Brazil), the world’s second-largest city, has a population of just over ten million. Three other cities, Bombay(India), Jakarta(Indonesia) and Karachi(Pakistan), have grown to more than nine million people.

After ...

Seoul has a population of more than 10.2 million.
Seoul is the capital of South Korea.
Seoul is the world’s largest city in terms of population.
Sao Paulo(Brazil) is the world’s second-largest city.
Sao Paulo(Brazil) has a population of just over ten million.
Bombay(India) has grown to more than nine million people.
Jakarta(Indonesia) has grown to more than nine million people.
Karachi(Pakistan) has grown to more than nine million people.

The Question

How can I use Perl to generate AFTER from BEFORE automatically (or at least partially)?

The Background

There is a guy who wants to do this sort of thing, with the following disclaimers:

  • The guy is not a linguistics professor
  • The guy wants to spit out questions for a 'flashcard' type thingy
  • The guy prefers practical nuts and bolts examples to pie-in-the-sky visions of 'AI'
  • The guy wishes to avoid esoteric concepts beyond the grasp of a moderately competent college student who knows some perl.
  • The guy admits this is the stuff of decades of research, multitudinous PhD theses, and towering artifices of herculean intellectual endeavor, but still wants a simple solution from perlmonks.org.

The Speculations

The guy has toyed with the following speculations:

  • Build a 'corpus' of domain-compatible 'trigger words' and use 'split' with those as delimiters (e.g. 'has a', 'is a', 'having a', 'has grown', etc.)
  • Simply split the BEFORE text on punctuation, call those 'the building blocks' and randomly generate different structures based on those building blocks, discarding (by hand) all but those which make sense.
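A minimal sketch of the first speculation, to make it concrete. The trigger list is just the four examples above, the sub name split_on_triggers is made up for illustration, and the test sentences are simplified ones in the AFTER style rather than the raw BEFORE paragraph. It is a starting point under those assumptions, not a paraphrase engine.

```perl
use strict;
use warnings;

# Trigger phrases taken from the examples in the list above; a real
# corpus would be larger and tuned to the domain (assumption).
my @triggers = ('has a', 'is a', 'having a', 'has grown');

# Longer phrases first, so a longer phrase wins if one trigger
# happens to be a prefix of another.
my $alt = join '|',
          map  { quotemeta($_) }
          sort { length($b) <=> length($a) } @triggers;

# split_on_triggers is a hypothetical helper for this sketch.
# The capturing group makes split return the trigger phrases too,
# so a sentence comes back as (before, trigger, after, ...).
sub split_on_triggers {
    my ($sentence) = @_;
    return grep { length $_ } split /\s*\b($alt)\b\s*/, $sentence;
}

for my $sentence (
    'Sao Paulo(Brazil) has a population of just over ten million.',
    'Karachi(Pakistan) has grown to more than nine million people.',
) {
    print join(' | ', split_on_triggers($sentence)), "\n";
}
```

Each simple sentence comes back as three pieces (subject, trigger, remainder), which is about what a flashcard needs: show the first two and hide the third. The BEFORE paragraph, with its nested comma clauses, would still need the punctuation split from the second speculation or some hand-editing before this helps.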

The Disclaimer

Yes, the guy has seen the nodes on NLP and searched around a bit, but answers always seem shrouded in a funk of elaborately ornate statistical contrivances that seem overly complicated for the task at hand. The guy was reluctant to ask this question, but WTH, someone might be able to help with a breakthrough.


In reply to The (futile?) quest for an automatic paraphrase engine by dimar
