
Since so many people here are saying how hard this is, I guess it won't work. But I'd suggest starting with the first sentence and working it into a pattern of passive text (words that will be pulled unmodified into a result sentence) and active text (the structural words from which you'll derive meaning).

It might look something like this:

  With A, B, C is D.

from which you want to pull out:

  B has A.
  B is C.
  B is D.

So, write a program that accepts a mapping of patterns to results, and get it to the point that it can parse the first sentence and produce those results. Then add a pattern and mappings for the second sentence.
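A minimal sketch of such a program, in Python purely for illustration. It assumes that a slot is any single capital letter standing alone in the pattern and that everything else is literal structure. The names SLOT, compile_pattern and paraphrase are made up for this example, and so is the sample sentence.

```python
import re

# Assumption: a slot (passive text) is a single capital letter on its own.
# Everything else in a pattern is literal structure (active text).
SLOT = re.compile(r"\b([A-Z])\b")

def compile_pattern(template):
    """Turn 'With A, B, C is D.' into a regex with one named group per slot."""
    parts = SLOT.split(template)  # odd indices hold the slot letters
    regex = "".join(
        f"(?P<{part}>.+?)" if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    )
    return re.compile(regex)

def paraphrase(rules, sentence):
    """Return the result sentences of the first pattern that matches."""
    for template, results in rules.items():
        m = compile_pattern(template).fullmatch(sentence)
        if m:
            bindings = m.groupdict()
            # Fill each result template with the captured text.
            return [
                SLOT.sub(lambda s: bindings.get(s.group(1), s.group(0)), r)
                for r in results
            ]
    return []  # no pattern matched

rules = {
    "With A, B, C is D.": ["B has A.", "B is C.", "B is D."],
}

sentence = ("With a mild climate, Lisbon, the capital of Portugal "
            "is a popular destination.")
for line in paraphrase(rules, sentence):
    print(line)
```

Adding the second sentence then means adding one more entry to rules. Note that a literal "A" or "I" in a pattern would be mistaken for a slot under this convention, so a real version would probably want a more explicit slot syntax.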

At some point you'll probably find that you need to distinguish things slightly better - perhaps this first pattern only works because 'B' maps to a place name (or more generally a proper noun). So start embellishing the patterns to allow the additional semantics to be expressed.
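One way that embellishment might look, again as a Python sketch under assumptions: each pattern carries an optional check per slot, and the match is rejected if a check fails. The looks_like_proper_noun test here is a deliberately crude stand-in (every word capitalised), and all names are hypothetical.

```python
import re

# Assumption: a slot is a single capital letter standing alone.
SLOT = re.compile(r"\b([A-Z])\b")

def looks_like_proper_noun(text):
    # Crude stand-in: every word starts with a capital letter.
    words = text.split()
    return bool(words) and all(w[0].isupper() for w in words)

def match(template, constraints, sentence):
    """Bind slots to text, or return None if the pattern or a check fails."""
    parts = SLOT.split(template)
    regex = "".join(
        f"(?P<{p}>.+?)" if i % 2 else re.escape(p)
        for i, p in enumerate(parts)
    )
    m = re.fullmatch(regex, sentence)
    if m is None:
        return None
    bindings = m.groupdict()
    # Only the first way of splitting the sentence is checked; a fuller
    # version might try alternative splits before giving up.
    for slot, check in constraints.items():
        if not check(bindings[slot]):
            return None
    return bindings

pattern = "With A, B, C is D."
constraints = {"B": looks_like_proper_noun}

print(match(pattern, constraints,
            "With a mild climate, Lisbon, the capital of Portugal "
            "is a popular destination."))
print(match(pattern, constraints,
            "With some luck, the old bridge, a local landmark is safe."))
```

The second call returns None because "the old bridge" fails the proper-noun check, so a different pattern would get a chance at that sentence.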

Pronouns are likely to make things rather more difficult, so leave them out unless you can find a way to embellish the patterns to say "this would reset the current topic". Similarly, you're unlikely to cope well with classical problem cases such as "time flies like an arrow", so don't support them. You may well find that you can get a 90% solution for a restricted language space, particularly if the text to be parsed is using very standardised constructs.
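If you do want to try the "reset the current topic" idea, a rough Python sketch follows. It assumes each rule names the slot (if any) that becomes the new topic, and that the only pronoun handled is a sentence-initial "It". The rule format, the second pattern and the sample sentences are all invented for illustration.

```python
import re

# Assumption: a slot is a single capital letter standing alone.
SLOT = re.compile(r"\b([A-Z])\b")

# Each rule: (pattern, result templates, slot that resets the topic or None).
RULES = [
    ("With A, B, C is D.", ["B has A.", "B is C.", "B is D."], "B"),
    ("B has E.", ["B has E."], None),
]

def bind(template, sentence):
    """Return slot bindings if the whole sentence fits the pattern."""
    parts = SLOT.split(template)
    regex = "".join(
        f"(?P<{p}>.+?)" if i % 2 else re.escape(p)
        for i, p in enumerate(parts)
    )
    m = re.fullmatch(regex, sentence)
    return m.groupdict() if m else None

def paraphrase(sentences):
    topic = None
    out = []
    for sentence in sentences:
        if topic is not None:
            # Replace a leading "It" with whatever the current topic is.
            current = topic
            sentence = re.sub(r"^It\b", lambda _m: current, sentence)
        for template, results, topic_slot in RULES:
            b = bind(template, sentence)
            if b is None:
                continue
            out.extend(
                SLOT.sub(lambda m: b.get(m.group(1), m.group(0)), r)
                for r in results
            )
            if topic_slot:
                topic = b[topic_slot]  # this pattern resets the topic
            break
    return out

text = [
    "With a mild climate, Lisbon, the capital of Portugal "
    "is a popular destination.",
    "It has seven hills.",
]
for line in paraphrase(text):
    print(line)
```

This only copes with the simplest case; anything where "it" could refer to more than one earlier noun is exactly the kind of problem best left unsupported.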

Repeat until any(bored, problem is solved, you find out why this approach won't work). :)

Hugo

In reply to Re: The (futile?) quest for an automatic paraphrase engine by hv
in thread The (futile?) quest for an automatic paraphrase engine by dimar
