Re: The (futile?) quest for an automatic paraphrase engine

Since so many people here are saying how hard this is, I guess it won't work. But I'd suggest starting with the first sentence and splitting it into passive text (words that will be pulled unmodified into a result sentence) and active text (the structural words from which you'll derive meaning).

It might look something like this:

  With A, B, C is D.

from which you want to pull out:

  B has A.
  B is C.
  B is D.

So, write a program that accepts a mapping of patterns to results, and get it to the point that it can parse the first sentence and produce those results. Then add a pattern and mappings for the second sentence.
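As a rough illustration of that first step, here is a minimal sketch in Python. Everything in it is an assumption of mine rather than a tested design: the `RULES` table, the `paraphrase` function, the regex used to encode "With A, B, C is D." (with an optional comma after C), and the made-up sample sentence. The named groups hold the passive text; the literal words and commas are the active part.

```python
import re

# Hypothetical rule table: each sentence pattern maps to its result templates.
# Named groups capture the "passive" text that is copied through unmodified;
# the literal words ("With", "is") and the commas are the "active" structure.
RULES = [
    (
        re.compile(
            r"^With (?P<A>[^,]+), (?P<B>[^,]+), (?P<C>.+?),? is (?P<D>.+)\.$"
        ),
        ["{B} has {A}.", "{B} is {C}.", "{B} is {D}."],
    ),
]


def paraphrase(sentence):
    """Return the result sentences for the first rule that matches, else []."""
    for pattern, templates in RULES:
        match = pattern.match(sentence)
        if match:
            slots = match.groupdict()
            return [template.format(**slots) for template in templates]
    return []


if __name__ == "__main__":
    # Made-up sentence, used only to exercise the pattern.
    for line in paraphrase("With a long tail, Rex, a friendly dog, is a good swimmer."):
        print(line)
```

Supporting the second sentence would then mean appending another (pattern, templates) pair to `RULES`, which keeps the growing list of patterns separate from the matching code.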

At some point you'll probably find that you need to distinguish things slightly better - perhaps this first pattern only works because 'B' maps to a place name (or more generally a proper noun). So start embellishing the patterns to allow the additional semantics to be expressed.
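One way that embellishment might look, again only as a sketch: each rule carries optional per-slot checks that must pass before the results are produced. The names `RULE`, `apply_rule` and `looks_like_proper_noun` are hypothetical, and the capital-letter test is a crude stand-in for real proper-noun detection.

```python
import re


def looks_like_proper_noun(text):
    """Crude stand-in (an assumption): every word starts with a capital letter."""
    words = text.split()
    return bool(words) and all(word[0].isupper() for word in words)


# Hypothetical rule format: a pattern, per-slot constraints, result templates.
RULE = {
    "pattern": re.compile(
        r"^With (?P<A>[^,]+), (?P<B>[^,]+), (?P<C>.+?),? is (?P<D>.+)\.$"
    ),
    "constraints": {"B": looks_like_proper_noun},
    "results": ["{B} has {A}.", "{B} is {C}.", "{B} is {D}."],
}


def apply_rule(rule, sentence):
    """Apply one rule; return [] if the pattern or any slot constraint fails."""
    match = rule["pattern"].match(sentence)
    if not match:
        return []
    slots = match.groupdict()
    for name, check in rule["constraints"].items():
        if not check(slots[name]):
            return []
    return [template.format(**slots) for template in rule["results"]]
```

With this rule, a sentence whose B slot is "the plan" is rejected even though it fits the surface pattern, while one whose B slot is a capitalised name still produces the three results.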

Pronouns are likely to make things rather more difficult, so leave them out unless you can find a way to embellish the patterns to say "this would reset the current topic". Similarly, you're unlikely to cope well with classical problem cases such as "time flies like an arrow", so don't support them. You may well find that you can get a 90% solution for a restricted language space, particularly if the text to be parsed is using very standardised constructs.

Repeat until any(bored, problem is solved, you find out why this approach won't work). :)

Hugo



Re: Re: The (futile?) quest for an automatic paraphrase engine
by graff (Chancellor) on May 17, 2004 at 03:10 UTC

  With A, B, C is D.

 A is (was?) B.
 C has A.
 C is D.

There are some fairly well-developed means for spotting "entities" (especially "named entities") -- i.e., the referent noun phrases that make up the subjects and objects of factoids. There has even been some progress on linking pronominal references to "named entities", with some degree of success (yes, this is much harder, and quite impossible to do algorithmically for a large percentage of cases -- humans often get this wrong). There has also been some progress on identifying the "roles" of entities within sentences (agent, recipient, direct object, etc.), but again with much left to be desired.

Still, if the idea is simply to provide some guidance to humans who have to come up with flash-card text (or trivia questions and answers), there are a number of part-of-speech (POS) taggers out there that can at least do a decent job of labeling nouns, verbs, prepositions, etc. Whether this can be a useful aid to flash-card authors is another question, but there's some room for the imaginative GUI designer to try things out...


