PerlMonks  

Re: The (futile?) quest for an automatic paraphrase engine

by hv (Prior)
on May 17, 2004 at 01:34 UTC [id://353851]


in reply to The (futile?) quest for an automatic paraphrase engine

Since so many people here are saying how hard this is, I guess it won't work. But I'd suggest starting with the first sentence and working it into a pattern of passive text (words that will be pulled unmodified into a result sentence) and active text (the structural words from which you'll derive meaning).

It might look something like this:

With A, B, C is D.
from which you want to pull out:
B has A. B is C. B is D.

So, write a program that accepts a mapping of patterns to results, and get it to the point that it can parse the first sentence and produce those results. Then add a pattern and mappings for the second sentence.
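A minimal Python sketch of such a program, assuming the rules are plain regular expressions: the literal words in each pattern play the "active" role, the named groups capture the "passive" text, and each rule maps to a list of result templates. The rule table, the function name `paraphrase`, and the Berlin example sentence are hypothetical.

```python
import re

# Hypothetical rule table: the literal words ("With", "is") are the structural
# part of the pattern; the named groups capture text copied unmodified into results.
RULES = [
    (re.compile(r"^With (?P<A>[^,]+), (?P<B>[^,]+), (?P<C>[^,]+?),? is (?P<D>[^.]+)\.$"),
     ["{B} has {A}.", "{B} is {C}.", "{B} is {D}."]),
]

def paraphrase(sentence):
    """Return the result sentences for the first rule that matches, else []."""
    for pattern, templates in RULES:
        m = pattern.match(sentence.strip())
        if m:
            return [t.format(**m.groupdict()) for t in templates]
    return []

print(paraphrase("With 3 million residents, Berlin, the German capital, is the largest city."))
```

Supporting the second sentence would then mean appending another (pattern, templates) pair to `RULES`.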

At some point you'll probably find that you need to distinguish things slightly better: perhaps this first pattern only works because 'B' maps to a place name (or, more generally, a proper noun). So start embellishing the patterns so they can express the additional semantics.
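One hypothetical way to sketch that embellishment: attach a predicate to a capture group and accept the match only when the predicate holds. Here `is_proper_noun` is a crude assumption (every word capitalized) standing in for real knowledge of place names or proper nouns; the rule layout and names are illustrative only.

```python
import re

def is_proper_noun(text):
    # Crude assumption: a proper noun is a phrase whose words are all capitalized.
    return all(word[:1].isupper() for word in text.split())

# Each rule: (regex, {group name: predicate that group must satisfy}, result templates).
RULES = [
    (re.compile(r"^With (?P<A>[^,]+), (?P<B>[^,]+), (?P<C>[^,]+?),? is (?P<D>[^.]+)\.$"),
     {"B": is_proper_noun},
     ["{B} has {A}.", "{B} is {C}.", "{B} is {D}."]),
]

def paraphrase(sentence):
    """Apply the first rule whose regex matches and whose slot checks all pass."""
    for pattern, checks, templates in RULES:
        m = pattern.match(sentence.strip())
        if m and all(check(m.group(name)) for name, check in checks.items()):
            return [t.format(**m.groupdict()) for t in templates]
    return []
```

Without the check, "With hindsight, however, the plan is flawed." would match the same regex and yield nonsense such as "however has hindsight."; with it, the sentence is rejected.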

Pronouns are likely to make things rather more difficult, so leave them out unless you can find a way to embellish the patterns to say "this would reset the current topic". Similarly, you're unlikely to cope well with classic problem cases such as "time flies like an arrow", so don't try to support them. You may well find that you can get a 90% solution for a restricted language space, particularly if the text to be parsed uses very standardised constructs.
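A sketch of the "reset the current topic" idea, assuming the only pronouns handled are sentence-initial subject pronouns: each rule names the capture group that becomes the current topic, and a leading pronoun is replaced by that topic before matching. The pronoun list, the second rule, and `paraphrase_text` are hypothetical; sentences outside this restricted space simply produce nothing.

```python
import re

# Assumption: only these sentence-initial subject pronouns are resolved.
PRONOUNS = {"It", "He", "She", "They"}

# Each rule: (regex, group that becomes the new current topic, result templates).
RULES = [
    (re.compile(r"^With (?P<A>[^,]+), (?P<B>[^,]+), (?P<C>[^,]+?),? is (?P<D>[^.]+)\.$"),
     "B", ["{B} has {A}.", "{B} is {C}.", "{B} is {D}."]),
    (re.compile(r"^(?P<X>[A-Z][^ ]*) is (?P<Y>[^.]+)\.$"),
     "X", ["{X} is {Y}."]),
]

def paraphrase_text(sentences):
    topic = None
    results = []
    for sentence in sentences:
        first, _, rest = sentence.strip().partition(" ")
        if first in PRONOUNS:
            if topic is None:
                continue  # no antecedent yet: skip rather than guess
            sentence = f"{topic} {rest}"
        for pattern, topic_group, templates in RULES:
            m = pattern.match(sentence.strip())
            if m:
                topic = m.group(topic_group)  # this pattern resets the current topic
                results.extend(t.format(**m.groupdict()) for t in templates)
                break
    return results
```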

Repeat until any(bored, problem is solved, you find out why this approach won't work). :)

Hugo

Re: Re: The (futile?) quest for an automatic paraphrase engine
by graff (Chancellor) on May 17, 2004 at 03:10 UTC
    Sorry to nit-pick, but given a template like this:
    With A, B, C is D.
    the most common examples are things like "With Ripoffsky, a first-round draft pick, Green Sox manager Frump is unlikely to see a pennant this year." That is:
    A is (was?) B. C has A. C is D.
    Of course, a non-trivial part of the "project" at hand is to pick a suitable "corpus" of sentences that lend themselves to this sort of treatment -- and I don't know any automated way to handle that either.
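Both readings of the same surface template can coexist if something decides between them. A minimal sketch, assuming (purely as a heuristic) that a second slot beginning with an article is an appositive describing A, giving "A is B. C has A. C is D.", and otherwise falling back to the "B has A. B is C. B is D." reading. The names `looks_appositive` and `paraphrase` are hypothetical.

```python
import re

TEMPLATE = re.compile(r"^With (?P<A>[^,]+), (?P<B>[^,]+), (?P<C>[^,]+?),? is (?P<D>[^.]+)\.$")
ARTICLES = ("a ", "an ", "the ")

def looks_appositive(text):
    # Heuristic assumption: "a first-round draft pick" describes the preceding noun.
    return text.lower().startswith(ARTICLES)

def paraphrase(sentence):
    m = TEMPLATE.match(sentence.strip())
    if not m:
        return []
    if looks_appositive(m["B"]):
        templates = ["{A} is {B}.", "{C} has {A}.", "{C} is {D}."]
    else:
        templates = ["{B} has {A}.", "{B} is {C}.", "{B} is {D}."]
    return [t.format(**m.groupdict()) for t in templates]
```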

    There are some fairly well-developed means for spotting "entities" (especially "named entities"), i.e., the referent noun phrases that make up the subjects and objects of factoids. There has even been some success in linking pronominal references to named entities (yes, this is much harder, and quite impossible to do algorithmically for a large percentage of cases; humans often get it wrong). There has also been some progress on identifying the "roles" of entities within sentences (agent, recipient, direct object, etc.), but again with much left to be desired.
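To make the idea concrete, here is a deliberately naive stand-in for entity spotting, not how the well-developed tools work: collect runs of capitalized words, skipping the sentence-initial word and breaking runs at punctuation. The function name and heuristic are assumptions. Note that it splits "Green Sox manager Frump" into two candidates and would miss an entity that opens the sentence.

```python
import re

def candidate_entities(sentence):
    """Naive entity candidates: runs of capitalized words, broken at punctuation.

    The sentence-initial token is skipped because its capital letter tells us nothing.
    """
    tokens = re.findall(r"[A-Za-z][\w'-]*|[,;:.!?]", sentence)
    entities, run = [], []
    for i, token in enumerate(tokens):
        if i > 0 and token[0].isupper():
            run.append(token)
            continue
        if run:  # a lowercase word or punctuation ends the current run
            entities.append(" ".join(run))
            run = []
    if run:
        entities.append(" ".join(run))
    return entities
```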

    Still, if the idea is simply to provide some guidance to humans who have to come up with flash-card text (or trivia questions and answers), there are a number of part-of-speech (POS) taggers out there that can at least do a decent job of labeling nouns, verbs, prepositions, etc. Whether this can be a useful aid to flash-card authors is another question, but there's some room for an imaginative GUI designer to try things out...
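To show the kind of output a tagger produces, here is a toy lexicon-driven tagger using Penn Treebank-style tags. It is an assumption-laden stand-in, not a real POS tagger: the tiny lexicon is hypothetical, and every unknown word defaults to NNP if capitalized and NN otherwise, which is exactly the guesswork real taggers replace with context.

```python
import re

# Hypothetical mini-lexicon of closed-class words with Penn Treebank-style tags.
LEXICON = {
    "a": "DT", "an": "DT", "the": "DT", "this": "DT",
    "with": "IN", "of": "IN", "in": "IN",
    "is": "VBZ", "was": "VBD", "to": "TO",
}

def tag(sentence):
    """Return (token, tag) pairs; unknown words are guessed from capitalization alone."""
    tagged = []
    for token in re.findall(r"[A-Za-z][\w'-]*", sentence):
        lower = token.lower()
        if lower in LEXICON:
            tagged.append((token, LEXICON[lower]))
        elif token[0].isupper():
            tagged.append((token, "NNP"))  # capitalized unknown: proper noun guess
        else:
            tagged.append((token, "NN"))   # everything else: common noun guess
    return tagged

def candidate_answers(tagged):
    """Noun-tagged words, which a flash-card author might turn into blanks."""
    return [token for token, t in tagged if t.startswith("NN")]
```

A flash-card GUI could, for instance, highlight the words returned by `candidate_answers` and let the author pick which one to blank out.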
