Re: The (futile?) quest for an automatic paraphrase engine

So this guy wants Natural Language Processing. Not just for input but also for output. You have an understanding that this is the holy grail of AI and the subject of quite probably terabytes of PhD theses. You have only a basic understanding of Perl (probably not the best/worst language in which to perform AI) and you find the material you have seen too complicated?

Tell the guy to wait for quantum computers to hit the desktop then post again :-)

What you can do is split the text into sentences (even that is non-trivial, e.g. split /\./, $text breaks on "Mr. Smith"). See Text::Sentence. Past that you have quite possibly the most non-trivial problem in CS/AI.
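To show why the naive split falls over, here is a minimal sketch that compares it with a slightly smarter splitter. Everything in it is an assumption for illustration: the %ABBREV list is hypothetical and far from complete, the names naive_split and split_sentences are made up, and this is not how Text::Sentence works internally.

```perl
use strict;
use warnings;

# Hypothetical, very incomplete list of abbreviations that end in a period.
my %ABBREV = map { $_ => 1 } qw(Mr Mrs Ms Dr Prof Sr Jr St vs etc);

# Strip leading and trailing whitespace.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

# The naive approach: every period ends a sentence.
sub naive_split {
    my ($text) = @_;
    return grep { length } map { trim($_) } split /\./, $text;
}

# Slightly better: a chunk ends at ., ! or ? followed by whitespace or
# end of text, unless the word just before the period is a known abbreviation.
sub split_sentences {
    my ($text) = @_;
    my @sentences;
    my $current = '';
    while ($text =~ /\G(.*?[.!?]+)(?:\s+|\z)/gcs) {
        $current .= ($current eq '' ? '' : ' ') . $1;
        my ($last_word) = $current =~ /(\w+)\.\z/;
        # "Mr." and friends do not close the sentence, so keep accumulating.
        next if defined $last_word && $ABBREV{$last_word};
        push @sentences, $current;
        $current = '';
    }
    # Keep trailing text that has no closing punctuation.
    my $rest = trim(substr($text, defined pos($text) ? pos($text) : 0));
    $current .= ($current ne '' && $rest ne '' ? ' ' : '') . $rest;
    push @sentences, $current if $current ne '';
    return @sentences;
}

my $text = "Mr. Smith went home. He slept.";
print join(' | ', naive_split($text)), "\n";      # Mr | Smith went home | He slept
print join(' | ', split_sentences($text)), "\n";  # Mr. Smith went home. | He slept.
```

Even this is easy to break: a sentence that really does end in "etc." gets glued to the next one, and the abbreviation list never ends. That is the point.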

The only way you could generate a (still non) trivial solution of vague utility is to constrain the problem to a very limited subset of input text.
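As a rough idea of what "very limited subset" means in practice, the sketch below rewrites only sentences that match a couple of fixed templates and gives up on everything else. The rule table @RULES and the function paraphrase are hypothetical names invented for this example, and two templates are obviously nowhere near a real paraphrase engine.

```perl
use strict;
use warnings;

# Hypothetical rule table: each entry pairs a sentence template with a
# rewrite that says the same thing another way.
my @RULES = (
    [ qr/^(\w+) is taller than (\w+)\.\z/,
      sub { "$_[1] is shorter than $_[0]." } ],
    [ qr/^(\w+) gave (\w+) (a|the) (\w+)\.\z/,
      sub { "$_[1] received $_[2] $_[3] from $_[0]." } ],
);

# Return a paraphrase, or nothing when the sentence falls outside
# the constrained subset.
sub paraphrase {
    my ($sentence) = @_;
    for my $rule (@RULES) {
        my ($pattern, $rewrite) = @$rule;
        if (my @parts = $sentence =~ $pattern) {
            return $rewrite->(@parts);
        }
    }
    return;
}

for my $s ("Alice is taller than Bob.",
           "Alice gave Bob the book.",
           "The quest is futile.") {
    my $p = paraphrase($s);
    print "$s => ", (defined $p ? $p : "(no rule matches)"), "\n";
}
```

Every new sentence shape needs a new hand-written rule, which is why this only has vague utility on a tightly constrained input.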

cheers

tachyon


Replies are listed 'Best First'.
Re: Re: The (futile?) quest for an automatic paraphrase engine by allolex (Curate) on May 17, 2004 at 08:29 UTC
There seem to be some fairly nice results from statistical methods. I have a couple of references in my post in this thread, but what it boils down to is that there is the knowledge-based way (yours and my preference, apparently) and at least one statistical method being used. The statistical NLP method is called "clustering" because it creates clusters of semantically relevant sentence constituents and then re-uses those constituents to generate a summarization text.

--
Damon Allen Davison
http://www.allolex.net
Re: Re: Re: The (futile?) quest for an automatic paraphrase engine by tachyon (Chancellor) on May 17, 2004 at 11:49 UTC
Stats are somewhat like doing spam filtering with Bayes, Fisher/Robinson etc. For certain tasks they can make useful 'educated' guesses but they are still 'dumb' algorithms. If you look at how a child learns language, they do seem to use a suck-it-and-see approach. They then get feedback on whether that was a 'winner' or not.

As approaches to AI go I think both knowledge-based and stats-based are 'wrong'. While there is no doubt that both can yield useful results, they appear to my mind to have finite limits. I favour a fuzzy logic nodal learning framework, i.e. try to build a machine that can learn without trying to tell it exactly how to learn. The main issue with this is processor speed (or rather the lack of it) combined with the training time. Language processing is actually a good task for this as you have what is effectively a character-based input and output stream, making the interface simple.

cheers

tachyon
Re*4: The (futile?) quest for an automatic paraphrase engine by allolex (Curate) on May 17, 2004 at 13:29 UTC
Humans learn language as a social tool within a social environment. I won't exactly say never, but it will be a long time before computers are able to learn language the way humans do. Some people argue that it's possible to give an AI a corpus and have it learn from that, but the conditions are still not the same because the AI is not participating in order to learn, just observing. Children learn language by forming intermediate (defective) grammars, which are then corrected by others in their environment, usually adults, namely their parents. A computer program is never going to have that kind of exposure unless we get humans to correct it, which comes back to a knowledge-based approach.

Wittgenstein pointed out that people learn not by being told what things are, but by being exposed to examples. (People are always talking about "food" and "the fridge" in the same context, so maybe there's a relation...) Given this, and given that language has so much to do with the way humans are built and live (How do we learn what "mother" or "cousin" is?), we'd pretty much have to emulate a human before teaching our emulation how to speak in this way. So knowledge is still important to give our linguistic AIs a field of reference that they would otherwise just not have.

I guess the point I am trying to make is that stats just produce results, but don't really reflect anything more than data regularities in a given context. Knowledge has its major fault in its static nature. And fuzzy logic is nice, but, at least for this application, it needs some knowledge to start with. A hybrid approach using all three might be possible by giving a knowledge-driven AI the capability of creating its own knowledge using statistical snapshots. Who knows?

--
Damon Allen Davison
http://www.allolex.net
Re: Re*4: The (futile?) quest for an automatic paraphrase engine by jonadab (Parson) on May 17, 2004 at 19:51 UTC
Re: Re*4: The (futile?) quest for an automatic paraphrase engine by husker (Chaplain) on May 17, 2004 at 15:12 UTC

