
Re^3: Is it possible to find the matching words and the percentage of matching words between two texts?

by uncoolbob (Novice)
on Dec 21, 2012 at 11:52 UTC


in reply to Re^2: Is it possible to find the matching words and the percentage of matching words between two texts?
in thread Is it possible to find the matching words and the percentage of matching words between two texts?

What you really need is to align the two texts with a "dynamic programming" algorithm. This is a common task in bioinformatics, but there the atomic unit is a single character, and the alphabet is small (usually 4 nucleotides or 20 amino acids). You would have to hack it a fair bit to make it work on an array of words drawn from an essentially unlimited "character set". I haven't looked at the code in detail, though:

Bio::Tools::dpAlign
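To make the idea concrete, here is a minimal sketch of a global (Needleman-Wunsch style) alignment where the unit is a whole word instead of a character. It is not how Bio::Tools::dpAlign works internally; the function name align_words and the scores (match +1, mismatch -1, gap -1) are assumptions picked for illustration. It is written in Python to keep it short, and the same table-filling logic ports directly to Perl arrays.

```python
def align_words(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two word lists (hypothetical helper).

    Returns a list of (word_a, word_b) pairs; None marks a gap.
    The scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for pairing a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two words
                score[i - 1][j] + gap,            # word in a has no partner
                score[i][j - 1] + gap,            # word in b has no partner
            )

    # Walk back from the bottom-right corner to recover one best alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


pairs = align_words("the quick brown fox".split(), "the brown fox jumps".split())
matched = [x for x, y in pairs if x is not None and x == y]
print(pairs)
print(matched)  # ['the', 'brown', 'fox']
```

One possible "percentage of matching words" would then be the number of matched pairs divided by the length of the longer text, but that definition is a choice, not something the alignment dictates. Note that the table is len(a) x len(b), so this gets slow and memory-hungry on long texts.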

For a quick and dirty solution, I would extend the hash comparison approach to handle single words, word pairs, triplets and maybe longer runs. It is also worth searching CPAN some more; there may be something better suited out there.
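A rough sketch of what extending the hash comparison to word pairs and triplets could look like. The names ngrams and ngram_overlap are made up for this example, and splitting on whitespace after lowercasing is an assumption about how you want to tokenize. It is in Python for brevity; Counter plays the role of a Perl hash of counts.

```python
from collections import Counter


def ngrams(words, n):
    """Return every run of n consecutive words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def ngram_overlap(text_a, text_b, n=1):
    """Percentage of text_a's word n-grams that also occur in text_b.

    Hypothetical helper: tokenizing by lowercase + whitespace split
    is an assumption, so punctuation stays attached to words.
    """
    a = Counter(ngrams(text_a.lower().split(), n))
    b = Counter(ngrams(text_b.lower().split(), n))
    total = sum(a.values())
    if total == 0:
        return 0.0
    # Counter & Counter keeps the minimum count per key, so a repeated
    # n-gram only counts as shared as often as it appears in both texts.
    shared = sum((a & b).values())
    return 100.0 * shared / total


a = "the quick brown fox"
b = "the brown fox jumps"
for n in (1, 2, 3):
    print(n, ngram_overlap(a, b, n))
```

For these two sample strings, 3 of the 4 single words are shared, 1 of the 3 word pairs, and none of the triplets. The measure is asymmetric (it is relative to the first text), which may or may not be what you want.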

Replies are listed 'Best First'.
Re^4: Is it possible to find the matching words and the percentage of matching words between two texts?
by supriyoch_2008 (Monk) on Dec 22, 2012 at 10:46 UTC

    Hi uncoolbob,

    Thanks for providing information about Bio::Tools.

    With regards
