Re: Re: Group Similar Items



by wufnik (Friar)

on May 28, 2003 at 07:30 UTC ( [id://261243]=note )

in reply to Re: Group Similar Items
in thread Group Similar Items

the distance approach is great if you are considering biological sequences, but i am not sure how well it will carry over if you are considering text or phrases.

the key problem you will face is determining the right substitution/gap penalties for your distance metric. this is not so important with single words, but it matters for phrases. if the text is words, determining similarity via phonemes sounds more natural.
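to make the phoneme idea concrete, here is a rough sketch using american soundex. soundex is only a crude stand-in for real phoneme matching (it is my substitution for illustration, not something from the parent post), and the function name `soundex` and the `CODES` table are just names i picked. words that get the same key would be treated as similar.

```python
# Soundex digit for each consonant; vowels, h, w and y carry no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character American Soundex key for a word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    key = first.upper()
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        # Emit a digit only when it differs from the previous one.
        if code and code != prev:
            key += code
        # h and w do not separate equal codes; vowels do.
        if ch not in "hw":
            prev = code
    # Pad with zeros and cut to one letter plus three digits.
    return (key + "000")[:4]


if __name__ == "__main__":
    for w in ("Robert", "Rupert", "Ashcraft", "Tymczak"):
        print(w, soundex(w))
```

grouping then amounts to bucketing words by key. it will miss plenty of real sound-alikes and merge some words that do not sound alike, so treat it as a starting point only.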

if you don't have an appropriate substitution/deletion penalty matrix, you could get quite dissimilar phrases clustered together.
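a small sketch of why the penalties matter: an edit distance where the substitution cost is a pluggable function and the gap cost is a parameter. the names `weighted_distance`, `unit_cost` and `cheap_vowel_cost`, and the 0.25 vowel penalty, are all made up for illustration; a real penalty matrix would have to come from your data.

```python
def weighted_distance(a, b, sub_cost, gap_cost=1.0):
    """Edit distance between sequences a and b.

    sub_cost(x, y) is the penalty for aligning x with y;
    gap_cost is the penalty for an insertion or deletion.
    Works on strings (characters) or lists (e.g. words of a phrase).
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [j * gap_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, 1):
        cur = [i * gap_cost]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + gap_cost,             # delete x
                cur[j - 1] + gap_cost,          # insert y
                prev[j - 1] + sub_cost(x, y),   # substitute or match
            ))
        prev = cur
    return prev[-1]


def unit_cost(x, y):
    """Plain Levenshtein: every mismatch costs 1."""
    return 0.0 if x == y else 1.0


def cheap_vowel_cost(x, y):
    """Hypothetical matrix: vowel-for-vowel swaps are cheap."""
    if x == y:
        return 0.0
    if x in "aeiou" and y in "aeiou":
        return 0.25
    return 1.0


if __name__ == "__main__":
    print(weighted_distance("kitten", "sitting", unit_cost))
    print(weighted_distance("kitten", "sitting", cheap_vowel_cost))
    # Phrases compared word by word instead of character by character.
    print(weighted_distance("the cat sat".split(), "the dog sat".split(), unit_cost))
```

the same pair of strings moves closer or further apart depending only on the cost function, so a clustering threshold that works under one penalty scheme can lump unrelated phrases together under another.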