in reply to Re: Group Similar Items in thread Group Similar Items
The distance approach is great if you are considering biological sequences, but I am not sure how well it will scale if you are considering text or phrases.
The key problem you will face is determining the right substitution/gap penalties for your distance metric. That is not so important with words, but it is important for phrases.
If the text is words, determining similarity via phonemes sounds more natural.
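A rough sketch of the phonetic idea, in case it helps. This uses Soundex, which is only a crude, English-oriented stand-in for real phoneme analysis, and the function name `soundex` is my own choice rather than anything from the thread. Words that get the same key would be grouped together.

```python
# Minimal American Soundex: an assumed, simplified stand-in for
# phoneme-based matching. Words with the same key "sound alike".
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex key for a word ('' if no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    key = first.upper()
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate repeated codes
        code = CODES.get(ch, "")  # vowels (and y) have no code
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeat after it counts
    return (key + "000")[:4]  # pad or truncate to 4 characters
```

With this, "Robert" and "Rupert" both map to the key R163, so they would land in the same group without any distance matrix at all.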
If you don't have an appropriate substitution/deletion penalty matrix, you could get quite dissimilar phrases clustered together.
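To make the penalty problem concrete, here is a small sketch of an edit distance with a pluggable substitution cost. The names `weighted_edit_distance`, `unit_cost` and `too_cheap_cost` are made up for illustration, and the cost values are arbitrary assumptions, not recommended settings. It works on any sequence, so a phrase can be passed as a list of words.

```python
def weighted_edit_distance(a, b, sub_cost, gap_cost=1.0):
    """Edit distance between sequences a and b (strings or word lists).

    sub_cost(x, y) is the penalty for replacing x with y;
    gap_cost is the penalty for one insertion or deletion.
    """
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * gap_cost  # delete everything in a
    for j in range(1, cols):
        dist[0][j] = j * gap_cost  # insert everything in b
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + gap_cost,                       # deletion
                dist[i][j - 1] + gap_cost,                       # insertion
                dist[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # substitution
            )
    return dist[-1][-1]


def unit_cost(x, y):
    """Plain Levenshtein substitution cost."""
    return 0.0 if x == y else 1.0


def too_cheap_cost(x, y):
    """Hypothetical badly tuned cost: any substitution is nearly free."""
    return 0.0 if x == y else 0.1
```

Comparing `"group similar items".split()` with `"delete old files".split()`, the unit cost gives a distance of 3.0 (every word differs), while the badly tuned cost gives roughly 0.3, which would make two unrelated phrases look like near duplicates to a clustering step.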