http://qs321.pair.com?node_id=287887


in reply to Constructive criticism of a dictionary / text comparison script

Hi allolex. There is a problem that nobody has yet mentioned. It concerns this line:

next if $element =~ /[^A-Za-zĄ-’]/;

This is doing a lot more than you want it to, I think. Basically, it means "skip any $element containing a character not in the set defined between the square brackets". It is therefore stripping out any 'word' with attached punctuation. For example, in a sentence such as:

"Shut up!" he said.

you are throwing away three quarters of your 'words' (only "he" survives)! And you are also, of course, ignoring hyphenated words.
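To see this in action, here is a minimal sketch. I am assuming the text is split on whitespace (I have not checked exactly how your script builds $element), and I have cut the character class down to plain ASCII letters to keep it short; the variable names are mine:

```perl
use strict;
use warnings;

# Assumption: elements come from splitting the text on whitespace.
my $text = '"Shut up!" he said.';
my (@kept, @skipped);

for my $element (split ' ', $text) {
    # Same kind of test as in your script, with the class reduced to ASCII letters.
    if ($element =~ /[^A-Za-z]/) {
        push @skipped, $element;
        next;
    }
    push @kept, $element;
}

print "kept:    @kept\n";
print "skipped: @skipped\n";
```

Three of the four elements end up in @skipped, because each carries a quote mark, an exclamation mark or a full stop.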

It also means that the line:

$element =~ s/[\s\,\!\?\.\-\_\;\)\(\"\']//g;

never actually does anything (any $element that gets past the first test has no punctuation left to remove), with or without surplus backslashes...
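One way round it might be to trim the punctuation first and only then decide whether to skip. This is just a sketch under my own assumptions: clean_element is a hypothetical helper (not something from your script), the class is again plain ASCII letters, and I have chosen to allow hyphens and apostrophes inside a word:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the cleaned word, or nothing if the
# element should be skipped.
sub clean_element {
    my ($element) = @_;

    # Trim non-letters from both ends only, so internal hyphens survive.
    $element =~ s/^[^A-Za-z]+//;
    $element =~ s/[^A-Za-z]+$//;

    # Skip if nothing is left, or if anything other than letters,
    # hyphens and apostrophes remains.
    return if $element eq '' or $element =~ /[^A-Za-z'-]/;
    return $element;
}

# In list context a bare return yields an empty list, so skipped
# elements simply drop out of the map.
my @words = map { clean_element($_) } split ' ', q{"Shut up!" he said, well-known.};
print "@words\n";
```

Trimming only at the ends keeps hyphenated words in one piece, which a global substitution on '-' would not.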

hth

dave