The compound word could also be split as "on line trading", and in general there is more than one way to do it. I would create a personal dictionary of the atomic words you want and test against those. You would do this by implementing the grammar
<compound-word> := <word> | <word> <compound-word>
<word> := word1 | word2 | ... | wordn
Regexes can do this for you, for a reasonably small number of atomic words. This example handles a compound of exactly two words:
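use strict;
use warnings;

my $compound = "onlinetrading";
my $words    = 'online|trading';    # the atomic words, as a regex alternation
if ($compound =~ /^($words)($words)$/) {
    my ($first, $second) = ($1, $2);
    print "$first, $second\n";
} else {
    print "$compound is not made of two known words\n";
}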
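The two-word regex stops at exactly two pieces. A minimal sketch of following the recursive grammar directly, so that every split is found (both "online trading" and "on line trading"), might look like this. The `splits` sub and the `%dict` word list are hypothetical, for illustration only; load your own atomic words in practice.

```perl
use strict;
use warnings;

# Hypothetical dictionary of atomic words, for illustration only.
my %dict = map { $_ => 1 } qw(on line online trading);

# <compound-word> := <word> | <word> <compound-word>
# Returns every way to split $s into dictionary words, each as an array ref.
sub splits {
    my ($s, $dict) = @_;
    return ([]) if $s eq '';    # nothing left: exactly one (empty) parse
    my @parses;
    for my $len (1 .. length $s) {
        my $head = substr($s, 0, $len);
        next unless $dict->{$head};          # <word> must be in the dictionary
        push @parses, [ $head, @$_ ]         # prepend it to every parse of the rest
            for splits(substr($s, $len), $dict);
    }
    return @parses;
}

print join(' ', @$_), "\n" for splits('onlinetrading', \%dict);
```

Because every prefix length is tried, this returns all segmentations rather than the first one a greedy regex happens to find; an empty result means the string cannot be built from the dictionary at all.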
Alternatively, check out Aspell, as it has some support for compound words.
I think you need something smarter. How about Yahoo's spell check?
$ typo onlinetradeing
Corrected: online trading
This uses Yahoo's web API through the Yahoo::Search module from CPAN, with the script below. Note that you need to register with Yahoo's developer site (it's free) to get your own developer token.
#!/usr/bin/perl
# typo - Ask Yahoo for spell corrections
use strict;
use Yahoo::Search AppId => "your_yahoo_token";

my $term = "@ARGV";
die "usage: $0 word/phrase ..."
    unless length $term;

my ($suggestion) = Yahoo::Search->Terms(Spell => $term);
if (defined $suggestion) {
    print "Corrected: $suggestion\n";
} else {
    print "No suggestions\n";
}
Here's an article with more detailed info.
I think you need to develop a script that splits the word. In the case of 'onlinetradeing', you can follow the approach below.
If you assume the word splits into at most two words, you need only 14 candidates here (one per letter: the unsplit word plus its 13 possible split points). Your script can build a list of these candidates, check every piece against a dictionary and pop up the valid suggestions.
Your 'TO BE VALIDATED' list would be:
a) onlinetradeing
b) o nlinetradeing
c) on linetradeing
d) onl inetradeing
...
n) onlinetradein g
I think generating the splits should be a cakewalk with a regex.
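A minimal sketch of the two-word approach, assuming a hypothetical `%dict` word list and hypothetical sub names `two_way_splits` and `valid_splits`. Each candidate is produced with a regex that cuts the word after `$i` characters; the unsplit word (candidate a) can be checked directly with `$dict{$word}`.

```perl
use strict;
use warnings;

# Hypothetical dictionary; in practice load a real word list.
my %dict = map { $_ => 1 } qw(online trading);

# Every way to cut $word into exactly two non-empty pieces:
# one pair per split point, i.e. length($word) - 1 pairs.
sub two_way_splits {
    my ($word) = @_;
    my @pairs;
    for my $i (1 .. length($word) - 1) {
        my ($left, $right) = $word =~ /^(.{$i})(.+)$/;    # split after $i chars
        push @pairs, [ $left, $right ];
    }
    return @pairs;
}

# Keep only the pairs whose two pieces are both dictionary words.
sub valid_splits {
    my ($word, $dict) = @_;
    return grep { $dict->{ $_->[0] } && $dict->{ $_->[1] } } two_way_splits($word);
}

print "$_->[0] $_->[1]\n" for valid_splits('onlinetrading', \%dict);
```

With this word list the misspelled 'onlinetradeing' yields no valid pair, so a real spell-suggestion step would still be needed on the pieces.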
srik4u,
The solution I was given in a recent post of mine, "find all paths of length n in a graph", might offer a good place to start. Using the idea of a trie, you could build the phrases recursively as you go along. I am not good at writing recursive functions, but the idea is basically this: recursively iterate over the string and pop off leading substrings that match whole words. A phrase fails when you reach a substring that is not a valid partial word (not the beginning of any word). You have a valid phrase when you reach the end of the string without failing on a substring. A function might look something like this:
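use strict;
use warnings;

# Hypothetical dictionary for illustration; load your own word list here.
my %dict = map { $_ => 1 } qw(my car carrot rot);
my %prefix;    # every leading substring of every dictionary word
for my $w (keys %dict) {
    $prefix{ substr($w, 0, $_) } = 1 for 1 .. length $w;
}
sub whole_word { return exists $dict{ $_[0] } }
sub not_valid  { return !exists $prefix{ $_[0] } }

sub check_string {
    my ($word, @phrase) = @_;    # @phrase holds the words found so far
    if ($word eq '') {           # consumed the whole string: a valid phrase
        print "@phrase\n";
        return;
    }
    my $check = '';
    foreach my $chr (split //, $word) {
        $check .= $chr;
        return if not_valid($check);    # no word starts like this: give up
        # found a whole word: recurse on the rest of the string
        check_string(substr($word, length $check), @phrase, $check)
            if whole_word($check);
    }
}

check_string('mycarrot');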
As I said, I am not good at recursive functions, so the above is merely a starting place (getting the recursive element to work always befuddles me). More visually, this is what would happen with the string "mycarrot":
m valid partial string, so continue
my found a whole word, so push and recurse
@phrase = ("my")
c valid partial string, so continue
ca valid partial string, so continue
car found a whole word, so push and recurse
@phrase = ("my", "car")
r valid partial string, so continue
ro valid partial string, so continue
rot found a whole word, so push
at end of string, so print valid phrase
@phrase = ("my", "car", "rot")
(back up to last iteration and continue)
@phrase = ("my")
carr valid partial string, so continue
carro valid partial string, so continue
carrot found a whole word, so push
at end of string, so print valid phrase
@phrase = ("my", "carrot")
(back up to last iteration and continue)
@phrase = ()
myc invalid partial string, so quit
@phrase = ()
In the scary place I call my mind, this makes sense. I hope it makes sense to you. Maybe if someone else understands what I am trying to explain, they might be able to clarify it better than I.
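Since the post mentions a trie, here is a minimal sketch of the same walk driven by an actual trie built from nested hashes. The sub names `build_trie` and `phrases`, and the use of the empty-string key as the end-of-word marker, are assumptions made for this illustration. Falling off the trie plays the role of "invalid partial string", and hitting an end marker is "found a whole word, so push and recurse".

```perl
use strict;
use warnings;

# Build a trie as nested hashes; the key '' marks "a word ends here".
sub build_trie {
    my %root;
    for my $word (@_) {
        my $node = \%root;
        $node = $node->{$_} //= {} for split //, $word;    # descend, creating nodes
        $node->{''} = 1;                                    # end-of-word marker
    }
    return \%root;
}

# Walk the trie along $str; whenever a word ends, recurse on the remainder.
# Returns every complete phrase as an array ref.
sub phrases {
    my ($trie, $str, @phrase) = @_;
    return [@phrase] if $str eq '';    # whole string consumed: valid phrase
    my @found;
    my $node = $trie;
    for my $i (0 .. length($str) - 1) {
        $node = $node->{ substr($str, $i, 1) } or last;    # fell off the trie
        push @found, phrases($trie, substr($str, $i + 1), @phrase, substr($str, 0, $i + 1))
            if $node->{''};                                 # a whole word ends here
    }
    return @found;
}

my $trie = build_trie(qw(my car carrot rot));
print "@$_\n" for phrases($trie, 'mycarrot');
```

Each recursive call gets its own copy of `@phrase`, so backing up to the last iteration happens automatically instead of by clearing a global array.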
davidj