Well, I finally convinced myself that I have already
written something very similar to this:
(jeffa) Re: XML Search and Replace.
That, combined with my
Lingua::Ispell review, yielded the
following:
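use strict;
use warnings;
use XML::Parser;
use XML::Writer;
use Lingua::Ispell qw(spellcheck);

# change me to the output of 'which ispell'
$Lingua::Ispell::path = '/path/to/ispell';

# with no arguments, XML::Writer prints to STDOUT
my $writer = XML::Writer->new();
my $parser = XML::Parser->new(
    Handlers => {
        Init  => \&handle_Init,
        Start => \&handle_Start,
        Char  => \&handle_Char,
        End   => \&handle_End,
        Final => \&handle_Final,
    }
);
$parser->parse(*DATA);

# emit the XML declaration and doctype before any elements
sub handle_Init {
    $writer->xmlDecl('UTF-8');
    $writer->doctype('xml');
}

# copy each opening tag and its attributes through unchanged
sub handle_Start {
    my($self,$name,%atts) = @_;
    $writer->startTag($name,%atts);
}

# spell check the text and swap in the first near miss for each bad word
sub handle_Char {
    my($self,$text) = @_;
    for my $r (spellcheck($text)) {
        if ($r->{type} eq 'miss') {
            # \Q..\E keeps the term from being treated as a regex,
            # \b keeps it from matching inside a longer word
            $text =~ s/\b\Q$r->{term}\E\b/$r->{misses}->[0]/;
        }
    }
    $writer->characters($text);
}

# copy each closing tag through unchanged
sub handle_End {
    my($self,$name) = @_;
    $writer->endTag($name);
}

# finish the document
sub handle_Final {
    $writer->end();
}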
__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xml>
<xml>
<stuff class="spelled wrong">
<item>ys we ave no banans</item>
<item>els in my hooverkraft</item>
</stuff>
<stuff class="spelled right">
<item>yes we have no bananas</item>
<item>eels in my hovercraft</item>
</stuff>
</xml>
The important part is the
handle_Char() subroutine.
Right now, it simply replaces each misspelled word with the
first 'miss' that
ispell coughs up. You will need to add
an interface that lets the user choose which miss they
really want. That should be fairly simple: print the list
of misses along with each one's index in
$r->{misses}, and have the user enter the index
number. Also note that my script reads its input from the built-in
DATA filehandle and writes to
stdout -- you will want to change both. Good luck,
and remember that this is Just One Way To Do It -- there
are many more. :)
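Here is a rough sketch of what that chooser could look like. The name choose_miss() and its arguments are made up for illustration; nothing like it ships with Lingua::Ispell. It prints the misses with their indexes, reads one line, and falls back to the original word if the answer is blank or not a valid index. The filehandle arguments are optional and are only there so the prompt can be redirected.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::Ispell): list the near misses
# for one misspelled term and return whichever one the user picks.
# $in and $out default to STDIN and STDOUT.
sub choose_miss {
    my ($term, $misses, $in, $out) = @_;
    $in  ||= \*STDIN;
    $out ||= \*STDOUT;

    print {$out} "'$term' looks misspelled. Suggestions:\n";
    for my $i (0 .. $#$misses) {
        print {$out} "  [$i] $misses->[$i]\n";
    }
    print {$out} "Index to use (Enter keeps '$term'): ";

    my $answer = <$in>;
    return $term unless defined $answer;    # end of input: keep the original
    chomp $answer;

    # accept only a plain integer that is a valid index into @$misses
    return $misses->[$answer]
        if $answer =~ /^\d+$/ && $answer <= $#$misses;

    return $term;                           # blank or invalid: keep the original
}
```

Inside handle_Char(), the hard-coded $r->{misses}->[0] could then become choose_miss($r->{term}, $r->{misses}). Since the corrected XML also goes to stdout, you would probably want to send the prompt to STDERR, or send the XML to a file. For the input and output side, XML::Parser's parsefile() method and XML::Writer's OUTPUT option (which takes a filehandle) are the usual replacements for DATA and stdout.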
jeffa
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)