Re: How can I extract text from XML document and after that put the extracted text to original place?

by wanadlan (Initiate)
on Jan 28, 2003 at 17:07 UTC


in reply to How can I extract text from XML document and after that put the extracted text to original place?

My problem is not the spell checker itself but extracting the text from an XML document and then putting it back in its original place in the XML. For example:
1. There is a spelling error, "tex", in <xmltag>tex</xmltag>.
2. Extract "tex" to one file for spell checking, and extract the tags to another file.
   content of file 1: tex
   content of file 2: <xmltag> </xmltag>
3. Spell check the content of file 1: tex --> text
4. After spell checking:
   content of file 1: text
   content of file 2: <xmltag> </xmltag>
5. Combine the contents of the two files to produce a new XML file that contains: <xmltag>text</xmltag>

I hope you all can consider this problem. Thank you.

(jeffa) 2Re: How can I extract text from XML document and after that put the extracted text to original place?
by jeffa (Bishop) on Jan 29, 2003 at 03:37 UTC
    Well, I finally convinced myself that I have already written something very similar to this: (jeffa) Re: XML Search and Replace. That, combined with my Lingua::Ispell review, yielded the following:
    use strict;
    use warnings;
    use XML::Parser;
    use XML::Writer;
    use Lingua::Ispell qw(spellcheck);

    # change me to the output of 'which ispell'
    $Lingua::Ispell::path = '/path/to/ispell';

    my $writer = XML::Writer->new();
    my $parser = XML::Parser->new(
        Handlers => {
            Init  => \&handle_Init,
            Start => \&handle_Start,
            Char  => \&handle_Char,
            End   => \&handle_End,
            Final => \&handle_Final,
        }
    );

    $parser->parse(*DATA);

    # emit the XML declaration and DOCTYPE before any elements
    sub handle_Init {
        $writer->xmlDecl('UTF-8');
        $writer->doctype('xml');
    }

    # copy each start tag (and its attributes) straight through
    sub handle_Start {
        my($self,$name,%atts) = @_;
        $writer->startTag($name,%atts);
    }

    # spell check the character data and write the corrected text
    sub handle_Char {
        my($self,$text) = @_;

        for my $r (spellcheck($text)) {
            if ($r->{type} eq 'miss') {
                # \Q...\E keeps any regex metacharacters in the term literal
                $text =~ s/\Q$r->{term}\E/$r->{misses}->[0]/;
            }
        }

        $writer->characters($text);
    }

    # copy each end tag straight through
    sub handle_End {
        my($self,$name) = @_;
        $writer->endTag($name);
    }

    sub handle_Final {
        $writer->end();
    }

    __DATA__
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE xml>
    <xml>
      <stuff class="spelled wrong">
        <item>ys we ave no banans</item>
        <item>els in my hooverkraft</item>
      </stuff>
      <stuff class="spelled right">
        <item>yes we have no bananas</item>
        <item>eels in my hovercraft</item>
      </stuff>
    </xml>
    The important part is the handle_Char() subroutine. Right now, it simply replaces each misspelled word with the first 'miss' ispell coughs up. You will need to add an interface that lets the user choose which miss they really want. That should be fairly simple: print the list of misses along with each one's index into $r->{misses}, and have the user enter the index number. Also note that my script reads its input from the built-in DATA filehandle and writes its output to STDOUT -- you will want to change these. Good luck, and remember that this is Just One Way To Do It -- there are many more. :)
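
    The "let the user pick a miss" step might look something like the sketch below. It is only an illustration: choose_replacement() is a hypothetical helper, not part of Lingua::Ispell, and it assumes the suggestions arrive as an array reference like $r->{misses}. The filehandle arguments are also an assumption, added so the prompt can be driven by something other than the terminal.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::Ispell): list the suggestions
# for one misspelled term and return the one the user picks by index.
# $in and $out default to STDIN and STDOUT; they are parameters only so
# the prompt can also be driven by other filehandles.
sub choose_replacement {
    my ($term, $misses, $in, $out) = @_;
    $in  ||= \*STDIN;
    $out ||= \*STDOUT;

    # Show each suggestion next to its index into the list.
    print {$out} "Misspelled: '$term'\n";
    print {$out} "  [$_] $misses->[$_]\n" for 0 .. $#{$misses};
    print {$out} "Index of replacement (blank keeps '$term'): ";

    my $answer = <$in>;
    return $term unless defined $answer;    # end of input: keep the word
    chomp $answer;

    # Anything that is not a valid index leaves the word unchanged.
    return $term unless $answer =~ /\A\d+\z/ && $answer <= $#{$misses};
    return $misses->[$answer];
}
```

    Inside handle_Char() you could then call choose_replacement($r->{term}, $r->{misses}), store the result in a variable, and use that variable as the replacement in the s/// instead of $r->{misses}->[0]. Returning the original term on blank or invalid input is a design choice of this sketch; you may prefer to re-prompt instead.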

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
