Re: How can I extract text from XML document and after that put the extracted text to original place?
by rdfield (Priest) on Jan 28, 2003 at 15:14 UTC
|
in other word, the full source code
Not a chance. However, you might want to check CPAN:
A quick search of the Monastery might turn up a few suggestions too. Have you looked in the Tutorials? Perhaps a book like XML and Perl, recently reviewed here by davorg, might be of use.
rdfield
Re: How can I extract text from XML document and after that put the extracted text to original place?
by davorg (Chancellor) on Jan 28, 2003 at 15:25 UTC
|
This sounds like a perfect use for a SAX filter.
Process your file one event at a time. When you get a text event, run it through your spell checker. Write either the original event or the corrected text into a new file.
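A rough sketch of what such a filter might look like, assuming the XML::SAX and XML::SAX::Writer distributions are installed. The SpellFilter package, spellcheck_xml() and fix_spelling() are names made up for illustration, and fix_spelling() is only a stand-in for a real spell checker. Note that a parser may deliver one run of text in several characters events, so a real filter may want to buffer them.

```perl
use strict;
use warnings;

package SpellFilter;
use base 'XML::SAX::Base';

# Only text events are overridden; XML::SAX::Base forwards every other
# event to the downstream handler unchanged.
sub characters {
    my ($self, $data) = @_;
    my $fixed = $self->{Fix}->($data->{Data});
    $self->SUPER::characters({ %$data, Data => $fixed });
}

package main;
use XML::SAX::ParserFactory;
use XML::SAX::Writer;

# hypothetical stand-in for a real spell checker
sub fix_spelling {
    my ($text) = @_;
    $text =~ s/\btex\b/text/g;
    return $text;
}

# parser -> SpellFilter -> writer, with the result collected in a string
sub spellcheck_xml {
    my ($xml) = @_;
    my $out    = '';
    my $writer = XML::SAX::Writer->new(Output => \$out);
    my $filter = SpellFilter->new(Handler => $writer, Fix => \&fix_spelling);
    my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);
    $parser->parse_string($xml);
    return $out;
}

print spellcheck_xml('<xmltag>tex</xmltag>'), "\n";
```

Because the markup never leaves the event stream, nothing has to be put back in its original place afterwards.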
--
<http://www.dave.org.uk>
"The first rule of Perl club is you do not talk about
Perl club." -- Chip Salzenberg
(jeffa) Re: How can I extract text from XML document and after that put the extracted text to original place?
by jeffa (Bishop) on Jan 28, 2003 at 19:05 UTC
|
|
What if your XML is a document that will later be converted to HTML, PDF and text? Then it makes sense to spell-check the XML instead of one of the target formats. Granted, your XML editor probably has a spell checker already, but if you use a pure text editor that has no spell checking capability (ed? ;--) to create short XML documents, or if you receive them from other authors who do not spell check them, then it might make sense to spell check the XML as a separate step.
Re: How can I extract text from XML document and after that put the extracted text to original place?
by boo_radley (Parson) on Jan 28, 2003 at 15:53 UTC
|
I can provide you with source code for a reasonable price. msg me if you're interested.
XML::Simple for looping through an XML structure
by Coruscate (Sexton) on Jan 28, 2003 at 18:24 UTC
|
You might want to look at the handy dandy XML::Simple module. Look at the XMLin() and XMLout() functions. XMLin() allows you to read in an XML document. Loop through the data structure that is returned from XMLin(), run your spell check on the data within it, then write the final results back to the XML document via XMLout().
As for exporting the tags and the text between the tags to two separate files and then putting them back together, just say 'NO'. On a large XML file, this would be extremely slow and you'd be doing much more work than necessary.
C:\>shutdown -s
>> Could not shut down computer:
>> Microsoft is logged in remotely.
|
XML::Simple would probably not work here as it is designed for data-oriented XML and would not properly handle XML documents that include <p>some <i>mixed content</i> like this</p>.
As for this method being a problem for very large files, in that case the bottleneck would not be the processing time but more likely the time spent using the spell checker interactively. If that's really a problem (a huge file with very few spelling mistakes) you can always do it chunk by chunk using... say... XML::Twig ;--)
Re: How can I extract text from XML document and after that put the extracted text to original place?
by Anonymous Monk on Jan 29, 2003 at 11:20 UTC
|
I only want the method I described, because I have already built my own spell checker (it also uses ispell). I want to extract the text from the XML file into one file because I want the extracted text to be shown in a textarea box in my application (which reads the text file). That way users who don't know XML can check their XML document without seeing the XML tags, which makes checking easier for them.
Moreover, I do not want to write a new spell checker; that would waste my time. I must finish this application as soon as possible. Please.
Re: How can I extract text from XML document and after that put the extracted text to original place?
by wanadlan (Initiate) on Jan 28, 2003 at 17:07 UTC
|
My problem is not the spell checker itself but extracting the text from the XML document and then putting it back in its original place.
e.g.:
1. spelling error in "tex":
<xmltag>tex</xmltag>
2. extract "tex" to one file for spell checking;
extract <xmltag> to another file.
content of file 1: tex
content of file 2: <xmltag> </xmltag>
3. check the content of file 1: tex --> text
4. after spell checking:
content of file 1: text
content of file 2: <xmltag> </xmltag>
5. combine the contents of these two files
- produce a new XML file that contains:
<xmltag>text</xmltag>
I hope you can all consider this problem. Thank you.
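For what it's worth, the five steps above could be sketched roughly as follows, assuming XML::Parser is installed. The names split_xml(), escape_text() and join_xml() are made up for this illustration, the two "files" are kept as in-memory lists to stay short, and the one-line substitution merely stands in for the real spell-check step. The sketch relies on XML::Parser's documented behaviour of sending markup to the Default handler when no Start or End handler is registered.

```perl
use strict;
use warnings;
use XML::Parser;

# Split a document into markup pieces and text pieces. In @skeleton, an
# undef entry marks a slot where the next text piece belongs.
sub split_xml {
    my ($xml) = @_;
    my (@skeleton, @texts);
    my $pending = '';    # one text run may arrive in several chunks

    my $flush = sub {
        return if $pending eq '';
        if ($pending =~ /\S/) {
            push @texts, $pending;
            push @skeleton, undef;
        }
        else {
            push @skeleton, $pending;    # whitespace-only: keep with the markup
        }
        $pending = '';
    };

    my $parser = XML::Parser->new(
        Handlers => {
            Char    => sub { $pending .= $_[1] },
            # no Start/End handlers are registered, so tags arrive here
            Default => sub { $flush->(); push @skeleton, $_[1] },
        },
    );
    $parser->parse($xml);
    $flush->();
    return (\@skeleton, \@texts);
}

# Re-escape text, since the parser hands it over with entities decoded.
sub escape_text {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Put each (possibly corrected) text piece back into its slot, in order.
sub join_xml {
    my ($skeleton, $texts) = @_;
    my @queue = @$texts;
    return join '', map { defined $_ ? $_ : escape_text(shift @queue) } @$skeleton;
}

my ($skeleton, $texts) = split_xml('<xmltag>tex</xmltag>');
s/\btex\b/text/ for @$texts;    # stand-in for the real spell-check step
print join_xml($skeleton, $texts), "\n";
```

The recombination only works if the number and order of text pieces stay the same, so whatever edits the text must not add or drop pieces. If the pieces are written to a real file, text containing newlines needs a delimiter that cannot occur in the text.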
|
use strict;
use warnings;
use XML::Parser;
use XML::Writer;
use Lingua::Ispell qw(spellcheck);

# change me to the output of 'which ispell'
$Lingua::Ispell::path = '/path/to/ispell';

# with no arguments, XML::Writer prints to STDOUT
my $writer = XML::Writer->new();
my $parser = XML::Parser->new(
    Handlers => {
        Init  => \&handle_Init,
        Start => \&handle_Start,
        Char  => \&handle_Char,
        End   => \&handle_End,
        Final => \&handle_Final,
    }
);
$parser->parse(*DATA);

# called once before parsing starts: write the prolog
sub handle_Init {
    $writer->xmlDecl('UTF-8');
    $writer->doctype('xml');
}

# copy each start tag and its attributes straight through
sub handle_Start {
    my($self,$name,%atts) = @_;
    $writer->startTag($name,%atts);
}

# spell check each piece of text before writing it out
sub handle_Char {
    my($self,$text) = @_;
    for my $r (spellcheck($text)) {
        if ($r->{type} eq 'miss') {
            # \Q...\E keeps regex metacharacters in the term literal
            $text =~ s/\Q$r->{term}\E/$r->{misses}->[0]/;
        }
    }
    $writer->characters($text);
}

# copy each end tag straight through
sub handle_End {
    my($self,$name) = @_;
    $writer->endTag($name);
}

# called once after parsing finishes
sub handle_Final {
    $writer->end();
}
__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xml>
<xml>
<stuff class="spelled wrong">
<item>ys we ave no banans</item>
<item>els in my hooverkraft</item>
</stuff>
<stuff class="spelled right">
<item>yes we have no bananas</item>
<item>eels in my hovercraft</item>
</stuff>
</xml>
The important part is the handle_Char() subroutine.
Right now, it simply replaces the misspelled item with the
first 'miss' ispell coughs up. You will need to add
an interface that allows a user to choose which miss they
really want. That should be fairly simple - print the list
of misses out for the user along with each miss's index into
$r->{misses} and have them enter the index
number. Also note that my script uses the built-in
DATA filehandle for input and stdout for
output -- you will want to change these. Good luck,
and remember that this is Just One Way To Do It -- there
are many more. :)
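One possible shape for that chooser, as a minimal sketch in core Perl only. The name choose_miss() is made up, the record is hand-built to look like what spellcheck() returns, and the demo feeds the answer from an in-memory filehandle so that it runs without a terminal.

```perl
use strict;
use warnings;

# Print the numbered suggestions for one misspelled term and return the
# chosen one. A blank or invalid answer keeps the original term. $in and
# $out are filehandles, so the prompt is not tied to a terminal.
sub choose_miss {
    my ($r, $in, $out) = @_;
    my @misses = @{ $r->{misses} || [] };
    return $r->{term} unless @misses;

    print {$out} "Misspelled: $r->{term}\n";
    print {$out} "  [$_] $misses[$_]\n" for 0 .. $#misses;
    print {$out} "Index of replacement (blank keeps original): ";

    my $answer = <$in>;
    $answer = '' unless defined $answer;
    chomp $answer;
    return $misses[$answer]
        if $answer =~ /^\d+$/ && $answer <= $#misses;
    return $r->{term};
}

# demo: the "user" types 1, read from a string instead of the keyboard
my $typed = "1\n";
open my $in, '<', \$typed or die "cannot open in-memory handle: $!";
my $record = { type => 'miss', term => 'banans', misses => [ 'bananas', 'banana' ] };
my $choice = choose_miss($record, $in, \*STDOUT);
print "\nUsing: $choice\n";
```

In handle_Char(), a call such as choose_miss($r, \*STDIN, \*STDOUT) could then take the place of $r->{misses}->[0] in the substitution.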
jeffa
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)
Re: How can I extract text from XML document and after that put the extracted text to original place?
by wanadlan (Initiate) on Jan 28, 2003 at 16:23 UTC
|
I'm sorry. I hope you all can help me. I didn't mean the full source code; I want you all to help me solve this problem and give me tips on how to do it. Please.