
repairing a broken xml using LibXML

by deadpickle (Pilgrim)
on Feb 23, 2009 at 00:37 UTC

deadpickle has asked for the wisdom of the Perl Monks concerning the following question:

I'm sending an XML file over UDP and I want to parse it with XML::LibXML. After parsing, it must be validated against an XSD so that I know what type of XML file it is. But before all that, I have to be able to handle broken XML files. I have noticed the recover flag, but what do I do with it? If I send a broken XML file through the stream, I get this error:

Received message: n="1.0" encoding="UTF-8"?>
<Capabilities xsi:schemaLocation="http://Capabilities.CUIntegration.com capabilities.xsd" xmlns="http://Capabilities.CUIntegration.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <VehicleID>String</VehicleID>
  <System>
    <ID>Servo</ID>
    <AcceptedCommands>Loiter</AcceptedCommands>
    <AvailableStreams>Telemetry</AvailableStreams>
  </System>
</Capabilities>

:1: parser error : Start tag expected, '<' not found
n="1.0" encoding="UTF-8"?>
^
The document has no document element.

So now what? How do I fix this XML file so that it will validate?

Here is the code:

Server

#!/usr/bin/perl -w
use strict;
use IO::Socket;
use XML::LibXML;

# Listen for UDP datagrams on port 1234.
my $MySocket = IO::Socket::INET->new(LocalPort => 1234, Proto => 'udp')
    or die "Cannot create socket: $@";

my $parser = XML::LibXML->new;
$parser->recover(1);    # try to recover from parse errors
my $xml1 = "";
my $schema = XML::LibXML::Schema->new(
    location => 'C:\Users\deadpickle\Desktop\UAS\GRRUVI_1.50\panel\capabilities.xsd'
);

while (1) {
    $MySocket->recv($xml1, 2669);
    print "\nReceived message: ", $xml1, "\n";
    my $doc = $parser->parse_string($xml1);
    #$parser->validation(1);
    eval { $schema->validate($doc) };
    die $@ if $@;
    print "VALID\n";
}

Client

#!/usr/bin/perl -w
use strict;
use IO::Socket;
use File::Slurp;

my $xml1 = read_file("C:\\Users\\deadpickle\\Desktop\\UAS\\GRRUVI_1.50\\panel\\test.xml");
my $MySocket = IO::Socket::INET->new(PeerPort => 1234, Proto => 'udp', PeerAddr => 'localhost')
    or die "Cannot create socket: $@";

# Resend the file every 5 seconds.
while (1) {
    print $xml1;
    $MySocket->send($xml1);
    sleep 5;
}

Replies are listed 'Best First'.
Re: repairing a broken xml using LibXML
by almut (Canon) on Feb 23, 2009 at 01:35 UTC
    I have noticed the recover flag...

    I don't think the recover option is meant to fix/ignore arbitrary garbage input. The docs say (emphasis added):

    "The recover mode helps to recover documents that are almost well-formed very efficiently. That is for example a document that forgets to close the document tag (or any other tag inside the document). The recover mode of XML::LibXML has problems restoring documents that are more like well balanced chunks."
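
    To make the quoted passage concrete, here is a minimal sketch of the kind of input recover mode is aimed at: a document that is well-formed except for missing closing tags. The helper name recover_parse and the sample string are made up for illustration, and I am assuming a reasonably recent XML::LibXML where parser options can be passed to new; recover => 2 asks for recovery without printing warnings. Whether recovery succeeds on a given input depends on libxml2, so treat this as an experiment rather than a guarantee.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: parse in silent recover mode and return the
# document, or undef if the parser could not produce a root element.
sub recover_parse {
    my ($xml) = @_;
    my $parser = XML::LibXML->new(recover => 2);
    my $doc = eval { $parser->parse_string($xml) };
    return undef unless $doc && $doc->documentElement;
    return $doc;
}

# Made-up sample: </System> and </Capabilities> are missing.
my $almost = '<Capabilities><VehicleID>String</VehicleID>'
           . '<System><ID>Servo</ID>';

my $doc = recover_parse($almost);
if ($doc) {
    print "Recovered root element: ", $doc->documentElement->nodeName, "\n";
}
else {
    print "Could not recover a document\n";
}
```

    Input that does not begin with a start tag at all, like the message in the question, falls outside this case, which matches the "Start tag expected" error shown there.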
      In my past experience with these broken XML files, it seems that the beginning of the namespace always gets clipped. Is there a way to repair the namespace?

      Update: I was wrong about what gets clipped. It's actually the XML version tag (if that's what it's called, I have no idea): <?xml version="1.0" encoding="UTF-8"?>.
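
      If the only damage is a clipped XML declaration, as in the update above, one possible workaround is to discard the leftover fragment before parsing. The XML declaration is optional in XML 1.0, so a document that starts directly at the root element is still well-formed. The sketch below is plain Perl; strip_clipped_declaration is a hypothetical helper name, and it assumes the leftover text ends in ?> and contains no < character. It does not address why the datagram is clipped in the first place.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the leftover tail of a clipped XML declaration.
sub strip_clipped_declaration {
    my ($xml) = @_;

    # Nothing to do if the text already starts with markup.
    return $xml if $xml =~ /\A\s*</;

    # Remove everything up to and including the first "?>", provided no
    # "<" appears before it. Otherwise the text is left unchanged.
    $xml =~ s/\A[^<]*?\?>\s*//;
    return $xml;
}

# Shortened version of the message from the question.
my $received = 'n="1.0" encoding="UTF-8"?> '
             . '<Capabilities><VehicleID>String</VehicleID></Capabilities>';

my $cleaned = strip_clipped_declaration($received);
print $cleaned, "\n";
```

      The cleaned string could then be handed to $parser->parse_string as in the server script. Without a declaration the parser falls back to its default encoding detection, which should be fine here since the original declared UTF-8.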
