PerlMonks
"Many of the downloads are 2+Gb long and I get memory errors if I do too much in RAM."

Well, that's a constraint that you didn't share initially. Had I been aware of it, I would not have proposed slurping the file(s) into memory.

Now that I have a better understanding of the constraints, I would probably do something like the untested code below. For each file that needs 'cleaning', run the script with perl -i.bak, which edits the file in place and first saves the original as a backup with the .bak file extension. (With a bare -i and no .bak, Perl just overwrites the file with no backup.)

Basically, the code below checks a file line by line for each tag/attribute pair specified. If a tag is missing its attribute, that line is 'deleted' from the file. This might not be exactly what you want to do, but it should give you a framework to use for your own 'noise' handling operations.
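As a minimal sketch of that line-by-line approach (not the script from this post), something like the following could serve as a starting point. The tag/attribute pairs in %required, the keep_line helper, and the script name clean.pl are all hypothetical placeholders, and the matching is a simple regex that assumes each opening tag sits entirely on one line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical tag => required-attribute pairs; replace with your own.
my %required = (
    item  => 'id',
    entry => 'name',
);

# Return true if the line should be kept: every listed tag that opens
# on this line must carry its required attribute.
sub keep_line {
    my ($line, $req) = @_;
    for my $tag (keys %$req) {
        my $attr = $req->{$tag};
        # Inspect each opening <tag ...> found on the line.
        while ($line =~ /<\Q$tag\E\b([^>]*)>/g) {
            my $attrs = $1;
            return 0 unless $attrs =~ /(?:^|\s)\Q$attr\E\s*=/;
        }
    }
    return 1;
}

# Only process files named on the command line. Under -i, print writes
# to the file being edited, so any line not printed is dropped from it.
if (@ARGV) {
    while (my $line = <>) {
        print $line if keep_line($line, \%required);
    }
}
```

It would be run as perl -i.bak clean.pl big_file.xml. Because only one line is held in memory at a time, file size should not matter much, but a tag whose attributes wrap onto a second line would not be recognized by this sketch.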
In reply to Re^3: XML cleanup - regex or ?
by dasgar