I have just spent an extremely trying 30 minutes or so trying to regex out no-break spaces (U+00A0), which appear in my very large raw text file as the UTF-8 byte sequence \302\240 (octal escapes). I was about to post a request for help, but then figured out a solution. Perhaps it isn't the best solution, but I was unable to find anything concise on the web that solved my problem, and I suppose there are others out there who have, or will have, the same problem. So here's a very short, very simple solution:
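#!/usr/bin/perl
use strict;
use warnings;    # makes the -w switch on the shebang line redundant

# Treat standard input and output as raw bytes, so the UTF-8 encoding of
# U+00A0 (no-break space) is seen as the two bytes \302\240 (octal).
# Files named on the command line are read via ARGV, which is
# byte-oriented by default.
binmode(STDIN,  ":bytes");
binmode(STDOUT, ":bytes");

while (<>) {
    chomp;
    s/\302\240//g;    # delete every UTF-8 encoded no-break space
    s/\s+/ /g;        # collapse each run of whitespace to a single space
    print "$_\n";
}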
This completely solved my problem. If it is incomplete, or not a very clever thing to do, please improve it. If it solves somebody else's problem as well - GREAT!
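For what it's worth, one possible variation is to decode each line from UTF-8 first and match the character U+00A0 itself instead of its two bytes. This is only a sketch, assuming the input really is valid UTF-8; the helper name strip_nbsp is made up for illustration and is not part of the script above. Note that on decoded text \s follows Unicode rules, so it may collapse other Unicode whitespace characters that the byte-level version leaves alone.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (name chosen for illustration): takes one line of
# UTF-8 encoded bytes and returns the cleaned line, again as UTF-8 bytes.
sub strip_nbsp {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);    # bytes -> Perl characters
    $text =~ s/\x{A0}//g;                  # delete every no-break space (U+00A0)
    $text =~ s/\s+/ /g;                    # collapse whitespace runs to one space
    return encode('UTF-8', $text);         # characters -> bytes
}

# Small demonstration on a fixed string containing the bytes \302\240.
print strip_nbsp("foo\302\240bar  baz"), "\n";    # prints "foobar baz"
```

To use it as a filter, the helper could be called on each chomped line inside the same while(<>) loop as above, in place of the two substitutions.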
joe
2006-09-14 Retitled by planetscape, as per Monastery guidelines
Original title: 'Annoying Problem: solved'