Re: ignore UTF codes

Could it be that my text document is DOS formatted? Perl does not seem to recognize the UTF codes at all. I cannot do anything to access them, and when I try to manipulate the line, most of the time I get an error like this:

Malformed UTF-8 character (overflow at 0xa0c75a60, byte 0x70, after start byte 0xbf) in uc at ./qNa.pl line 15, <IN> line 25.

joe
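One possible cause of a warning like this is that Perl treats the input as UTF-8 while the file is actually in a single-byte encoding. The sketch below assumes ISO-8859-1 (Latin-1), which the thread does not confirm, and decodes the input explicitly before calling uc. The sub name `uc_latin1_lines` and the in-memory filehandle are illustrative only; a real script would open its own input file with the same layer.

```perl
use strict;
use warnings;

# Uppercase every line of a byte string, decoding it as ISO-8859-1 first.
# The encoding is an assumption; substitute the file's real encoding.
sub uc_latin1_lines {
    my ($bytes) = @_;

    # An in-memory filehandle stands in for the real input file here.
    open my $in, '<:encoding(ISO-8859-1)', \$bytes
        or die "Cannot open in-memory handle: $!";

    my @out;
    while (my $line = <$in>) {
        push @out, uc $line;    # uc now sees decoded characters, not raw bytes
    }
    close $in;
    return @out;
}

# Hypothetical sample: byte 0xBF (octal 277) followed by plain ASCII text.
my @lines = uc_latin1_lines("\277De quien es la cancion\n");
printf "%d line(s), first has %d characters\n", scalar @lines, length $lines[0];
```

If the file turns out to use a different encoding, only the layer name in the open call needs to change.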

Re^2: ignore UTF codes by zer (Deacon) on Mar 16, 2006 at 04:37 UTC
You can access UTF codes; it depends on your situation. Usually, in my experience, DOS files have been straight ASCII. Let me see if I can find something for you.
-------------------------------------------
OK, there is a utf8::is_utf8() function. It tells you whether your string is flagged as UTF-8 internally. For example: `$a=chr(0x74); print utf8::is_utf8($a)?"yes":"no"; $a=chr(0x470); print utf8::is_utf8($a)?"yes":"no";` The output is "noyes". If you can provide some more code, I can give you a more specific answer.
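A slightly expanded sketch of the same check, for reference. Note that utf8::is_utf8() only reports Perl's internal flag on a string; it does not test whether raw bytes form valid UTF-8. The helper name `flag_state` and the sample byte string are hypothetical.

```perl
use strict;
use warnings;

# Return "yes" if Perl's internal UTF-8 flag is set on the string, else "no".
sub flag_state {
    my ($str) = @_;
    return utf8::is_utf8($str) ? "yes" : "no";
}

print flag_state(chr(0x74)),  "\n";   # code point below 0x100
print flag_state(chr(0x470)), "\n";   # code point above 0xFF

# Raw bytes, as they would arrive from a file read without an :encoding layer.
# The content is a made-up sample containing byte 0xBF (octal 277).
my $bytes = "\277De quien";
print flag_state($bytes), "\n";
```

A string read straight from a file without a decoding layer is normally unflagged, so this check alone will not reveal which encoding the file uses.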
Re^3: ignore UTF codes by kettle (Beadle) on Mar 16, 2006 at 04:47 UTC
Thanks! I also tried converting the file with dos2unix, but that did nothing to solve my problem :-(
Re^3: ignore UTF codes by kettle (Beadle) on Mar 16, 2006 at 05:26 UTC
In the example line I gave before: Canciones\251STAMPID\253\277De quien es la cancion "STAND BY ME"*4 the cause I would simply like to delete the '\253' from the line (there is more that I will eventually want to do, but if I could complete this simple action, the rest ought to be a piece of cake). My first attempt at this was: $_ = s/\\253//g; This failed miserably. The problem is that the '\253' is being treated as a single character (i.e., if I try to highlight just one digit, it highlights the entire four-character sequence). I'm trying to write a C++ program to convert the codes to ASCII.
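A minimal sketch of the deletion in Perl, under the assumption that '\253' is how the editor or viewer displays one byte with octal value 253 (hex 0xAB) rather than four literal characters. The pattern `\\253` looks for a literal backslash followed by the digits 2, 5 and 3, so it would never match a single byte. Separately, `$_ = s/...//g` assigns the result of the substitution (a match count or an empty string) back to `$_` instead of the edited line. The sub name `strip_octal_253` is illustrative.

```perl
use strict;
use warnings;

# Remove every character with octal value 253 (hex AB) from a line.
sub strip_octal_253 {
    my ($line) = @_;

    # \xAB matches the single byte that a viewer may display as "\253".
    # Bind the substitution with =~ so that the string itself is modified.
    $line =~ s/\xAB//g;
    return $line;
}

# The sample line from the post, written with Perl octal escapes so that
# \251, \253 and \277 are each one byte (this reading is an assumption).
my $line  = "Canciones\251STAMPID\253\277De quien es la cancion \"STAND BY ME\"*4";
my $clean = strip_octal_253($line);

printf "before: %d characters, after: %d characters\n",
    length $line, length $clean;
```

The same binding works directly on `$_` inside a read loop: `s/\xAB//g;` with no assignment.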

