Firstly, just try to export the files as text in a way that lets you tell Office which character set to use. If that fails, just export as text in whatever Windows-specific character set it likes, and use piconv to convert that text to a "normal" character set, such as iso_8859_1, iso_8859_2, utf8, utf16, or whatever you like.
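piconv is the character-set converter that comes with Perl's Encode module. On the command line, something like piconv -f cp1250 -t utf8 export.txt > export-utf8.txt does the job. If you would rather convert inside a Perl script, Encode's from_to does the same conversion on a string. Here is a minimal sketch, assuming the export came out as cp1250. The file names above and the helper name convert_charset are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Hypothetical helper: convert a string of raw bytes from one character
# set to another, the same conversion piconv applies to a whole file.
sub convert_charset {
    my ($bytes, $from, $to) = @_;   # $bytes is a copy of the caller's data
    from_to($bytes, $from, $to);    # from_to() rewrites its first argument in place
    return $bytes;
}

# cp1250 byte 0x8A is S with caron (U+0160).
print unpack('H*', convert_charset("\x8a", 'cp1250', 'UTF-8')), "\n";       # c5a0
print unpack('H*', convert_charset("\x8a", 'cp1250', 'iso-8859-2')), "\n";  # a9
```

The same character ends up as two bytes in UTF-8 but as a single, different byte in ISO-8859-2, which is why you need to know the source character set before converting.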
If, however, you really want a quick-and-dirty solution and are happy to convert to ASCII, here are some substitutions. They assume that your incoming data is cp1250. I also omit the characters that are the same in 8859_1 and cp1250, as I assume those don't bother you. So the only characters listed here are those that differ between 8859_2 and 8859_1, plus the Windows extensions.
s/\x80/EUR/g;
s/\x82/,/g;
s/\x84/,,/g;
s/\x85/.../g;
s/\x86/\/\/\-/g;
s/\x87/\/\/\=/g;
s/\x89/\%0/g;
s/\x8a/S\</g;
s/\x8b/\</g;
s/\x8c/S\'/g;
s/\x8d/T\</g;
s/\x8e/Z\</g;
s/\x8f/Z\'/g;
s/\x91/`/g;
s/\x92/'/g;
s/\x93/``/g;
s/\x94/''/g;
s/\x95/o/g;
s/\x96/--/g;
s/\x97/---/g;
s/\x99/TM/g;
s/\x9a/s\</g;
s/\x9b/>/g;
s/\x9c/s\'/g;
s/\x9d/t\</g;
s/\x9e/z\</g;
s/\x9f/z\'/g;
s/\xa1/\'\</g;
s/\xa2/\'\(/g;
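The s/// lines above can also be kept as a lookup table and applied in a single pass. This is a minimal sketch, not the original code: the table holds exactly the substitutions listed above, and the names %ascii_for and cp1250_to_ascii are hypothetical. It assumes the data is still raw cp1250 bytes. If the file was read through a decoding layer, the euro sign is already U+20AC rather than byte 0x80, and none of these patterns will match.

```perl
use strict;
use warnings;

# The cp1250 -> ASCII substitutions listed above, keyed by raw byte.
my %ascii_for = (
    "\x80" => 'EUR', "\x82" => ',',   "\x84" => ',,',  "\x85" => '...',
    "\x86" => '//-', "\x87" => '//=', "\x89" => '%0',  "\x8a" => 'S<',
    "\x8b" => '<',   "\x8c" => "S'",  "\x8d" => 'T<',  "\x8e" => 'Z<',
    "\x8f" => "Z'",  "\x91" => '`',   "\x92" => "'",   "\x93" => '``',
    "\x94" => "''",  "\x95" => 'o',   "\x96" => '--',  "\x97" => '---',
    "\x99" => 'TM',  "\x9a" => 's<',  "\x9b" => '>',   "\x9c" => "s'",
    "\x9d" => 't<',  "\x9e" => 'z<',  "\x9f" => "z'",  "\xa1" => "'<",
    "\xa2" => "'(",
);

# Build one character class such as [\x80\x82...] matching every byte
# that has a replacement, so each line is scanned only once.
my $class = join '', map { sprintf '\\x%02x', ord } keys %ascii_for;
my $re    = qr/([$class])/;

# Replace every listed byte with its ASCII approximation.
sub cp1250_to_ascii {
    my ($line) = @_;
    $line =~ s/$re/$ascii_for{$1}/g;
    return $line;
}
```

To filter a whole file, read it with binmode (no decoding layer) and print cp1250_to_ascii($_) for each line. Keeping the mapping in a hash also makes it easy to add entries for the remaining upper-half characters.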
Update: if you want better ASCII equivalents, you might be able to generate them from the files in the Unicode directory of links (the web browser).
Update: put some of the code behind a readmore tag.