Re^3: getting rid of UTF-8

Re^4: getting rid of UTF-8 by BernieC (Pilgrim) on Nov 25, 2022 at 02:57 UTC
I'll try to get something together and paste a hex dump. But I know that there is nothing but plain lower-128 ASCII characters in there (I just mentioned ISO-Latin out of habit). It is all data that I entered, and there's no data in the CSV files that isn't something I entered. I have no idea why there's a BOM in the middle of the first record. I'll get the dump.
Re^5: getting rid of UTF-8 by BernieC (Pilgrim) on Nov 25, 2022 at 03:25 UTC
OK. I've got a hex dump. Here's what the file looks like in a text editor:

    Importance,"First Name","Middle Name","Last Name","Full Name",Company,Department,"Job Title","Street (b.)","City (b.)","State (b.)","ZIP Code (b.)","Country/Region (b.)","Home Phone","Business Phone","Mobile Phone","Business Phone 2","Business Phone 3","Business Phone 4","Business Fax","Business Web Page","Street (h.)","City (h.)","State (h.)","ZIP Code (h.)","Country/Region (h.)","Home Phone 2","Home Phone 3","Home Phone 4","Home Fax","Personal Web Page","Mobile Phone 2","Mobile Phone 3","Mobile Phone 4",E-mail,"E-mail 2","E-mail 3","E-mail 4",x,y,z,w,Office,Supervisor,Assistant,Salutation,Nickname,Gender,Spouse,Birthday,Anniversary,Family,Hobbies,Specialty,Strengths,Personality,Notes,"Custom 2","Custom 3","Custom 4","Custom 5","Custom 6","Custom 7","Custom 8",Comment,Group,"Birthday Reminder On/Off","Anniversary Reminder On/Off"
    Normal,,,,"A-1 Heating and Cooling","A-1 Heating and Cooling",,,"PO Box 94",Newport,Virginia,24128,"United States",,544-7810,,,,,,,,,,,"United States",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"953-1513
    Scott - service mgr - cell 540 357 2816","Emergency contacts",No,No

And here's the hex dump of it:

    EF BB BF 49 6D 70 6F 72 74 61 6E 63 65 2C 22 46 69 72 73 74 20 4E 61 6D
    65 22 2C 22 4D 69 64 64 6C 65 20 4E 61 6D 65 22 2C 22 4C 61 73 74 20
    4E 61 6D 65 22 2C 22 46 75 6C 6C 20 4E 61 6D 65 22 2C 43 6F 6D 70 61
    6E 79 2C 44 65 70 61 72 74 6D 65 6E 74 2C 22 4A 6F 62 20 54 69 74 6C
    65 22 2C 22 53 74 72 65 65 74 20 28 62 2E 29 22 2C 22 43 69 74 79 20
    28 62 2E 29 22 2C 22 53 74 61 74 65 20 28 62 2E 29 22 2C 22 5A 49 50
    20 43 6F 64 65 20 28 62 2E 29 22 2C 22 43 6F 75 6E 74 72 79 2F 52 65
    67 69 6F 6E 20 28 62 2E 29 22 2C 22 48 6F 6D 65 20 50 68 6F 6E 65 22
    2C 22 42 75 73 69 6E 65 73 73 20 50 68 6F 6E 65 22 2C 22 4D 6F 62 69
    6C 65 20 50 68 6F 6E 65 22 2C 22 42 75 73 69 6E 65 73 73 20 50 68 6F
    6E 65 20 32 22 2C 22 42 75 73 69 6E 65 73 73 20 50 68 6F 6E 65 20 33
    22 2C 22 42 75 73 69 6E 65 73 73 20 50 68 6F 6E 65 20 34 22 2C 22 42
    75 73 69 6E 65 73 73 20 46 61 78 22 2C 22 42 75 73 69 6E 65 73 73 20
    57 65 62 20 50 61 67 65 22 2C 22 53 74 72 65 65 74 20 28 68 2E 29 22
    2C 22 43 69 74 79 20 28 68 2E 29 22 2C 22 53 74 61 74 65 20 28 68 2E
    29 22 2C 22 5A 49 50 20 43 6F 64 65 20 28 68 2E 29 22 2C 22 43 6F 75
    6E 74 72 79 2F 52 65 67 69 6F 6E 20 28 68 2E 29 22 2C 22 48 6F 6D 65
    20 50 68 6F 6E 65 20 32 22 2C 22 48 6F 6D 65 20 50 68 6F 6E 65 20 33
    22 2C 22 48 6F 6D 65 20 50 68 6F 6E 65 20 34 22 2C 22 48 6F 6D 65 20
    46 61 78 22 2C 22 50 65 72 73 6F 6E 61 6C 20 57 65 62 20 50 61 67 65
    22 2C 22 4D 6F 62 69 6C 65 20 50 68 6F 6E 65 20 32 22 2C 22 4D 6F 62
    69 6C 65 20 50 68 6F 6E 65 20 33 22 2C 22 4D 6F 62 69 6C 65 20 50 68
    6F 6E 65 20 34 22 2C 45 2D 6D 61 69 6C 2C 22 45 2D 6D 61 69 6C 20 32
    22 2C 22 45 2D 6D 61 69 6C 20 33 22 2C 22 45 2D 6D 61 69 6C 20 34 22
    2C 78 2C 79 2C 7A 2C 77 2C 4F 66 66 69 63 65 2C 53 75 70 65 72 76 69
    73 6F 72 2C 41 73 73 69 73 74 61 6E 74 2C 53 61 6C 75 74 61 74 69 6F
    6E 2C 4E 69 63 6B 6E 61 6D 65 2C 47 65 6E 64 65 72 2C 53 70 6F 75 73
    65 2C 42 69 72 74 68 64 61 79 2C 41 6E 6E 69 76 65 72 73 61 72 79 2C
    46 61 6D 69 6C 79 2C 48 6F 62 62 69 65 73 2C 53 70 65 63 69 61 6C 74
    79 2C 53 74 72 65 6E 67 74 68 73 2C 50 65 72 73 6F 6E 61 6C 69 74 79
    2C 4E 6F 74 65 73 2C 22 43 75 73 74 6F 6D 20 32 22 2C 22 43 75 73 74
    6F 6D 20 33 22 2C 22 43 75 73 74 6F 6D 20 34 22 2C 22 43 75 73 74 6F
    6D 20 35 22 2C 22 43 75 73 74 6F 6D 20 36 22 2C 22 43 75 73 74 6F 6D
    20 37 22 2C 22 43 75 73 74 6F 6D 20 38 22 2C 43 6F 6D 6D 65 6E 74 2C
    47 72 6F 75 70 2C 22 42 69 72 74 68 64 61 79 20 52 65 6D 69 6E 64 65
    72 20 4F 6E 2F 4F 66 66 22 2C 22 41 6E 6E 69 76 65 72 73 61 72 79 20
    52 65 6D 69 6E 64 65 72 20 4F 6E 2F 4F 66 66 22 0D 0A 4E 6F 72 6D 61
    6C 2C 2C 2C 2C 22 41 2D 31 20 48 65 61 74 69 6E 67 20 61 6E 64 20 43
    6F 6F 6C 69 6E 67 22 2C 22 41 2D 31 20 48 65 61 74 69 6E 67 20 61 6E
    64 20 43 6F 6F 6C 69 6E 67 22 2C 2C 2C 22 50 4F 20 42 6F 78 20 39 34
    22 2C 4E 65 77 70 6F 72 74 2C 56 69 72 67 69 6E 69 61 2C 32 34 31 32
    38 2C 22 55 6E 69 74 65 64 20 53 74 61 74 65 73 22 2C 2C 35 34 34 2D
    37 38 31 30 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 22 55 6E 69 74 65 64 20
    53 74 61 74 65 73 22 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C
    2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C
    22 EF BB BF 39 35 33 2D 31 35 31 33 0D 0A 53 63 6F 74 74 20 2D 20 73
    65 72 76 69 63 65 20 6D 67 72 20 2D 20 63 65 6C 6C 20 35 34 30 20 33
    35 37 20 32 38 31 36 22 2C 22 45 6D 65 72 67 65 6E 63 79 20 20 63 6F
    6E 74 61 63 74 73 22 2C 4E 6F 2C 4E 6F 0D 0A 4E 6F 72 6D 61 6C 2C 2C
    2C 2C 22 41 62 69 6E 67 64 6F 6E 20 45 71 75 69 70 6D 65 6E 74 22 2C
    22

Notice from the dump that there is another EF BB BF toward the end of the dump, right after the opening quote of the Comment field. And I tried to brute-force it and it didn't work! I did `$line =~ s/\xef\xbb\xbf//` and it didn't remove the characters! I'll try again...
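Rather than reading a hex dump by eye, the positions of the BOM bytes can be listed by scanning the raw bytes for every occurrence of EF BB BF. The following is only a sketch under stated assumptions: the sample string is a shortened stand-in for the file above, not the real data, and `bom_offsets` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: return the byte offset of every UTF-8 BOM
# (EF BB BF) found in a string of raw bytes.
sub bom_offsets {
    my ($bytes) = @_;
    my @offsets;
    my $pos = 0;
    while ( ( my $i = index( $bytes, "\xEF\xBB\xBF", $pos ) ) >= 0 ) {
        push @offsets, $i;
        $pos = $i + 3;    # continue the search after this BOM
    }
    return @offsets;
}

# Shortened stand-in for the CSV above (an assumption, not the real
# file): one BOM at the start and one inside a quoted field that also
# contains a CRLF.
my $sample = "\xEF\xBB\xBFImportance,Comment\r\n"
           . "Normal,\"\xEF\xBB\xBF953-1513\r\nScott\"\r\n";

my @found = bom_offsets($sample);
print "BOM at byte offset $_\n" for @found;
```

For a real file, the contents would need to be read with the `:raw` layer so that Perl sees the bytes unmodified.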
Re^6: getting rid of UTF-8 by haukex (Archbishop) on Nov 25, 2022 at 09:43 UTC
"I did the `$line =~ s/\xef\xbb\xbf//` and it didn't remove the characters!"

Using the advice from kcott here to use `/g`, it works for me. If it really doesn't work for you, then perhaps the data you have in your Perl string is not what you think it is. See my node here for advice on how to show us the real data, in particular Devel::Peek, and make sure to provide an SSCCE that we can run to see the problem for ourselves.
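To make the point about `/g` concrete, here is a minimal, self-contained sketch. It assumes the string holds raw bytes, as it would if the file were read with the `:raw` layer, and the sample data is a shortened stand-in rather than the original file. It also shows one possible reason why a byte-level pattern might match nothing at all. This is an assumption, not something the thread confirms: if the data has been decoded from UTF-8 (for example by an `:encoding(UTF-8)` layer), each BOM is the single character U+FEFF rather than three bytes.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Shortened stand-in: a BOM at the start and another inside a quoted field.
my $raw = "\xEF\xBB\xBFImportance,Comment\r\n"
        . "Normal,\"\xEF\xBB\xBF953-1513\"\r\n";

# Without /g only the first BOM is removed.
( my $once = $raw ) =~ s/\xEF\xBB\xBF//;
my $left_once = () = $once =~ /\xEF\xBB\xBF/g;

# With /g every BOM is removed.
( my $all = $raw ) =~ s/\xEF\xBB\xBF//g;
my $left_all = () = $all =~ /\xEF\xBB\xBF/g;

# If the data was decoded from UTF-8, the BOM is one character, U+FEFF,
# so the three-byte pattern no longer matches anything.
my $decoded   = decode( 'UTF-8', $raw );
my $byte_hits = () = $decoded =~ /\xEF\xBB\xBF/g;

# In that case the character itself has to be targeted.
( my $clean = $decoded ) =~ s/\x{FEFF}//g;

print "BOMs left without /g: $left_once\n";
print "BOMs left with /g: $left_all\n";
print "byte-pattern matches in decoded text: $byte_hits\n";
```

Which of the two cases applies depends on how the file was opened, which is why a `Devel::Peek` `Dump` of the actual string is a useful thing to post.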
Re^7: getting rid of UTF-8 by BernieC (Pilgrim) on Nov 25, 2022 at 15:08 UTC
Re^8: getting rid of UTF-8 by haukex (Archbishop) on Nov 25, 2022 at 16:09 UTC
Some notes below your chosen depth have not been shown here
Re^8: getting rid of UTF-8 by Anonymous Monk on Nov 25, 2022 at 16:19 UTC

