PerlMonks  

Re: decoding a UTF-16B string found in an email subject

by skx (Parson)
on Oct 30, 2013 at 19:24 UTC (#1060422=note)


in reply to decoding a UTF-16B string found in an email subject

The headers of messages are encoded as per RFC 2047.

You can find sample Perl code for decoding such headers in the CPAN documentation for Encode::MIME::Header.
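As a minimal sketch of what that documentation describes, the snippet below decodes a header through Encode's 'MIME-Header' encoding. The sample header is hypothetical (it is not the one from the original question) and uses Q encoding with ISO-8859-1.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample header: "Andr\xE9" in ISO-8859-1, Q-encoded per RFC 2047.
my $raw = 'Subject: =?ISO-8859-1?Q?Andr=E9?= says hello';

# 'MIME-Header' finds the encoded-words and returns a Perl character string.
my $subject = decode('MIME-Header', $raw);

# Encode the characters as UTF-8 on the way out.
binmode STDOUT, ':encoding(UTF-8)';
print "$subject\n";
```

The result is a character string, so it still has to be encoded for whatever output it is written to.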

Steve
--

Replies are listed 'Best First'.
Re^2: decoding a UTF-16B string found in an email subject
by runrig (Abbot) on Oct 30, 2013 at 20:18 UTC
    This should be the correct answer, but I don't think the string is correctly encoded. This:
    use Encode qw(decode);
    my $str = 'username, A Ne=?UTF-16?B?dwAgAEMAcgBlAGQAaQB0ACAAQwBhAHIAZAAgAEMAbwB1AGwAZAAgAEIAZQAgAEgAZQBhAGQAZQBkACAAWQBvAHUAcgAgAFcAYQB5AA==?=';
    my $chr = decode('MIME-Header', $str);
    print "$chr\n";
    Gets me:
    UTF-16:Unrecognised BOM 7700 at /.../Encode/MIME/Header.pm line 81.
    While this:
    use MIME::Base64;
    my $cstr = 'dwAgAEMAcgBlAGQAaQB0ACAAQwBhAHIAZAAgAEMAbwB1AGwAZAAgAEIAZQAgAEgAZQBhAGQAZQBkACAAWQBvAHUAcgAgAFcAYQB5AA';
    my $chk = decode_base64($cstr);
    print "$chk\n";
    Gets me:
    w Credit Card Could Be Headed Your Way
    So the part that is supposed to be UTF-16 appears to be just base64 encoded.

    UPDATE: And if you change 'UTF-16' in the first part to 'UTF-8', then it is correctly decoded without error.

      According to all the docs I've found, a BOM is not required, and when no BOM is present, big-endian is assumed. However, the string you give appears to be little-endian (as was the case in the problem that brought me to this page...). If you s/UTF-16/UTF-16LE/, then your string is decoded correctly.
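The byte-order point can be checked by hand. This is a rough sketch using the base64 payload quoted in this thread; the variable names are made up for illustration. It decodes the same bytes once as UTF-16LE and once as UTF-8 and compares the results.

```perl
use strict;
use warnings;
use Encode qw(decode);
use MIME::Base64 qw(decode_base64);

# Base64 payload of the encoded-word quoted in this thread.
my $b64 = 'dwAgAEMAcgBlAGQAaQB0ACAAQwBhAHIAZAAgAEMAbwB1AGwAZAAgAEIA'
        . 'ZQAgAEgAZQBhAGQAZQBkACAAWQBvAHUAcgAgAFcAYQB5AA==';

my $bytes = decode_base64($b64);

# Every ASCII byte is followed by a NUL: low byte first, so little-endian.
my $as_utf16le = decode('UTF-16LE', $bytes);

# Decoding as UTF-8 raises no error, but each NUL byte survives as a character.
my $as_utf8 = decode('UTF-8', $bytes);

binmode STDOUT, ':encoding(UTF-8)';
print "$as_utf16le\n";
printf "UTF-16LE: %d chars, UTF-8: %d chars, NULs in UTF-8 result: %d\n",
    length $as_utf16le, length $as_utf8, ($as_utf8 =~ tr/\0//);
```

The UTF-8 result only looks right on a terminal because NUL characters are usually not displayed; the string is twice as long as the UTF-16LE one.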
Re^2: decoding a UTF-16B string found in an email subject
by neaj (Initiate) on Oct 30, 2013 at 20:22 UTC

    Thanks, the doc for Encode::MIME::Header explained what I was doing wrong.

    use MIME::Base64 ();

    #my $string = "=?UTF-16?B?dwAgAEMAcgBlAGQAaQB0ACAAQwBhAHIAZAAgAEMAbwB1AGwAZAAgAEIAZQAgAEgAZQBhAGQAZQBkACAAWQBvAHUAcgAgAFcAYQB5AA==?=";
    my $string = "dwAgAEMAcgBlAGQAaQB0ACAAQwBhAHIAZAAgAEMAbwB1AGwAZAAgAEIAZQAgAEgAZQBhAGQAZQBkACAAWQBvAHUAcgAgAFcAYQB5AA==";
    print MIME::Base64::decode( $string ), "\n";

    w Credit Card Could Be Headed Your Way

    I had to use base64 decoding on the encoded word only, not on the whole string:

    =?encoding?X?ENCODED WORD?=
      If you just replace 'UTF-16' with 'UTF-8', then the entire line is correctly decoded with decode('MIME-Header', $str). The encoded part appears to be incorrectly encoded. Probably typical of spammers...
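To make the =?...?X?...?= layout concrete, here is a small sketch that splits one encoded-word into its charset, encoding letter and payload, then decodes a B (base64) payload. The helper decode_b_word and its $override parameter are hypothetical, not part of Encode or MIME::Base64; the override is only there for a sender that declares the wrong charset, as seems to have happened here.

```perl
use strict;
use warnings;
use Encode qw(decode);
use MIME::Base64 qw(decode_base64);

# Hypothetical helper: split a single RFC 2047 encoded-word and decode it.
# $override, if given, replaces the charset declared in the encoded-word.
sub decode_b_word {
    my ($word, $override) = @_;

    # =?charset?encoding?encoded-text?=
    my ($charset, $enc, $text) = $word =~ /^=\?([^?]+)\?([BbQq])\?([^?]*)\?=$/
        or return;

    return unless lc $enc eq 'b';    # Q encoding is left out of this sketch

    return decode($override // $charset, decode_base64($text));
}

my $word = '=?UTF-16?B?dwAgAEMAcgBlAGQAaQB0ACAAQwBhAHIAZAAgAEMAbwB1AGwAZAAg'
         . 'AEIAZQAgAEgAZQBhAGQAZQBkACAAWQBvAHUAcgAgAFcAYQB5AA==?=';

# The payload is little-endian without a BOM, so name the byte order explicitly.
print decode_b_word($word, 'UTF-16LE'), "\n";
```

For real mail, decode('MIME-Header', ...) is the better tool, since it also handles Q encoding and several encoded-words in one header; the helper only shows which part of the string the base64 decoder should see.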
