PerlMonks  

Take a bite out of my SPAM please

by Ignorance (Monk)
on Aug 31, 2000 at 18:41 UTC ( [id://30531]=perlquestion )

Ignorance has asked for the wisdom of the Perl Monks concerning the following question:

I just got another annoying/tryingToHide spam. (What other kind is there?)
Anyway, they hid the site URL in a bunch of &#16; &#2; etc.

I would like to forward this mail to abuse.net, but I don't know how to convert the hashes painlessly.

Do you think you could make a form where I could just paste in the offending form action and have it return the URLs?

Here is a sample of the mail script:

<FORM action="http://www.dns&#16;&#2;&#5;&#5;&#5;&#16;&#2;&#16;&#5;&#16;&#16;&#16;&#5;&#5;&#16;&#2;&#16;&#16;&#2;&#16;&#16;&#2;magicsite.net&amp;1085287724@1085287723&amp;1063256813@3493727992?75541114&amp;proxy=1063256794?1085287723@1063256813?www.su2537.tw|www.&#16;&#2;&#5;&#5;&#5;&#16;&#16;&#16;&#5;&#5;&#16;&#2;&#16;&#16;&#2;&#16;&#16;&#2;.hk?/proxy=1063256803@2204624165:2030@2204624133?/@1085307406:8080@1085305176@%31%30%38%35%33%30%35%31%36%35" method=post>

It might actually be a good simple project for me. If you could just point me in the right direction of what function(s) or regular expression to use, that would be cool too.

BUT, if this spam gets your ire up like it does mine, maybe your fury will produce something in less time than it took me to write this post ;)

Thanks, Ignorance

Replies are listed 'Best First'.
RE: Take a bite out of my SPAM please
by turnstep (Parson) on Aug 31, 2000 at 19:08 UTC
    Take a look at http://www.samsapade.org : they have some tools to decipher obfuscated URLs and, more importantly, some good links. You could also email Steve Blighty and see if he will help you write a Perl version of what he has (I think it is in C).

Re: Take a bite out of my SPAM please
by merlyn (Sage) on Aug 31, 2000 at 19:27 UTC
    You could just parse it with HTML::Parser, which will downconvert it for you.

    Or, use HTML::Entities, it'd be something like:

    use HTML::Entities;
    $_ = 'http://www.dns&#16;&#2;&#5;&#5;&#5;&#16;&#2;&#16;&#5;&#16;&#16;&#16;&#5;&#5;&#16;&#2;&#16;&#16;&#2;&#16;&#16;&#2;magicsite.net&amp;1085287724@1085287723&amp;1063256813@3493727992?75541114&amp;proxy=1063256794?1085287723@1063256813?www.su2537.tw|www.&#16;&#2;&#5;&#5;&#5;&#16;&#16;&#16;&#5;&#5;&#16;&#2;&#16;&#16;&#2;&#16;&#16;&#2;.hk?/proxy=1063256803@2204624165:2030@2204624133?/@1085307406:8080@1085305176@%31%30%38%35%33%30%35%31%36%35';
    print decode_entities $_;

    Hmm. They have illegal characters in their DNS name. How sick.

    -- Randal L. Schwartz, Perl hacker

RE: Take a bite out of my SPAM please
by araqnid (Beadle) on Aug 31, 2000 at 18:57 UTC
    hmm..
    s|&#(\d+);|pack("c",$1)|ge
    ought to work but that produces a bunch of control characters. so there's obviously a subtlety i've missed
      no, you haven't missed anything. the entities &#16;, &#2;, and &#5; all encode ASCII control characters. i don't think that this string can be turned into a navigable URL using only generic conversions. my guess is that the URL in the form is converted with JavaScript or something to allow the browser to use it as a FORM target.
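A minimal sketch for seeing those control characters without piping through od, using only core Perl. The helper name show_decoded and the shortened sample string are made up for illustration; the first substitution is the same idea as araqnid's one-liner.

```perl
use strict;
use warnings;

# Decode numeric entities, then show every byte outside printable ASCII
# as a literal \xNN so the control characters become visible.
sub show_decoded {
    my ($text) = @_;
    $text =~ s/&#(\d+);/chr($1)/ge;
    $text =~ s/([^\x20-\x7e])/sprintf('\\x%02x', ord($1))/ge;
    return $text;
}

# Shortened sample, not the full spam URL.
print show_decoded('www.dns&#16;&#2;&#5;magicsite.net'), "\n";
# prints: www.dns\x10\x02\x05magicsite.net
```

Run on the full form action, this should make it easy to count how much of the "hostname" is padding.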

      i did a couple more conversions and got a bit more intelligible stuff out. here are the other conversions i ran:

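      $URL =~ s/&amp;/&/g;
      # numeric entities
      $URL =~ s/&#(\d+);/pack('c', $1)/ge;
      # hex escapes (as posted, the two digits are packed as a decimal
      # number, so %31 becomes byte 31 rather than '1'; chr(hex($1))
      # would decode them as hex)
      $URL =~ s/%(\d{2})/pack('c', $1)/ge;
      # 'decimal' IP
      $URL =~ s/(\d{5,})/join('.', unpack('C4', pack('N', $1)))/ge;
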
      the result was this (after running it through 'od -c'):
      0000   h   t   t   p   :   /   /   w   w   w   .   d   n   s 020 002
      0020 005 005 005 020 002 020 005 020 020 020 005 005 020 002 020 020
      0040 002 020 020 002   m   a   g   i   c   s   i   t   e   .   n   e
      0060   t   &   6   4   .   1   7   6   .   4   5   .   4   4   @   6
      0100   4   .   1   7   6   .   4   5   .   4   3   &   6   3   .   9
      0120   6   .   2   .   2   3   7   @   2   0   8   .   6   2   .   1
      0140   4   .   2   4   8   ?   4   .   1   2   8   .   1   7   0   .
      0160   1   2   2   &   p   r   o   x   y   =   6   3   .   9   6   .
      0200   2   .   2   1   8   ?   6   4   .   1   7   6   .   4   5   .
      0220   4   3   @   6   3   .   9   6   .   2   .   2   3   7   ?   w
      0240   w   w   .   s   u   2   5   3   7   .   t   w   |   w   w   w
      0260   . 020 002 005 005 005 020 020 020 005 005 020 002 020 020 002
      0300 020 020 002   .   h   k   ?   /   p   r   o   x   y   =   6   3
      0320   .   9   6   .   2   .   2   2   7   @   1   3   1   .   1   0
      0340   3   .   2   2   9   .   3   7   :   2   0   3   0   @   1   3
      0360   1   .   1   0   3   .   2   2   9   .   5   ?   /   @   6   4
      0400   .   1   7   6   .   1   2   2   .   1   4   :   8   0   8   0
      0420   @   6   4   .   1   7   6   .   1   1   3   .   8   8   @ 037
      0440 036   &   #   ! 036   # 037   $   #  \n
      0452

      the main points of interest are the apparent IP addresses:

      • 64.176.45.44
      • 64.176.45.43
      • 63.96.2.237
      • 208.62.14.248
      • 4.128.170.122
      • 63.96.2.218
      • 131.103.229.37:2030
      • 131.103.229.5
      • 64.176.122.14:8080
      • 64.176.113.88

      after checking a few of these out with reverse DNS and whois.arin.net, i've come to the conclusion that it's all just random garbage. i don't think that the alleged IP addresses are owned by affiliated entities, and none of them seem to be porn sites, so i think it's just someone mucking about.
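One thing worth noting about the dump above: the trailing 037 036 & # ... bytes come from the hex-escape step reading the two digits after each % as decimal. A hedged sketch of the same four conversions with that one step decoding hex instead; the sub name decode_spam_url is invented here and the rest follows the substitutions as posted.

```perl
use strict;
use warnings;

# Same conversions as posted, except %XX is decoded as hexadecimal.
# The hex step runs before the 'decimal' IP step, so an escaped integer
# is also turned into a dotted quad.
sub decode_spam_url {
    my ($url) = @_;
    $url =~ s/&amp;/&/g;                                    # named entity
    $url =~ s/&#(\d+);/chr($1)/ge;                          # numeric entities
    $url =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;            # hex escapes
    $url =~ s/(\d{5,})/join('.', unpack('C4', pack('N', $1)))/ge;  # 'decimal' IP
    return $url;
}

# Tail end of the form action from the original post.
print decode_spam_url('@1085305176@%31%30%38%35%33%30%35%31%36%35'), "\n";
# prints: @64.176.113.88@64.176.113.77
```

With this change the last component comes out as one more apparent address, 64.176.113.77, rather than control characters. Whether that address means anything is a separate question.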

        I tried some slightly different approaches, but didn't come up with anything that looked much more usable:

        Output 1 (using araqnid's sub):
        http://www.dnsmagicsite.net&1085287724@1085287723&1063256813@3493727992?75541114&proxy=1063256794?1085287723@1063256813?www.su2537.tw|www..hk?/proxy=1063256803@2204624165:2030@2204624133?/@1085307406:8080@1085305176@%31%30%38%35%33%30%35%31%36%35

        Output 2 (using a character map):
        http://www.dns.hk?/proxy=1063256803@2204624165:2030@2204624133?/@1085307406:8080@1085305176@%31%30%38%35%33%30%35%31%36%35

        I've included the character map below FWIW.

        my %chars = (
             32 => '',        143 => '143',
             33 => '!',       144 => '144',
             34 => '"',       145 => '`',
             35 => '#',       146 => "'",
             36 => '$',       147 => '"',
             37 => '%',       148 => '"',
             38 => '&',       149 => '*',
             39 => "'",       150 => '-',
             40 => '(',       151 => '-',
             41 => ')',       152 => '~',
             42 => '*',       153 => '[tm]',
             43 => '+',       154 => 's',
             44 => ',',       155 => '>',
             45 => '-',       156 => 'oe',
             46 => '.',       157 => '&#157;',
             47 => '/',       158 => '&#158;',
             48 => '0',       159 => 'Y',
             49 => '1',       160 => "'",
             50 => '2',       161 => '¡',
             51 => '3',       162 => '¢',
             52 => '4',       163 => '£',
             53 => '5',       164 => '¤',
             54 => '6',       165 => '¥',
             55 => '7',       166 => '¦',
             56 => '8',       167 => '§',
             57 => '9',       168 => '¨',
             58 => ':',       169 => '©',
             59 => ';',       170 => 'ª',
             60 => '<',       171 => '«',
             61 => '=',       172 => '¬',
             62 => '>',       173 => '­',
             63 => '?',       174 => '®',
             64 => '@',       175 => '¯',
             65 => 'A',       176 => '°',
             66 => 'B',       177 => '±',
             67 => 'C',       178 => '²',
             68 => 'D',       179 => '³',
             69 => 'E',       180 => '´',
             70 => 'F',       181 => 'µ',
             71 => 'G',       182 => '¶',
             72 => 'H',       183 => '·',
             73 => 'I',       184 => '¸',
             74 => 'J',       185 => '¹',
             75 => 'K',       186 => 'º',
             76 => 'L',       187 => '»',
             77 => 'M',       188 => '¼',
             78 => 'N',       189 => '½',
             79 => 'O',       190 => '¾',
             80 => 'P',       191 => '¿',
             81 => 'Q',       192 => 'À',
             82 => 'R',       193 => 'Á',
             83 => 'S',       194 => 'Â',
             84 => 'T',       195 => 'Ã',
             85 => 'U',       196 => 'Ä',
             86 => 'V',       197 => 'Å',
             87 => 'W',       198 => 'Æ',
             88 => 'X',       199 => 'Ç',
             89 => 'Y',       200 => 'È',
             90 => 'Z',       201 => 'É',
             91 => '[',       202 => 'Ê',
             92 => "\\",      203 => 'Ë',
             93 => ']',       204 => 'Ì',
             94 => '^',       205 => 'Í',
             95 => '_',       206 => 'Î',
             96 => '`',       207 => 'Ï',
             97 => 'a',       208 => 'Ð',
             98 => 'b',       209 => 'Ñ',
             99 => 'c',       210 => 'Ò',
            100 => 'd',       211 => 'Ó',
            101 => 'e',       212 => 'Ô',
            102 => 'f',       213 => 'Õ',
            103 => 'g',       214 => 'Ö',
            104 => 'h',       215 => '×',
            105 => 'i',       216 => 'Ø',
            106 => 'j',       217 => 'Ù',
            107 => 'k',       218 => 'Ú',
            108 => 'l',       219 => 'Û',
            109 => 'm',       220 => 'Ü',
            110 => 'n',       221 => 'Ý',
            111 => 'o',       222 => 'Þ',
            112 => 'p',       223 => 'ß',
            113 => 'q',       224 => 'à',
            114 => 'r',       225 => 'á',
            115 => 's',       226 => 'â',
            116 => 't',       227 => 'ã',
            117 => 'u',       228 => 'ä',
            118 => 'v',       229 => 'å',
            119 => 'w',       230 => 'æ',
            120 => 'x',       231 => 'ç',
            121 => 'y',       232 => 'è',
            122 => 'z',       233 => 'é',
            123 => '{',       234 => 'ê',
            124 => '|',       235 => 'ë',
            125 => '}',       236 => 'ì',
            126 => '~',       237 => 'í',
            127 => '?',       238 => 'î',
            128 => '&#128;',  239 => 'ï',
            129 => '&#129;',  240 => 'ð',
            130 => ',',       241 => 'ñ',
            131 => 'f',       242 => 'ò',
            132 => ',,',      243 => 'ó',
            133 => '...',     244 => 'ô',
            134 => '?',       245 => 'õ',
            135 => '?',       246 => 'ö',
            136 => '^',       247 => '÷',
            137 => '?',       248 => 'ø',
            138 => 'S',       249 => 'ù',
            139 => '<',       250 => 'ú',
            140 => 'OE',      251 => 'û',
            141 => '&#141;',  252 => 'ü',
            142 => '&#142;',  253 => 'ý',
            143 => '&#143;',  254 => 'þ',
            'amp' => '&'
        );

        my $string = 'http://www.dns&#16;&#2;&#5;&#5;&#5;&#16;&#2;&#16;&#5;&#16;&#16;&#16;&#5;&#5;&#16;&#2;&#16;&#16;&#2;&#16;&#16;&#2;magicsite.net&amp;1085287724@1085287723&amp;1063256813@3493727992?75541114&amp;proxy=1063256794?1085287723@1063256813?www.su2537.tw|www.&#16;&#2;&#5;&#5;&#5;&#16;&#16;&#16;&#5;&#5;&#16;&#2;&#16;&#16;&#2;&#16;&#16;&#2;.hk?/proxy=1063256803@2204624165:2030@2204624133?/@1085307406:8080@1085305176@%31%30%38%35%33%30%35%31%36%35';

        # Note: \S+ is greedy and the string has no whitespace, so this
        # matches from the first '&' to the last ';' in one go. That long
        # capture is not a key in %chars, so the whole run is replaced
        # with nothing, which is why Output 2 is so short.
        $string =~ s|\&(\S+)\;|$chars{$1}|g;
        print $string . "\n";
        exit 0;

RE: Take a bite out of my SPAM please
by Mushy (Scribe) on Aug 31, 2000 at 21:42 UTC
    May be slightly off topic

    For times when you can't do your own filtering/parsing (like Hotmail), I have been using the free version of SpamCop with good results. Very good automatic header parsing, digging of URLs, talking to abuse.net to get the abuse report address, automatic report generation, etc. I almost feel like subscribing to their commercial service sometimes, but I have managed to resist.

    Disclaimer: I am not affiliated with spamcop.net in any way.

    - Mushy (Just sharing info)

Re: Take a bite out of my SPAM please
by isotope (Deacon) on Sep 01, 2000 at 04:08 UTC
    Look at the part after the last @ symbol:
    %31%30%38%35%33%30%35%31%36%35

    Anything before the @ should be considered a Basic Authentication username, IIRC.
    URI::Escape should help with this. Unescaping gave me an integer, which then needs to be converted to a proper IPv4 address. This is only partially tested:

    #!/usr/local/bin/perl -w
    use strict;
    use URI::Escape;

    my $URL = "%31%30%38%35%33%30%35%31%36%35";
    print $URL . "\n";

    # Undo the %XX escapes; what is left is a decimal integer.
    my $int_ip = uri_unescape($URL);
    print $int_ip . "\n";

    # Peel off one octet at a time, least significant first.
    my @octets;
    my $register = $int_ip;
    while ($register) {
        unshift(@octets, $register % 256);
        $register = int($register / 256);
    }
    print join('.', @octets), "\n";

    This is pretty rough, but it should do the job. I included the print statements so you can see exactly what it's doing. Feel free to customize (cannibalize) for your own not-so-evil purposes. Reverse DNS is up to you.
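For comparison, a sketch of the same integer-to-address step using core pack/unpack instead of a loop. The sub name int_to_dotted_quad and the range check are additions for illustration, not part of isotope's script. The input is the integer you get from unescaping the %XX string above.

```perl
use strict;
use warnings;

# Convert a 32-bit unsigned integer (number or decimal string) to
# dotted-quad notation by packing it big-endian and reading 4 bytes.
sub int_to_dotted_quad {
    my ($n) = @_;
    die "not a 32-bit unsigned integer: $n\n"
        unless $n =~ /^\d+$/ && $n <= 0xFFFFFFFF;
    return join '.', unpack 'C4', pack 'N', $n;
}

print int_to_dotted_quad(1085305165), "\n";
# prints: 64.176.113.77
```

The range check matters if you feed it arbitrary digit runs from a URL, since not every long number is an address.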

    Update: I just cleaned up the code a bit.

    HTH,

    --isotope
      This is the correct way, according to those in the Lumber Cartel (TINLC)

      --
      Perl is intergalactic! WolfSkunks use it!

Re: Take a bite out of my SPAM please
by Ignorance (Monk) on Sep 01, 2000 at 00:17 UTC

    Wow,
    Thank you all for some excellent responses.
    The use of control characters was puzzling...

    Here is the accompanying JavaScript. It is pure evil.

    <SCRIPT language=JavaScript>
    <!--
    // Start of AdSubtract JavaScript block; you can ignore this.
    // It is used when AdSubtract blocks cookies or pop-up windows.
    document.iMokie = "cookie blocked by AdSubtract";
    document.iMferrer = "referrer blocked by AdSubtract";
    function iMwin() {
      this.location = "";
      this.frames = new Array(9);
      this.frames[0] = this;
      this.frames[1] = this;
      this.frames[2] = this;
      this.frames[3] = this;
      this.frames[4] = this;
      this.frames[5] = this;
      this.frames[6] = this;
      this.frames[7] = this;
      this.frames[8] = this;
      this.length = 0;
    }
    // End of AdSubtract JavaScript block.
    -->
    </SCRIPT>

    I don't think it's related to the form action.

    I'm glad I can come here for solutions.
    When someone like my father gets this kind of spam in Netscape mail, it can get ugly real fast.

    A most tolerant Perl friend also tried a solution.
    You can try it at www.codegeek.org/qh/
    He included a link to the source code for those who are curious.
    Just a guess, but I think the /qh/ stands for "quick hack" :)

    Thanks again to everyone.

RE: Take a bite out of my SPAM please
by OzzyOsbourne (Chaplain) on Sep 05, 2000 at 22:15 UTC
    One thing to note is that browsers treat anything before the @ symbol as login information, not as the host.
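A rough sketch of pulling out the part a browser would actually contact, assuming generic URL syntax: the authority runs from '//' to the first '/', '?' or '#', and the host is whatever follows the last '@' inside it. The sub name real_host and the example.com URL are hypothetical. The spam URL in this thread has a '?' quite early, so different parsers may well disagree about where its authority ends.

```perl
use strict;
use warnings;

# Return host[:port] from a URL, ignoring any user-info before the '@'.
sub real_host {
    my ($url) = @_;
    my ($authority) = $url =~ m{^[a-z][a-z0-9+.-]*://([^/?#]*)}i
        or return;
    $authority =~ s/^.*\@//;    # greedy: strips through the last '@'
    return $authority;
}

# Hypothetical URL that looks like example.com but is not.
print real_host('http://www.example.com@1085305165/page'), "\n";
# prints: 1085305165
```

The number that comes out can then be converted to a dotted quad and looked up.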
