I've written some code that saves me time. I hope that, in the long run, it will save me more time than it took me to write it. But as I'm still new to Perl, I would appreciate any advice on where I might have gone wrong. There are a couple of areas where I'm sure things could be done more elegantly, and I have documented these.
In fact, I think I have documented everything pretty thoroughly. So I would be just as grateful for comments on the documentation as on the code or style. Feel free to
- I'm trying to learn. I have tried to change all the http references to PM nodes to the appropriate id format, but I may have missed some, and I want to keep them in http format at my end. In fact, the id format references don't seem to work very well in a code block. Nonetheless, I'd prefer to be too conservative than anything else.
The copyright clause is one I have used for other stuff. I resent people making money from my work while I'm on the dole! Obviously, I can be /msged here.
use strict;
use warnings;
use diagnostics;
=head1 Synopsis
Does the donkey work for advising banks of phishing spams. Intended fo
+r
use by people who can read - at least at a basic level - email headers
+.
LARTs are explained here: http://catb.org/jargon/html/L/LART.html
=head1 Version
Version 1.0 - first release version
=head1 Description
=head2 Inputs
=head3 Handoffs
The user may enter a number for the handoffs. This will be included (i
+f
> 0) at the top of the output, advising that ''the first (n) Received:
headerZ<>(s) (is anŠare) internal handoffZ<>(s) at my ISP''. If this i
+tem is
blank or zero, no such information will be produced.
=head3 Insertion point
The insertion point is copied in from the email. I use Thunderbird,
in which the most reliable version always appears as a dotted quad.
To the best of my knowledge and belief, this is not a function of
Thunderbird, but of standard email practices in terms of the RFCs.
It is the dotted quad format that should be entered. It is up to the
user to determine the insertion point. It is always possible for an
ISP to be assigned new IP addresses and to use these in handoffs,
but the user is expected to be able to identify these - at least
after parsing them - and to tell the bank what they are in the LART.
If multiple addresses are parsed, that is up to the user. The user
is able to save lines from multiple insertion point parses, but this
is probably a bad idea.
A line will be produced in the output that will quote the IP address
being parsed as the insertion point. It is up to the user to decide
whether to include this in the LART, but it is something I always do.
=head3 Landing point
The landing point will normally be in an alphaZ<>(numeric) form. There
is no need to look for a dotted quad, as there is for the insertion
point. The parse of the landing point will normally do two things.
First, it will identify the IP address of the landing point, putting
this in the output. Then it will do a whois on that IP address, and
put the resulting information in the output as well.
A spam may have more than one landing point, or it may take several
parses to get the information needed. On the assumption that there
will be more than one more often than it will be necessary to do
multiple parses of a single landing point, the landing points are
numbered. The first will appear simply as ''Landing point:''.
Subsequent ones will appear as ''Landing point 2:'' or whatever
the number is. If you are doing multiple parses of a single landing
point, you will probably want to correct this manually in the final
LART.
A blank line will automatically be inserted in the LART to separate
each landing point. If there are multiple parses of a single
landing point, these blank lines may also have to be removed.
=head3 Server
The whois server defaults to whois.completewhois.com. This can be
changed by moving a different server to the top of the list in the
__DATA__ section. The user can select a different server at any time
- there is no need to use the same server for all queries, even within
a single LART. The server timeout is set to 10 seconds, but seems to
take more like 20. This may be due to processing time in Perl, but
parses that work typically take under 2 seconds, so there would have
to be a lot of time spent in error handling routines.
The following problems exist with various servers:
completewhois: Reportedly down a lot of the time. LACNIC sometimes
blocks their queries, so it may be necessary to use LACNIC directly.
geektools: Limits you to 50 queries a day (unless you make special
arrangements with them). Also reports YOUR IP address in the output,
which can confuse you.
domaintools: Seems to be dead at the time of writing.
networksolutions: Output contains a lot of advertising.
abuse: Intended to give Postmaster@ addresses. Useful if you want to
LART the hosts of the injection point or landing point. However, these
are usually chosen by spammers for their unresponsiveness. The bank is
more likely to get action. YMMV.
All others currently listed: Cover only their own subsets of whois.
Users can enter their own whois server only by adding them in the
__DATA__ block. If this is invalid, the error should be trapped.
=head3 Banks
A list of known banks is on the system, with the phish reporting
address of each and part of the SpamGourmet address I give them
to reply to. Citibank reject my emails because they look like spam!
Of course they bleeping do. They contain the phishing spam I am
trying to help them deal with. ''Hey, we aren't getting any more
phishes reported! We must be doing something right!''.
Not every organisation on this list is a bank in the strict sense.
PayPal isn't. However, most are banks, and if they can be phished,
I think it makes sense to advise them.
Two ''entry'' boxes appear to the right of the list box containing
the list of banks. These contain the information for the selected
bank. Clicking on either of these entry boxes copies the contents
to the clipboard, replacing anything already there.
=head4 ''From'' information
This is probably useless to most people. I never give out my real
email address. Instead I give out SpamGourmet addresses. This means
that anything coming to my real address without SpamGourmet
headers is 100% spam. I can also prevent spammy suppliers getting
their rubbish through to me by setting that individual address
to allow nothing more through. By holding the data here, I avoid
the danger of giving two addresses to the same organisation. This
should be no problem even if it does happen, but it's tidier if
it's kept under control.
=head4 ''To'' information
This is more likely to be of use to the average user. Every bank
seems to have a different format of address to report phish
attempts. It helps to keep a record, rather than going through
past emails, especially if the user does not get frequent phishes
for a bank.
=head3 Storing marked lines
This button can be hit only when there are some lines to mark.
However, it is not necessary to have marked any lines. This command
will copy any lines that have been ticked into the final LART,
but will also clear the lines. So if a parse has returned useless
information, the output can be cleared and a new attempt made by
hitting this button with no lines marked. However, this isn't
necessary, as any new parse will start by clearing out old data.
Once lines have been stored, they cannot be removed except by
starting a new LART. Hitting the ''Prepare LART'' button will do
this - the output to the clipboard can be ignored.
=head3 Preparing the LART
Clicking on the ''Prepare LART'' button will copy the stored lines
of the various parses into the Windows clipboard, and will clear
all information ready for a new LART.
=head2 Outputs
All outputs are to the Losedows clipboard. They are produced, as
described under the relevant sections above, by:
=over 4
=item *
Clicking on the ''From'' entry box
=item *
Clicking on the ''To'' entry box
=item *
Clicking on the ''Prepare LART'' button
=back
Any of these actions replaces whatever is already in the clipboard.
=head1 Author
Dr. John Davies
51 Elephant Lane
London SE16 4JD
=head1 To Do
I want to improve on the mindbogglingly ugly kludge that
removes the scrollbars in clearframe. The only thing to be said
in its favour is that it works.
My instincts tell me there's got to be a better way of doing it than
$hoaBank{$sFields[0]} = [ $sFields[1], $sFields[2] ];
When I know how to, I plan to build a series of tests for the programm
+e.
Validate the contents of the __DATA__ block, so that invalid data does
+n't end
up in the list boxes. This may not be possible. Different whois server
+s may
accept different parameters, and the current system means that users c
+an
put whois parameters in with the server name. A server can quite reaso
+nably
appear twice in the list, with different parameters built in.
One fine day, it might be nice to maintain external files of whois ser
+vers
and bank information via the programme, instead of holding them within
+ the
programme as __DATA__.
It might also be nice to put in a command line option to ignore the ''
+From''
data, as this is something that only a few people like me are likely t
+o want.
=head1 Bugs
I use OpenIDE. It seems to me that it doesn't always know when to (and
+ not
to) colour code purple. #== appears at the end of several lines, and s
+tops
the purple colour.
=head1 Copyright
All aspects of this programme are my copyright. I don't care what peop
+le
do with it for fun, but if anyone is making money from it or any
development they may have made from it, I want my cut. This includes,
but is not restricted to, putting it on a disc for which ''only a
distribution charge'' (or similar weasel words) is charged, or which i
+s
free if you buy something else, like a magazine. If you put it on a we
+b
site for free download, that's fine, but if you charge for access, I
want my cut, even if you don't charge specifically for this. If in
doubt, contact me first. Don't assume that I will accept your payment
scales. Don't assume that you can email me - I have aggressive spam
blocking in place. If contacting me isn't worth a stamp to you, it's
not worth bandwidth and disc space to me. If you don't like any of thi
+s,
don't use it.
=head1 Code
=cut
use Tk;
use Tk::Pane;
use Tk::JComboBox;
use IO::Socket;
use Net::Nslookup;
use Win32::Clipboard;
=head2 Naming Conventions
Hungarian notation has been used. Out of laziness, a single character
has been used most of the time. However, to avoid possible ambiguity,
three characters have been used in some places.
There is one exception. The variable ''$sock'' has been copied from
another source, documented in the code comments. I'm too lazy to work
out what it contains, so I haven't used Hungarian notation for it.
I know that the use of capital letters in variable names is deprecated
in ''Perl Best Practices'', although I haven't read the book, but I
find them easier to read.
=head2 Global Variables
Global variables are used for:
=over 4
=item *
Data that are needed at all stages of processing
=item *
Tk widgets
=item *
Variables bound to Tk widgets
=back
=cut
my @sLart; #Assembly for the elements of the email
my @aoaCheck; #Array of checkbox data
#[][0] is the checkbox itself
#[][1] is the 'ticked' variable
#[][2] is the text
my $nLandings = 0; #Number of landing points analysed
my $sServer; #Whois server
my %hoaBank; #Hash of Arrays of banks
my $sBank; #Selected bank from combobox
my $sTo; #Email address of $sBank
my $sFrom; #Spamgourmet prefix for $sBank
my $wMain = MainWindow->new;
$wMain->geometry('890x500');
$wMain->resizable(0,0);
my $fraChk = $wMain->Scrolled('Pane',
-scrollbars => 'oe',
-height => 440,
-width => 890,
-sticky => 'w')
->place(-x => 0,
-y => 60);#==
my $lHandoffs = $wMain->Label(-text => 'Internal handoff(s)')
->place(-x => 0,
-y => 0); #==
my $eHandoffs = $wMain->Entry(-width => 3)
->place(-x => 100,
-y => 0); #==
my $lInsertion = $wMain->Label(-text => 'Insertion point')
->place(-x => 130,
-y => 0); #==
my $eInsertion = $wMain->Entry(-width => 16)
->place(-x => 220,
-y => 0); #==
my $butInsertion = $wMain->Button(
-text => 'Parse',
-command => sub{insertion($eInsertion->g
+et())})
->place( -x => 330,
-y => 0); #==
my $lLanding = $wMain->Label(-text => 'Landing point')
->place(-x => 380,
-y => 0); #==
my $eLanding = $wMain->Entry(-width => 30)
->place(-x => 460,
-y => 0); #==
my $butLanding = $wMain->Button(-text => 'Parse',
-command => sub{landing($eLanding->get()
+)})
->place(-x => 655,
-y => 0); #==
my $butStore = $wMain->Button(-text => 'Store marked lines',
-state => 'disabled',
-command => sub{store()})
->place(-x => 703,
-y => 0); #==
my $butLart = $wMain->Button(-text => 'Prepare LART',
-state => 'disabled',
-command => sub{lart()})
->place(-x => 810,
-y => 0); #==
my $cboServer = $wMain->JComboBox(
-textvariable => \$sServer
)
->place(-x => 0,
-y => 30); #==
my $lBank = $wMain->Label(-text => 'Bank')
->place(-x => 300,
-y => 30); #==
my $cboBank = $wMain->JComboBox(
-textvariable => \$sBank,
-selectcommand =>
sub {$sFrom = $hoaBank{$sBank}[0];
$sTo = $hoaBank{$sBank}[1];
}
)
->place(-x => 330,
-y => 30); #==
my $eTo = $wMain->Entry(-width => 30,
-textvariable => \$sTo)
->place(-x => 460,
-y => 30); #==
$eTo ->bind('<Button-1>',
sub {Win32::Clipboard::Set($sTo);
}
);
my $eFrom = $wMain->Entry(-width => 30,
-textvariable => \$sFrom)
->place(-x => 670,
-y => 30); #==
$eFrom ->bind('<Button-1>',
sub {Win32::Clipboard::Set($sFrom);
}
);
=head2 Bare block before MainLoop;
This is written as a bare block to ensure that the variable created
inside it does not become a global variable. This block reads the
__DATA__ block and puts the contents into the banks and servers
listboxes, depending on the format. Blank lines are ignored. Lines
with an ''@'' symbol are assumed to be bank email addresses, of the fo
+rm
name (white space) email@domain (white space) from
Deviating from this format will produce invalid information in the
listbox or ''entry'' boxes for the banks.
Non-blank lines without an ''@'' symbol are put into the list of
servers. Including white space on such lines is likely to cause
parses to fail or produce invalid data if that data line is selected
as the server, unless they are valid server options.
The first line inserted into the list of servers is also used as the
default server.
=cut
{
my $nDefault = 1;
while (<DATA>) {
chomp;
if (/@/) {
my @sFields = split /\s+/, $_;
$hoaBank{$sFields[0]} = [ $sFields[1], $sFields[2] ];
$cboBank->addItem($sFields[0]);
} elsif (length) {
$cboServer->addItem($_, -selected => $nDefault);
$nDefault = 0;
}
}
}
MainLoop;
=head2 Subroutine ''output''
Although ''place'' is used everywhere else for geometry management, it
has been necessary to use ''grid'' here. This is because ''place'' doe
+s
not coexist happily with scrollbars, as discussed here:
[id://id=651906]. According to ''Mastering Perl/Tk''
by Steve Lidie and Nancy Walsh, First Edition, published January 2002,
+ it
is permissible to use different geometry managers in different frames.
=head3 Inputs
Array of data usually produced by a whois or nslookup query, but somet
+imes
as a failure message.
=head3 Outputs
=over
=item *
To screen - the input array with a checkbox at the start of each line
=item *
@aoaCheck - contains checkboxes (the Tk widget) in element 0, the vari
+able
to indicate if the box is ticked in element 1, and the text in element
+ 2
=item *
The ''store marked lined'' button is enabled
=back
=head3 Called by
=over
=item *
landing
=item *
insertion
=back
=head3 Calls made
=over
=item *
clearframe
=back
=cut
sub output {
clearframe();
for (@_) {
chomp;
push @aoaCheck, [$fraChk->Checkbutton(
-text => $_,
-font => 'courier 10',
-justify => 'left'
)
->grid(-column => 0,
-row => $#aoaCheck + 1,
-sticky => 'w'
),
undef,
$_
];
#The variable can't be set in the 'push' statement. Until it has been
#comleted, the value of $#aoaCheck is not incremented. Either you have
#to increment it by one, meaning that the variable is misaligned, or
#you get the first one being assigned to index -1, which the compiler
#dislikes, even though it seems to get corrected later!
$aoaCheck[$#aoaCheck][0]->configure(
-variable => \$aoaCheck[$#aoaCheck][1]);
}
}
=head2 Subroutine ''landing''
=head3 Inputs
A single datum, which may be in the form of a dotted quad or of an
alphaZ<>(numeric) URL. This is entered in the landing entry box, and
passed as a parameter by the Tk call.
=head3 Outputs
An array of text.
If not a dotted quad, then the resolution of the web site to a dotted
+quad
is produced. The dotted quad (whether entered or resolved) is resolved
+ and
this resolution is included in the output array.
=head3 Called by
Tk. Bound to the "Parse" button next to the landing entry box.
=head3 Calls made
=over
=item *
output
=back
=cut
sub landing {
my @sOutput;
my $sLanding = shift(@_);
$nLandings++;
my $sHeader = "Landing point";
if ($nLandings > 1) {
$sHeader = "$sHeader $nLandings"
}
$sHeader = "$sHeader" . ':' . $sLanding;
@sOutput= $sHeader;
#[id://221512]
#warns against using a regex to parse IP addresses.
#However, Thunderbird always provides addresses
#as dotted quads, nslookup seems to do the same,
#and since the node offers no suggested
#code for doing better, this will just have to do.
if ($sLanding =~ /^(\d+\.){3}\d+$/) {
push (@sOutput, whois ($sLanding));
} else {
if (defined($sLanding)) {
my $sResolved = nslookup $sLanding;
unless ($sResolved eq "") {push(@sOutput, $sLanding . " resolves
+ to " . $sResolved)};
push (@sOutput, whois($sLanding));
unless ($sResolved eq "") {push (@sOutput , whois ($sResolved))}
} else {
push (@sOutput, "Undefined landing point");
}
}
$butStore->configure(-state => 'normal');
output @sOutput;
}
=head2 Subroutine ''insertion''
=head3 Inputs
A single datum, which should be in the form of a dotted quad. This is
entered in the insertion entry box, and passed as a parameter by the T
+k call.
=head3 Outputs
An array of text
=head3 Called by
Tk. Bound to the "Parse" button next to the insertion entry box.
=head3 Calls made
=over
=item *
output
=back
=cut
sub insertion {
my @sOutput;
my $insertion = shift(@_);
#The insertion point must be a dotted quad. If it's not, it's not
#a reliable insertion point per my version of Thunderbird
if ($insertion =~ /^(\d+\.){3}\d+$/) {
@sOutput = ("Insertion point: $insertion", whois ($insertion));
} else {
$sOutput[0] = "No insertion point identified";
}
$butStore->configure(-state => 'normal');
output @sOutput;
}
=head2 Subroutine ''store''
=head3 Inputs
No parameters passed.
Uses @aoaCheck
=head3 Outputs
=over
=item *
@sLart (may already contain data)
=item *
''Prepare LART'' button enabled if any lines actually stored
=back
=head3 Called by
Tk. Bound to ''Store marked lines'' button.
=head3 Calls made
=over
=item *
clearframe
=back
=cut
sub store{
$butStore->configure(-state => 'disabled');
for (0 .. $#aoaCheck) {
if ($aoaCheck[$_][1]) {
push (@sLart, $aoaCheck[$_][2]);
$butLart ->configure(-state => 'normal');
}
}
push (@sLart,"");
clearframe();
}
=head2 Subroutine ''whois''
Plagiarised from [id://=23082]
=head3 Inputs
A single datum, which may be a dotted quad or an alphaZ<>(numeric) URL
+.
The selected whois server is used to parse the data passed.
=head3 Outputs
An array of text, containing the whois information on the input
=head3 Called by
=over
=item *
landing (3 places)
=item *
insertion
=back
=head3 Calls made
None
=cut
sub whois {
use vars qw/$sock $sPort @sData/;
$sPort = "43";
my $sTarget = $_[0];
die "Whois Usage: whois(<domain>)" unless(defined($sTarget));
$sock = IO::Socket::INET->new(PeerAddr => $sServer,
PeerPort => $sPort,
Proto => 'tcp',
Timeout => 10);
eval {
print $sock $sTarget,"\r\n"
};
if ($@) {
@sData =("Error while parsing. Perhaps the whois server is down?",
+$@);
$butStore->configure(-state => 'normal');
} else {
@sData = <$sock>;
}
undef($sock);
@sData;
}
=head2 Subroutine ''lart''
=head3 Inputs
=over
=item *
No parameters passed
=item *
@sLart used as input
=item *
$nHandoffs used as input
=back
=head3 Outputs
=over
=item *
LART put into Losedows clipboard
=item *
$eHandoffs (handoffs entry box) cleared
=item *
$eLanding (landing point entry box) cleared
=item *
$eInsertion (insertion point entry box) cleared
=item *
@sLart cleared
=item *
@aoaCheck cleared
=item *
$nLandings set to zero
=item *
''Prepare LART'' button disabled
=back
=head3 Called by
Tk. Bound to ''Prepare LART'' button.
=head3 Calls made
None
=cut
sub lart{
my $nHandoffs = $eHandoffs->get();
my $sClip = "";
if ($nHandoffs == 1) {
$sClip = "The first Received: header is an internal handoff at my
+ISP\r\n"
} elsif ($nHandoffs > 1) {
$sClip = "The first $nHandoffs Received: headers are internal hand
+offs at my ISP\r\n"
}
for (@sLart) {
$sClip = "$sClip\r\n$_";
}
Win32::Clipboard::Set($sClip);
$eHandoffs->delete(0, 'end');
$eLanding->delete(0, 'end');
$eInsertion->delete(0, 'end');
undef @sLart;
undef @aoaCheck;
$nLandings = 0;
$butLart ->configure(-state => 'disabled');
}
=head2 Subroutine ''clearframe''
=head3 Inputs
None
=head3 Outputs
=over
=item *
@aoaCheck - cleared & undefined
=back
=head3 Called by
=over
=item *
output
=item *
store
=back
=head3 Calls made
None
=cut
sub clearframe {
for (0 .. $#aoaCheck) {
$aoaCheck[$_][0]->gridForget( );
}
undef @aoaCheck;
#The following kludge succeeds in getting rid of the scrollbars afte
+r
#clearing the frame. I can't find any other way of doing it.
my $chkDummy = $fraChk->Checkbutton(
-text => '')
->grid(-column => 0,
-row => 0);
$wMain->update;
$chkDummy->gridForget( );
$wMain->update;
}
close DATA;
exit;
=head1 Data
=head2 Blank lines
Ignored
=head2 Email address lines
Defined as being any line containing the ''@'' sign
Expected format:
Bank (white space) Email address (white space) From information
=head2 Server lines
Defined as being any non-blank line not containing the ''@'' sign
The entire line is treated as being the server. It is possible to
include multiple copies of the same server with different options.
=head2 Other lines
Assumed to be server lines.
=cut
__DATA__
whois.completewhois.com
whois.geektools.com
whois.domaintools.com
whois.networksolutions.com
whois.abuse.net
whois.internic.net
whois.arin.net
whois.ripe.net
whois.apnic.net
whois.lacnic.net
whois.apnic.net
whois.afrinic.net
whois.denic.de
Citibank NotAccepted@braindead CitibankPhish
CitizensBank fraud_prevention@citizensbank.com CitizensBankPhish
HSBC phishing@hsbc.com HSBCPhish
Nationwide phishing@nationwide.co.uk NationwidePhish
NatWest phishing@natwest.com NatWestPhish
PayPal spoof@paypal.com PayPalPhish
RBS phishing@rbs.co.uk RBSPhish