I've been looking on CPAN, googling, etc. and I can't find anything on this topic, so I'm hoping someone here can share a flash of brilliance.
I have a list of about 100 companies, and for each one I need to find the short name it registered as its domain in DNS.
For example, "Adobe Systems, Inc." owns "adobe.com", so the result is "adobe". "American Power Conversion" should result in "apc".
This needs to be done just once, so I'm tempted to just put a person on it, but if there is a way to automate it, I'd rather do that.
Any insights are appreciated! Doug
Update: Got something working.
It needs some work, but this gets me dang close.
Given a file of known NVD vendor strings and a file of NSRL manufacturer text strings, it gets 6 URLs per manufacturer from Yahoo, strips each one down to its host name, and prints tab-separated guesses: suggested mappings in square brackets, known NVD vendor strings in curly braces, and any other host names as unbracketed context.
Output looks like this:
BeLight Software [belight] [belightsoft] en.wikipedia
BeLight Software, Ltd. [belightsoft] go.cadwire macs.about
Bea Systems, Inc. [beasys] {bea} {oracle}
Beermat Software Ltd. [beermatsoftware] encarta.msn
Belkin Corp [belkin] bizjournals cnet updates.zdnet
Bell Atlantic Internet Solutions Inc. [bellatlantic] prnewswire verizon yale.edu
Berkley Systems berkeley best.me.berkeley.edu bt-systems bvsystems en.wikipedia gis.co.berkeley.sc.us
Bethesda SoftWorks bethsoft elderscrolls support.bethsoft
Big Fish Games [bigfishgames] atlantis.bigfishgames bigfish.es
Big Fish Games, Inc. [bigfishgames] bigfish.es bigfishgames.es otg.bigfishgames
BioWare [bioware] blog.bioware nwn.bioware store.bioware
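If the output needs further processing, something like the following might do as a starting point. It is only a rough sketch: parse_line is a made-up helper name, and it assumes the fields really are separated by tabs, with bracketed and braced entries marked exactly as described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script): split one output line into
# the company name, [suggested] mappings, {known} NVD vendors, and context.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    # The name is followed by two tabs, so drop the empty field in between.
    my ($name, @fields) = grep { length } split /\t/, $line;
    my %parsed = (name => $name, suggested => [], known => [], context => []);
    for my $f (@fields) {
        if    ($f =~ /^\[(.+)\]$/) { push @{ $parsed{suggested} }, $1 }
        elsif ($f =~ /^\{(.+)\}$/) { push @{ $parsed{known} },     $1 }
        else                       { push @{ $parsed{context} },   $f }
    }
    return \%parsed;
}

my $p = parse_line("Bea Systems, Inc.\t\t[beasys]\t{bea}\t{oracle}\n");
print "$p->{name}: suggested=@{ $p->{suggested} } known=@{ $p->{known} }\n";
```

Keeping the three kinds of guess in separate lists makes it easy to, say, accept a line automatically when it has exactly one known vendor and hand the rest to a person.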
The code:
#!/opt/local/bin/perl -w
use strict;
use Yahoo::Search;
use vars qw( %nvdVendor $textName %foundName @Results );

# Load the known NVD vendor strings, one per line.
open(my $vin, '<', "NVD-vendors.txt") or die "$0: can't open support file NVD-vendors.txt: $!\n";
while (<$vin>) {
    chomp;
    $nvdVendor{$_} = 1;
}
close($vin);
# Look up each NSRL manufacturer name and print the guesses for it.
open(my $nin, '<', "NSRL-manufacturers.txt") or die "$0: can't open support file NSRL-manufacturers.txt: $!\n";
while (<$nin>) {
    chomp;
    $textName = $_;
    next unless length $textName;    # skip blank lines
    @Results = Yahoo::Search->Results(Doc => $textName, AppId => "YahooDemo", Count => 6, Mode => 'all');
    warn $@ if $@;    # report any errors
    for my $Result (@Results) {
        addFullName($Result->Url);
    }
    print "$textName\t\t";
    my %guesses;
    for my $k (keys %foundName) {
        if (defined $nvdVendor{$k}) {
            $guesses{"\{$k\}"} = 1;    # known NVD vendor string
        }
        elsif (closeEnuff($textName, $k)) {
            $guesses{"[$k]"} = 1;      # suggested mapping
        }
        else {
            $guesses{$k} = 1;          # other context
        }
        delete $foundName{$k};
    }
    print join("\t", (sort keys %guesses));
    print "\n";
    sleep(3);    # pause between queries
} # NIN
close($nin);
exit;

# Reduce a result URL to a bare host name and count it in %foundName.
sub addFullName
{
    my $url = lc(shift);
    $url =~ s{^https?://}{} or return(0);    # only handle web URLs
    my ($n) = split(/\//, $url, 2);          # get the server name
    (defined $n and length $n) or return(0);
    $n =~ s/^www\.//;                        # strip common pre/postfixes
    $n =~ s/\.com$//;
    $n =~ s/\.net$//;
    $n =~ s/\.org$//;
    $n =~ s/\.co\.uk$//;
    $foundName{$n} += 1;
    return(1);
}

# Decide whether candidate host name $y is "close enough" to text name $t.
# \Q...\E keeps dots in the candidate from being treated as regex wildcards.
sub closeEnuff
{
    my ($t, $y) = @_;
    return(1) if $t =~ /\b\Q$y\E\b/i;    # does the candidate match a word in the text name?
    $t =~ s/ //g;
    return(1) if $t =~ /\Q$y\E/i;        # does the candidate match the text name with spaces removed?
    # should do a check after removing special chars
    return(0);
}
__END__
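
The "removing special chars" check mentioned in closeEnuff, plus the "American Power Conversion" => "apc" case from the original question, could be sketched roughly as below. This is not part of the script above: squash, initials and closeEnuffLoose are made-up names, and treating a candidate that equals the initials of the name as a match is my assumption about what would be useful, not something I have tested against the real data.

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase and drop everything except letters and digits.
sub squash {
    my ($s) = @_;
    $s = lc $s;
    $s =~ s/[^a-z0-9]//g;
    return $s;
}

# Hypothetical helper: first character of each word, lowercased,
# e.g. "American Power Conversion" gives "apc".
sub initials {
    my ($name) = @_;
    my @words = grep { length } split /[^A-Za-z0-9]+/, $name;
    return lc(join('', map { substr($_, 0, 1) } @words));
}

# Looser variant of closeEnuff(): true if the candidate appears in the
# name once special characters are removed, or equals the name's initials.
sub closeEnuffLoose {
    my ($name, $candidate) = @_;
    my $c = squash($candidate);
    return 0 unless length $c;
    return 1 if index(squash($name), $c) >= 0;
    return 1 if initials($name) eq $c;
    return 0;
}

for my $pair (["American Power Conversion", "apc"],
              ["AT&T", "att"],
              ["Bethesda SoftWorks", "bethsoft"]) {
    my ($name, $candidate) = @$pair;
    printf "%s / %s: %s\n", $name, $candidate,
        closeEnuffLoose($name, $candidate) ? "match" : "no match";
}
```

Abbreviations like "bethsoft" still fall through, so a person would still have to review whatever stays unbracketed.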