http://qs321.pair.com?node_id=442694
Category: Web Stuff
Author/Contact Info Sven-Thorsten Fahrbach sven-thorsten.fahrbach@gmx.net
Description: I hope this program to look up Wikipedia entries from the command line will come in handy for some people. My main motivation to write this was my rather slow computer. So, I don't always have to start up my browser when I can look up a Wikipedia entry from within a shell. Also, if you display a long version of an article the script automatically archives it for you in a subfolder. I'm not a Perl Guru (I mention that in every post ;-)) but I hope the script is useful enough to be posted here. SveTho
#!/usr/bin/perl
# wikipedialookup - conveniently look up articles in the Wikipedia fro
+m
# the command line. Version 0.2

use strict;
use warnings;
use File::Copy;
use File::Spec;
use Getopt::Long;
use WWW::Wikipedia;
##################################################
use constant VERSION=> "0.2";
use constant DEBUGGING=> 1;

##################################################
# Subroutine prototypes
##################################################
sub display_long;
sub display_short;
sub display_help;
sub display_version;
sub make_conf;
sub get_conf;
sub chk_env;
sub clean_up;
##################################################

$SIG{INT} = \&clean_up;# clear tmp folder if SIGINT is received

# following are the language codes that are currently recognized by
# the Wikipedia. We need this list to display it on demand in the
# configuration dialog
my @lcodes = qw/ af ar bg bn ca cs cy da de el en eo es et fa fi fr fy
 he hi hu ia id it ja ko la ln ml ms nl no oc pl pt ro
 ru sk sl sq sr sv ta th uk ur wo yi zh /;

# options that need to be implemented:
# - quick lookup: take a single arg and display short article (default
+)
# - long info   : take keyword as arg look up article disply in pager 
+or editor
# - make config : make configuration file, implement dialog. mkconf if
+ 1. start
# - clean up    : clean up temporary and permanent folders. tmp on def
+ault
# - chck duplics: check for and delete duplicates in permanent folder
# - version     : display version info
# - help        : display help

# take care of the command line options
my %options;
GetOptions (\%options, "long", "make-config", "clean-up", "language:s"
+,
       "version", "help");

my $keyword = shift;

if ((keys %options == 0) && (! $keyword)) {
    die "Usage: $0 keyword [options]. Type $0 --help for more informat
+ion.\n";
}
if (($options{long}) && (! $keyword)) {
    die "Usage: $0 keyword [options]. Type $0 --help for more informat
+ion.\n";
}
if (($options{language}) && (! $keyword)) {
    die "Usage: $0 keyword [options]. Type $0 --help for more informat
+ion.\n";
}

# read in configuration
my %conf = get_conf ("wikipedialookup.conf");

# for portability's sake we use File::Spec to build the pathnames
# to the temporary and the permanent directory. As we store them
# in two variables here the code where these paths are actually used
# will look much less cluttered.
my $tmp_path = File::Spec->rel2abs ("tmp");
my $prm_path = File::Spec->rel2abs ("articles");
my $tmp_file = File::Spec->catfile ($tmp_path, "wikipedialookup$$");

##################################################
# MAIN
##################################################

# before we get started we need to perform some checks
if ($options{help}) {
    display_help();
    exit;
}
if ($options{version}) {
    display_version();
    exit;
}
if ($options{"make-config"}) {
    make_conf ("wikipedialookup.conf");
    exit;
}
if ($options{"clean-up"}) {
    clean_up ("all");
    exit;
}

chk_env();

# okay, ready to go...
my $wiki = WWW::Wikipedia->new (language => $options{language}
    || $conf{language});
$wiki->timeout ($conf{timeout});

# perform the search...
my $entry = $wiki->search ($keyword);

if ($options{long}) {
   if ($entry) {
my $fulltext = $entry->fulltext();
my @related  = $entry->related();
my @categories= $entry->categories();

      unless (display_long (\$fulltext, \@related, \@categories)) {
    if (@related) {
print "The keyword you provided is ambiguous: $keyword\n";
print "Please see the following articles:\n";
print "\t- $_\n"foreach @related;
    }
    else {
print "Nothing to show. This might be due to a bug. Sorry :-(\n";
    }
}
    }
    else {
print "No article found for $keyword.\n";
print "You might try to search for $keyword in another language.\n";
print "Try: $0 $keyword --language=fr e.g. to search the French\n";
print "version of Wikipedia.\n";
print "\nSorry :-(\n";
    }
}
else {
    if ($entry) {
my $text    = $entry->text();
my @related = $entry->related();
unless (display_short ($text)) {
    if (@related) {
print "The keyword you provided is ambiguous: $keyword\n";
print "Please see the following articles:\n";
print "\t- $_\n"foreach @related;
    }
    else {
print "Nothing to show. This might be due to a bug. Sorry :-(\n";
    }
}
    }
    else {
print "No article found for $keyword.\n";
print "You might try to search for $keyword in another language.\n";
print "Try: $0 $keyword --language=fr e.g. to search the French\n";
print "version of Wikipedia.\n";
print "\nSorry :-(\n";
    }
}

##################################################
# Subroutines
##################################################

##############################
sub display_long {
##############################
    my ($text, $related, $categories) = @_;
    return undef    unless $text;

    my $line = "=" x length "Entry for $keyword";

    open OUT, "> $tmp_file"    or die "Couldn't open $tmp_file: $!";
    
    print OUT $line;
    print OUT "Entry for $keyword\n";
    print OUT $line;
    print OUT "\n\n";
    print OUT $$text;
    print OUT "\n";
    print OUT "========================================\n";
    print OUT "See also the following related articles:\n";
    print OUT "========================================\n";
    print OUT "\n";
    print OUT "\t- $_\n"foreach @$related;
    print OUT "\n";
    print OUT "=================================================\n";
    print OUT "This article belongs to the following categories:\n";
    print OUT "=================================================\n";
    print OUT "\n";
    print OUT "\t- $_\n"foreach @$categories;

    close OUT;

    unless (! system $conf{editor}, $tmp_file) {
print "Couldn't exec $conf{editor}: $! Please re-run the configuration
+";
    }

    my $prm_file = File::Spec->catfile ($prm_path, "$keyword.txt");
    if (copy ($tmp_file, $prm_file)) {
print "You can find a copy of the article in $prm_file.\n";
    }
    else {
print "Couldn't copy $tmp_file to $prm_file", ": $!";
    }

    return 1;
}

##############################
sub display_short {
##############################
    my $text = shift;
    return undef    unless $text;

    if ($conf{"use_pager"} && $conf{"pager"}) {
unless (open OUT, "| $conf{pager}") {
    warn "Couldn't pipe to $conf{pager}", ": $!";
    warn "Will print to STDOUT\n";

    *OUT = *STDOUT;
}
    }
    else {
*OUT = *STDOUT;
    }

    print OUT "=" x length $keyword;
    print OUT "\n";
    print OUT "$keyword\n";
    print OUT "=" x length $keyword;
    print OUT "\n";
    print OUT $text;
    print OUT "\n";

    eval { close OUT; };

    return 1;
}

##############################
sub chk_env {
##############################
    mkdir "tmp",      0755    unless -e "tmp";
    mkdir "articles", 0755    unless -e "articles";

    make_conf ("wikipedialookup.conf")    unless -e "wikilookup.conf";
}

##############################
sub make_conf {
##############################
    my $file = shift;
    my %conf;

    open CONF, "> $file"or die "Couldn't open $file: $!";

    print "Please specify some information on how to run wikilookup.\n
+";
    print "\nDefault language (default is 'en', type 'all' to see\n";
    print "all available language codes).\n";
    print "\t[en]: ";
    chomp (my $choice = <STDIN>);

    if ($choice eq "all") {
        print "$_\t"        foreach @lcodes;
        print "\n[en]: ";
        chomp ($choice = <STDIN>);
    }   
    elsif ($choice eq "") {
        $choice = "en";
    }   
    $conf{language} = $choice;

    print "Please enter the name of the editor you want to use:\n";
    print "\t[vim]: ";
    chomp ($choice = <STDIN>);
    $choice = "vim"    unless $choice;
    $conf{editor} = $choice;

    print "Do you want to use a pager to display short articles?";
    print "\t[n]: ";
    chomp ($choice = <STDIN>);
    $conf{"use_pager"} = "1"    if $choice =~ /^y/i;
    if ($conf{"use_pager"}) {
print "Please enter the name of your pager:\n";
print "\t[less]: ";
chomp ($choice = <STDIN>); 
$choice = "less"    unless $choice;
$conf{pager} = $choice;
    }

    print "After how many seconds shall the request time out?\n";
    print "\t[12]: ";
    chomp ($choice = <STDIN>);
    unless (($choice =~ /^\d+$/) || ($choice eq "")) {
print "Invalid timeout: $choice. Will use 12.\n";
$choice = 12;
    }
    $conf{timeout} = $choice;
    print "Thanks, that's all for now.\n";

    print CONF "$_=$conf{$_}\n"    foreach sort keys %conf;

    close CONF;

    %conf;
}

##############################
sub get_conf {
##############################
    my $file = shift;
    my %conf;

    unless (open CONF, $file) {
%conf = make_conf ($file);
return %conf;
    }

    while (<CONF>) {
chomp;
my ($key, $value) = split /=/;
$conf{$key} = $value;
    }

    eval { close CONF; };

    %conf;
}

##############################
sub clean_up {
##############################
    my $all = shift;
    # we need a trailing slash (colon/backslash) for the glob later
    $tmp_path = File::Spec->catfile ($tmp_path, "");
    $prm_path = File::Spec->catfile ($prm_path, "");

    eval { unlink glob "$tmp_path*.tmp" };
    warn "Error while cleaning up: $@"if $@;
    if ($all) {
eval {
    unlink glob "$tmp_path*";
    unlink glob "$prm_path*";
};
warn "Error while cleaning up: $@"if $@;
    }
}

##############################
sub display_help {
##############################
    print <<EOF;
wikipedialookup - a small command line tool to quickly look up entries
+ in the
     Wikipedia.

Usage: $0 keyword [ --long | --make-config | --clean-up | 
    --language | --version | --help ]

    --long    Display long article, as opposed to the cut version
displayed on default.
--language    Specify language of the query in one of the
available Wikipedia language codes.
    -m    --make-configSet up a new configuration file. You will be as
+ked
to set up a new configuration file the first time
you start the program, also.
    -c    --clean-up    Delete all files from both the tmp and the per
+manent
folder. wikipedialookup automatically cleans up the
tmp folder on exit.
    -v    --version    Display version information.
    -h    --help    Show this help screen.

Please send bug reports to: sven-thorsten.fahrbach\@gmx.net
EOF
    
}

##############################
sub display_version {
##############################
    print "wikipedialookup version ", VERSION, "\n";
    print "\nThis program may be used and redistributed under the term
+s\n";
    print "of the Free BSD license.\n";
    print "\nPlease send bug reports to: sven-thorsten.fahrbach\@gmx.n
+et\n";
}
Replies are listed 'Best First'.
Re: wikilookup.pl
by b10m (Vicar) on Mar 27, 2005 at 20:47 UTC

    Handy! Although there are a couple of things I like to note:

    • undefunless should prolly be undef unless ;)
    • I think it'd be nice when HTML is converted. I don't care about <u>word</u>, I rather see _word_ etc. (same with chars like &#1591...)
    • During config writing, typing 'all' to see "all available language codes" didn't work.
    --
    b10m

    All code is usually tested, but rarely trusted.
      Hi and thanks for your comment.
      On second glance I realized that the whole while loop for the language code was crappy and I left it out altogether. It should work now (though not as nicely as I intended).
      Oddly enough the undefunless typo only appeared when I posted the code.
      As for the HTML thing, is there a module that can easily take care of that?
      Thanks for your reply.
Re: wikilookup.pl
by jdporter (Paladin) on Mar 28, 2005 at 21:02 UTC
    Nice. But could you call it something other than "wikilookup"? That name rather implies that it can lookup stuff in any wiki, generically. Probably "wikipedialookup" would be better.
      Yup, you're right, the name is misleading, but, after all, it's five characters less to type ;-). Nevertheless, I reviewed the code and did change the name (I hope I took care of every occurrence).
      I also saw that when I posted my code here, many of the tabs got stripped (that was also the problem with the 'undefunless' typo above). I took care of that, too.
      I also fixed a bug that gave you an error message when you just hit return on the timeout question in the configuration.
      Thanks for the feedback.