Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

simple filtering proxy

by Blop (Monk)
on Jul 10, 2001 at 12:44 UTC ( [id://95255]=CUFP: print w/replies, xml ) Need Help??

Last time, in annoying phrases one-liner censor I wrote a ridiculous command line filter, that changed expressions on webpages, on the fly. It only worked with lynx.

andreychek suggested me to make it work with Mozilla. So here it is, a little bit longer than one line. This is just a simple proxy that forwards requests to hosts (no caching, no nothing), and transmits them back to the browser via a little filter.
Note that all occurrences of the searched items will be replaced, including those inside the URIs. So it could mess up the links. But that will only be more fun with the default replacements (see the translation table) :)

If you want to have an idea of a real proxy in Perl, I suggest wsproxy.

I'm looking forward to hearing your remarks.

Blop
#!/usr/local/bin/perl -Tw # -*- Mode: cperl -*- # small web proxy - Ronan Le Hy # usage: prox [port] # (default port is 8888) # works fine with Netscape 4.75, ok with Konqueror 1.9.8 (some problem +s though) use strict; use Socket; use IO::Handle; sub extract; sub filter; $SIG{KILL} = sub {close PROX; close NAV; close HOST; exit;}; # we don't want to exit if a connection is cut $SIG{PIPE} = sub {}; $SIG{CHLD} = sub {wait;}; my $port_proxy = $ARGV[0] || 8888; # if you surf perlmonks with those defaults on, that's gonna be fun :) my %translation_table = ( 'azatoth' => 'OeufMayo', 'merlyn' => 'Erudil', 'BooK' => 'azatoth', 'mirod' => 'merlyn', 'Dominus' => 'BooK', 'tilly' => 'vroom', 'tye' => 'Dominus', 'virtualsue' => 'tye', 'crazyinsomniac' => 'tilly', 'OeufMayo' => 'virtualsue', 'Erudil' => 'mirod', 'vroom' => 'crazyinsomniac', 'NodeReaper' => 'root', 'root' => 'NodeReaper' ); # untaint that harmless $port_proxy if ($port_proxy =~ /(\d*)/) { $port_proxy = $1; } else { die "I'd really like to see that happen!\n"; } # open server a server socket so we can wait for connections my $tcp = getprotobyname('tcp'); socket(PROX, PF_INET, SOCK_STREAM, $tcp) || die "socket: $!"; bind(PROX, sockaddr_in($port_proxy, INADDR_ANY)) || die "bind: $!"; listen(PROX,SOMAXCONN) || die "listen: $!"; while (accept(NAV, PROX)) { # we fork to manage that connection unless (fork()) { # the first line of the request (the one with GET) my $get = ''; $get = <NAV> while $get =~ /^\s*$/; # we split that first line into interesting elements my ($method, $protocol, $host, undef, $port_host, $path, $rest) = +extract($get); # reading the rest of the request (the header) my $bazar = ''; my $line_header = ''; $bazar .= $line_header while (($line_header = <NAV>) !~ /^\s+$/); # connecting to the web site my $host_net = inet_aton($host); my $sin = sockaddr_in($port_host, $host_net); socket(HOST, PF_INET, SOCK_STREAM, $tcp) or die "cannot open that +damn socket\n"; connect(HOST, $sin) or die "no way to connect to that stupid host\ +n"; # let's pipe hot! that can do only good HOST->autoflush(1); PROX->autoflush(1); NAV->autoflush(1); # making request to web site print HOST "$method $path $rest"; print HOST $bazar, "\015\012"; # that's CR LF # read lines from the web site, and transfer them to the browser, +via a little filter my $ligne = ''; while (NAV->opened && HOST->opened && ($ligne = <HOST>)) { # the filter $ligne = filter($ligne); # the string is transmitted back to the browser print NAV $ligne; } # close the sockets and exit the thread close HOST; close NAV; exit; } } sub extract { # ok, this one was fun to write # the regexp splits the given string into ($method, $proto, $host, + undef, $port, $path, $rest) &{sub {$_[4] ||= 80; return @_;}}($_[0] =~ m#^(\w+)\s+(\w+)://([^/ +:]+)(:?(\d*))(/\S*)(.*)$#); } # with the default translation table, # 'azatoth OeufMayo virtualsue' should give: # 'OeufMayo virtualsue tye' sub filter { my $line = shift; my $return = $line; # for each entry in the table foreach my $source (keys %translation_table) { # in $return, we replace the occurrences of $source with $tran +slation_table{$source} my $len = length $source; while ($line =~ /$source/gi) { substr($return, pos($line) - $len, $len) = $translation_ta +ble{$source}; } } return $return; }

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: CUFP [id://95255]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others having a coffee break in the Monastery: (2)
As of 2024-04-25 03:38 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found