If you have a question on how to do something in Perl, or you need a Perl solution to an actual real-life problem, or you're unsure why something you've tried just isn't working... then this section is the place to ask.
However, you might consider asking in the chatterbox first (if you're a
registered user). The response time tends to be quicker, and if it turns
out that the problem/solutions are too much for the cb to handle, the
kind monks will be sure to direct you here.
Can a lady or gentleman please help me find the correct resource. I am looking for information that I can use in an app I'm building.
It will be run on a LAN behind a firewall. It would be far more convenient if the user could be logged in automatically when they are already logged in to the network it is running on. So, I'm seeking information on accessing a network user's userid and login status. I'm not looking for specific code, just guidance to the proper keywords or terms that I need to research to achieve this end goal, if it is even possible.
I don't mind doing the legwork, I just don't know what the specific legwork is that I must do right now.
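The terms to research are "single sign-on (SSO)", "Kerberos", "Active Directory", "SPNEGO/Negotiate", and "NTLM"; if the app is web-based, the web server (Apache/IIS) does the SSO negotiation and hands Perl the result in `REMOTE_USER`. A minimal sketch of the two starting points (this only identifies the user the OS or web server already authenticated; it is not itself an authentication mechanism):

```perl
#!/usr/bin/perl
# Sketch: two ways a Perl app can learn who is logged in.
# Which one applies depends on deployment; these are starting
# points for research, not a complete SSO solution.
use strict;
use warnings;

# 1. Web app behind a server doing SSO (Kerberos/NTLM "Negotiate"):
#    the server authenticates the browser and sets REMOTE_USER.
my $web_user = $ENV{REMOTE_USER};

# 2. Desktop app: the OS already knows the local account name.
my $local_user = $^O eq 'MSWin32'
    ? $ENV{USERNAME}        # Windows login name
    : (getpwuid($<))[0];    # Unix: account name for the real uid

print "web user:   ", $web_user   // '(not running under SSO)', "\n";
print "local user: ", $local_user // '(unknown)', "\n";
```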
I've created a helper function for my own purposes and thought it would be useful to others. So CPAN seems a sensible place to put it so others can use it if they want to...
Its function is simple: to go to the homepage of a website and return an array of URIs within that site, being careful not to stray outside it, that use the http or https scheme. It ignores things that aren't plain text or that it cannot parse, such as PDFs or CSS files, but it includes JavaScript files, as links (things like window.open or document.location.href) might be lurking there. It deliberately doesn't try to follow the action attribute of a form, as that is probably meaningless without the form data.
As the Monastery has taught me that all published modules should have tests, I want to do it properly and provide those tests...
But, given that there is only one function and it makes HTTP requests, what should I test?
The obvious (to me) test is that it returns the right number of URIs from a website. But that number will likely change over time, so I cannot hardcode the 'right' answer into the tests. So beyond the necessary dependencies and their versions, I'd like some ideas of what should be in the tests, please.
In case you're interested, this came about from wanting to automate producing and installing sitemap files.
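One common pattern is to split the tests in two: test the link-extraction logic on fixed HTML where the "right" answer never changes, and keep live-network tests guarded or skipped (CPAN testers' machines cannot be assumed to have internet access). A sketch of the first kind — `extract_links` here is a crude regex stand-in, not the module's real parser:

```perl
#!/usr/bin/perl
# Network-free test sketch: feed a fixed HTML page to the parser and
# assert on the exact set of links. extract_links() is a stand-in for
# whatever the real module uses internally.
use strict;
use warnings;
use Test::More tests => 2;

sub extract_links {
    my ($html) = @_;
    return [ $html =~ /href="([^"]+)"/g ];
}

my $page = <<'HTML';
<html><body>
<a href="https://example.com/about">About</a>
<a href="https://example.com/contact">Contact</a>
</body></html>
HTML

my $links = extract_links($page);
is scalar(@$links), 2, 'found both links';
is_deeply $links,
    [ 'https://example.com/about', 'https://example.com/contact' ],
    'links extracted in document order';
```

For end-to-end coverage, a tiny local server (HTTP::Daemon ships with libwww-perl) serving two or three known pages gives a stable "right number of URIs" to assert against, without depending on any live site.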
However, I get the following error message when I try to run it via SSH from my local machine:
leudwinus@localmachine:~$ ssh user@remotemachine './json_test'
Can't locate JSON.pm in @INC (you may need to install the JSON module)
(@INC contains: /home/user/perl5/lib
/usr/local/lib/perl5/site_perl/mach/5.32
/usr/local/lib/perl5/site_perl /usr/local/lib/perl5/5.32/mach
/usr/local/lib/perl5/5.32) at ./json_test line 4.
BEGIN failed--compilation aborted at ./json_test line 4.
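A likely cause (worth checking, not certain from the error alone): `ssh host 'cmd'` runs the command in a non-interactive shell that does not source ~/.profile, so a PERL5LIB set up there (e.g. by local::lib) is missing, and the remote @INC differs from what an interactive login sees. Two workarounds — the paths are assumptions, so compare with `echo $PERL5LIB` in an interactive shell on the remote machine first:

```shell
# 1. set PERL5LIB explicitly for the one command
ssh user@remotemachine 'PERL5LIB=$HOME/perl5/lib/perl5 ./json_test'

# 2. source the profile first so local::lib sets everything up
ssh user@remotemachine '. ~/.profile && ./json_test'
```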
I have a large number of legacy CGI Perl scripts, which were written to be executed by Apache mod_perl.
What I want to do is move to an HTTP server inside a Perl script (like AnyEvent::HTTPD) that supports calling the Perl CGI scripts - and then package the entire thing as a single exe using PAR. This is so users can run it without having to have either Apache or Perl installed.
Now CGI seems fairly straightforward (request body to STDIN, headers into %ENV, and STDOUT is the HTTP response), but I figured someone has probably done this before? Or are there better ideas?
I am essentially doing this to execute the cgi in the httpd request callback:
{
    local *STDOUT;
    local *STDIN;
    # local %ENV;
    # $ENV{HTTP_COOKIE} = $requestCookie;  # note: HTTP_COOKIE, not http-cookie
    open STDIN,  '<',  \$requestBody;   # in-memory filehandle as the CGI's STDIN
    open STDOUT, '>>', \$response;      # capture the CGI's output
    do './cgi/foobar.pl';
}
The error references the List::Util module, but doesn't occur unless I add the "use File::Copy", which is a completely separate module. What is going on here?
Hello, monks.
With your help I've written a script that efficiently processes a large number of text files. I run this script inside directories containing 1K to 10K files, usually fewer than 5K.
However, I've noticed that when attempting to process a larger number of files, i.e. several directories at once, the script gets disproportionately slower. For example, while a run on 3.5K files takes around 4.5 seconds, a run on 35K files takes 90 seconds instead of the expected 45, and on 350K files it runs for hours.
This has baffled me, as I'm using subdirectories to organize the data, and filesystem operations shouldn't impact performance negatively; additionally, the data filenames are glob()bed into an array which is looped over and not slurped in at once and processed in bulk (although, in my tests I tried that approach which exhibited the same behavior).
What's very interesting is that when I put a counter to stop processing at 1000 files, I got increasingly longer processing times with each subdirectory added to the list, despite only processing 1000 files from it. Also, I always copy my data to /tmp which is mounted as tmpfs to reduce SSD wear and achieve maximum read/write performance.
Testing:
wget http://www.astro.sunysb.edu/fwalter/AST389/TEXTS/Nightfall.htm
html2text-cpp Nightfall.htm >nightfall.txt
mkdir 00; for i in `seq -w 0 3456`; do head -$((RANDOM/128)) nightfall.txt >00/data-$i; done
This will create a directory ("00") with 3,456 random sized files inside. Perl script:
#!/usr/bin/perl
use strict;
use warnings;
use 5.36.0;
use Env;
use utf8;
use POSIX "sys_wait_h"; #for waitpid FLAGS
use Time::HiRes qw(gettimeofday tv_interval);
use open ':std', ':encoding(UTF-8)';
my $benchmark = 1; # print timings for loops
my $TMP='/tmp';
my $HOME = $ENV{HOME};
my $IN;
my $OUT;
my @data = glob("data-* ??/data-*");
my $filecount = scalar(@data);
die "no input files found\n" unless $filecount;
say "Parsing $filecount files";
my $wordfile="data.dat";
truncate $wordfile, 0;
#$|=1;
# substitute whole words
my %whole = qw{
going go
getting get
goes go
knew know
trying try
tried try
told tell
coming come
saying say
men man
women woman
took take
lying lie
dying die
};
# substitute on prefix
my %prefix = qw{
need need
talk talk
tak take
used use
using use
};
# substitute on substring
my %substring = qw{
mean mean
work work
read read
allow allow
gave give
bought buy
want want
hear hear
came come
destr destroy
paid pay
selve self
cities city
fight fight
creat create
makin make
includ include
};
my $re1 = qr{\b(@{[ join '|', reverse sort keys %whole ]})\b}i;
my $re2 = qr{\b(@{[ join '|', reverse sort keys %prefix ]})\w*}i;
my $re3 = qr{\b\w*?(@{[ join '|', reverse sort keys %substring ]})\w*}i;
truncate $wordfile, 0;
my $maxforks = 64;
print "maxforks: $maxforks\n";
my $forkcount = 0;
my $infile;
my $subdir = 0;
my $subdircount = 255;
my $tempdir = "temp";
mkdir "$tempdir";
mkdir "$tempdir/$subdir" while ($subdir++ <= $subdircount);
$subdir = 0;
my $i = 0;
my $t0 = [gettimeofday];
my $elapsed;
foreach $infile (@data) {
    $forkcount -= waitpid(-1, WNOHANG) > 0 while $forkcount >= $maxforks;
    # do { $elapsed = tv_interval($t0); print "elapsed: $elapsed\n"; die; } if $i++ > 1000; # 1000 files test
    $i++; # comment out if you uncomment the above line
    $subdir = 1 if $subdir++ > $subdircount;
    if (my $pid = fork) {   # $pid defined and != 0 --> parent
        ++$forkcount;
    } else {                # $pid == 0 --> child
        open my $IN,  '<', $infile                    or exit(0);
        open my $OUT, '>', "$tempdir/$subdir/text-$i" or exit(0);
        while (<$IN>) {
            tr/-!"#%&()*',.\/:;?@\[\\\]”_“{’}><^)(|/ /; # no punct
            s/^/ /;
            s/\n/ \n/;
            s/[[:digit:]]{1,12}//g;
            s/w(as|ere)/be/gi;
            s{$re2}{ $prefix{lc $1} }g;      # prefix
            s{$re3}{ $substring{lc $1} }g;   # part
            s{$re1}{ $whole{lc $1} }g;       # whole
            print $OUT $_;
        }
        close $OUT;
        close $IN;
        defined $pid and exit(0);   # $pid == 0 --> child, must exit itself
    }
}
### now wait for all children to finish, no matter who they are
1 while wait != -1; # blocking wait; reaps every child and avoids zombies
local @ARGV = glob("$tempdir/*/*");
my @text = <>;
unlink glob "$tempdir/*/*";
open $OUT, '>', $wordfile or die "Error opening $wordfile";
print $OUT @text;
close $OUT;
$elapsed = tv_interval($t0);
print "regex: $elapsed\n" if $benchmark;
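A guess worth verifying rather than asserting: the final phase globs and slurps every temp file into one array (`my @text = <>`), so that step grows with the whole corpus even though the per-file work is constant. To see where the time actually goes at 35K versus 350K files, profiling with Devel::NYTProf (assuming it is installed) is one option; `addpid=1` keeps each forked child from clobbering the parent's profile:

```shell
# Run under the profiler; each process writes nytprof.out.<pid>
NYTPROF=addpid=1 perl -d:NYTProf ./script.pl

# Generate the HTML report for the parent (substitute its real pid)
nytprofhtml --file nytprof.out.PARENT_PID
```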
I'm using Mail::IMAPClient to write my own mail monitor and am currently trying to handle connection problems, simulated by simply detaching the ethernet cable. Ideally I'd like the program to retry the connection a few times before definitively giving up.
With both the intuitive solution, like $imap->connect() if $imap->IsUnconnected, and also re-instantiating the whole $imap object (new calls connect or login on its own), I get a nasty loop of: Lost connection to ... retry 1 retry 2.. [I plug eth] Successfully reconnected .. Lost connection to ..
Here is the code, reduced to the minimum (I left the read-password echo(0) in, to play nicely with your terminal security ;)
use strict;
use warnings;
use Mail::IMAPClient;
use Term::ReadKey;
my $user = 'put.here.your.imap.account@example.com';
my $server = 'email.example.com';
my $port = 993;
my $ssl = 1; # generally ssl is used
my $imap_folder = "INBOX";
my $sleep = 5;
print "\n(use CTRL-C to permit logout)\n";
print "Enter the password for $user on $server\n";
my $password;
ReadMode('noecho');
$password = ReadLine(0);
chomp $password;
ReadMode 'normal';
my $imap = Mail::IMAPClient->new(
    Server   => $server,
    User     => $user,
    Password => $password,
    Port     => $port,
    Ssl      => $ssl,
    Uid      => 1,
) or die "IMAP Failure: $@";
print "Logged in successfully\n" if $imap->IsConnected();
# Handle Ctrl-C
$SIG{INT} = sub{
print "\n\nLogging out..\n";
$imap->logout();
exit;
};
# do NOT mark as read when handling messages
$imap->Peek(1);
# look in INBOX
$imap->select( $imap_folder ) or die "IMAP Select Error for imap folder [$imap_folder]: $@";
my $now = time;
my %seen = map { $_ => 1} $imap->sentsince($now);
while (1){
    ################################################################
    # Handling a lost connection
    ################################################################
if ( $imap->IsUnconnected ){
print "Lost connection to $server\n";
for (1..100){
sleep 1 for 1..5;
print "Connection retry $_..\n";
# option 1
# enters in a loop connected / unconnected
# $imap->connect();
# option 2
# also this enters in a loop
            $imap = Mail::IMAPClient->new(
                Server   => $server,
                User     => $user,
                Password => $password,
                Port     => $port,
                Ssl      => $ssl,
                Uid      => 1,
            );
next unless $imap;
last if $imap->IsConnected();
}
# give up..
die "Reconnection impossible!" if $imap->IsUnconnected();
    # if here, we succeeded
    print "Successfully reconnected!\n";
}
# as the above solutions do not work..
next if $imap->IsUnconnected();
    ################################################################
my @msgs = $imap->sentsince($now);
foreach my $msg (@msgs){
# without this line I get an undef $msg entry in %seen
next unless defined $msg;
next if $seen{$msg};
$seen{ $msg }++;
print "New message: ",
$imap->get_header( $msg, "Subject" ), " from: ",
$imap->get_header( $msg, "From" ),"\n";
}
sleep 1 for 1..$sleep;
$now = time;
}
I'm doing what people have to do in Mormon country to save their sanity: study the rocks, which turn out to be, by far, the most interesting thing in this region. So I came up with a script that was going to tell me something about the Bonneville flood, which I've been digging through for the last 30 years of my life without having known it until last year. It is literally true that I brought the internet to Idaho, in the form of walking in front of the caterpillar that ripped a channel for the fiber-optic cable from Preston to Spokane (1989).
For the distance calc, I used geodatasource.com as a source. I supply a date and time that isn't relevant yet but is stubbed in to be used eventually. Let me just trot this thing out: 2.millcreek.pl. It's behaving pretty well, I think. I don't mean to thumb my nose at the metric system by using feet and miles, but it's definitely the parlance of the locals:
fritz@laptop:~/Documents/gitlab1$ ./2.millcreek.pl
Subroutine get_logger redefined at ./2.millcreek.pl line 68.
INFO: ./2.millcreek.pl
INFO: pi is 3.14159265358979
INFO: Bonneville max altitude in feet: 5200
INFO: Boise 43.61 -116.2
INFO: return from the google is 821 meters
INFO: Altitude in feet is 2693.56964
INFO: Difference from max Bonneville elevation is 2506.43036 ft
INFO: ==============
INFO: distance is 240.161359349768 miles
INFO: ==============
INFO: near sublett 42.3278114 -113.2104084
INFO: return from the google is 1455 meters
INFO: Altitude in feet is 4773.6222
INFO: Difference from max Bonneville elevation is 426.3778 ft
INFO: ==============
INFO: distance is 175.16979925839 miles
INFO: ==============
INFO: snowville 41.9655701 -112.7105234
INFO: return from the google is 1384 meters
INFO: Altitude in feet is 4540.68256
INFO: Difference from max Bonneville elevation is 659.31744 ft
INFO: ==============
INFO: distance is 35.8058824322112 miles
INFO: ==============
INFO: juniper 42.152429 -112.9842191
INFO: return from the google is 1577 meters
INFO: Altitude in feet is 5173.88468
INFO: Difference from max Bonneville elevation is 26.1153199999999 ft
INFO: ==============
INFO: distance is 19.0729839145012 miles
INFO: ==============
INFO: on the outcrop 40.703684 -111.7849951
INFO: return from the google is 1752 meters
INFO: Altitude in feet is 5748.03168
INFO: Difference from max Bonneville elevation is -548.03168 ft
INFO: ==============
INFO: distance is 117.801976807568 miles
INFO: ==============
INFO: Cascade way 40.7062734 -111.7941259
INFO: return from the google is 1524 meters
INFO: Altitude in feet is 5000.00016
INFO: Difference from max Bonneville elevation is 199.99984 ft
INFO: ==============
INFO: distance is 0.510598364633098 miles
INFO: ==============
INFO: Mantua 41.5073303 -111.944728
INFO: return from the google is 1582 meters
INFO: Altitude in feet is 5190.28888
INFO: Difference from max Bonneville elevation is 9.71111999999994 ft
INFO: ==============
INFO: distance is 55.8975408837833 miles
INFO: ==============
INFO: dry creek 41.5501001 -111.9537977
INFO: return from the google is 1810 meters
INFO: Altitude in feet is 5938.3204
INFO: Difference from max Bonneville elevation is -738.3204 ft
INFO: ==============
INFO: distance is 2.99196958759918 miles
INFO: ==============
INFO: wellsville 41.6365147 -111.9288947
INFO: return from the google is 1376 meters
INFO: Altitude in feet is 4514.43584
INFO: Difference from max Bonneville elevation is 685.56416 ft
INFO: ==============
INFO: distance is 6.1074736112501 miles
INFO: ==============
fritz@laptop:~/Documents/gitlab1$
So far, so good, and I welcome any style, usage, or numeric improvements. I might have 2 significant figures, but how much width can I fill up? Let's take a look at the dependencies:
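For a numeric cross-check of the "as the crow flies" distances above, the haversine formula can be computed locally with nothing but core Perl; the mean Earth radius and the coordinate pair below (taken from the log) are the only inputs:

```perl
#!/usr/bin/perl
# Sketch: great-circle distance via the haversine formula, in miles,
# as a local cross-check against the geodatasource.com numbers.
use strict;
use warnings;
use Math::Trig qw(deg2rad);

sub haversine_miles {
    my ($lat1, $lon1, $lat2, $lon2) = @_;
    my $r  = 3958.8;   # mean Earth radius, miles
    my ($p1, $p2) = (deg2rad($lat1), deg2rad($lat2));
    my $dp = deg2rad($lat2 - $lat1);
    my $dl = deg2rad($lon2 - $lon1);
    my $a  = sin($dp / 2)**2 + cos($p1) * cos($p2) * sin($dl / 2)**2;
    return 2 * $r * atan2(sqrt($a), sqrt(1 - $a));
}

# Boise to Snowville, coordinates from the log output above
printf "%.1f miles\n",
    haversine_miles(43.61, -116.2, 41.9655701, -112.7105234);
```

This agrees with geodatasource-style results to well under a mile over these distances, which is plenty for gradient work at two significant figures.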
fritz@laptop:~/Documents/gitlab1$ grep use 2.millcreek.pl
use v5.030; # strictness implied
use warnings;
use feature qw[ signatures ];
use Data::Dumper;
use DateTime;
use DateTime::Format::ISO8601;
use DateTime::TimeZone;
use Log::Log4perl;
use Try::Tiny;
use LWP::UserAgent;
use HTTP::Request;
use Data::Roundtrip;
I have to use Log4perl to keep this type of data straight, and I use this hack of Log4perl so as not to have a .conf file dependency. Otherwise it's heavy with a lot of software from bliako. I wouldn't have been able to get these values right in geologic time without bliako's help at this level:
# this request was translated from Curl command-line
# by [Corion]'s https://corion.net/curl2lwp.psgi
my $req = HTTP::Request->new(
    'POST' =>
        'https://www.mapcoordinates.net/admin/component/edit/Vpc_MapCoordinates_Advanced_GoogleMapCoords_Component/Component/json-get-elevation',
    [
        'Connection'       => 'keep-alive',
        'Accept'           => '*/*',
        'Accept-Encoding'  => 'gzip, x-gzip, deflate, x-bzip2, bzip2',
        'Accept-Language'  => 'en-US,en;q=0.5',
        # 'Host' => 'www.mapcoordinates.net:443',
        'Referer'          => 'https://www.mapcoordinates.net/en',
        'User-Agent'       => 'Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0',
        'Content-Length'   => $payloadlen,
        'Content-Type'     => 'application/x-www-form-urlencoded; charset=UTF-8',
        'DNT'              => '1',
        'Origin'           => 'https://www.mapcoordinates.net',
        'X-Requested-With' => 'XMLHttpRequest'
    ],
    $payload
);
die "call to HTTP::Request has failed" unless $req;
What's the problem? 1. These values are ultimately dependent on Google, LLC. I'm happy that I can make use of them at this heartbeat, but I'm looking for other sources.
Q1) Does anyone know of another source for elevations on the interweb, given latitude and longitude? It seems like something more than one crappy company could pull off, in the age of satellites and lidar.
Q2) How do I make such queries without associating them with my IP address?
changing tack...
Rather than continuing in this line, let me try to get where I'm going by different methods. As I followed the path of the flood from Brigham City to Pocatello, I realized that I was on US highway 91 the entire time. If I instead had a dataset that represented that path, then I would also have a gradient for the flood. Let me try to represent what I have:
I know that very few Germans will have been to Pocatello, but I find them to be some of the greatest enthusiasts for the rifted out sections of the American west. All that nothing, but then it's also so rich because it was a pluvial lake. The potatoes love it.
Q3) If not Highway 91, is there any other highway you can glean the points from?
Q4) Can we thus approximate the Mosel river from Trier to Koblenz? How many points do you need to drive a car next to this river (without falling in), as I remember doing many moons ago?
Ok, let me just throw this out there and see what I get.