PerlMonks  

foreach problem

by Anonymous Monk
on Aug 27, 2008 at 20:46 UTC ([id://707299])

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am so confused. I created a small script to download some files from a server. These are configs visible in web pages; we need to back them up automatically over time and from several servers, and only HTTP access is possible. Now I have code that checks whether I can get a response from the server in a reasonable time (an accessibility check). I want this done for all the HTML pages I specify in a file, and I have a foreach loop that does it. But it just doesn't work: it requests only the first item of the array. If I copy and paste the same foreach code a second time, the first loop works as expected, while the second again requests just the first item of the array. Here is the code. Please help.
#!/usr/bin/perl
use strict;
use Switch;

my $i;
my $arguments;
my $date;
my $curla;
my @args;
my @links;
my $response;
my $part;
my $curlc;
my $time;

sub check_link($) {
    my $link=$_[0];
    my $page="NOK";
    $link =~ s/\n//g;
    $link =~ s/\r//g;
    print LOG "check_link: checking validity of $link\n";
    my $curlc="curl ".$curla." ".$link."\n";
    print LOG "check_link: getting page with $curlc";
    local $SIG{ALRM} = sub {
        print LOG "check_link: no response - timing out\n";
        `pkill -9 -f $link`;
        $page="NOK";
    };
    alarm(10000);
    $page=`$curlc`;
    alarm(0);
    print LOG "check_link: got response in time\n";
    return $page;
}

open (LOG,">>rapget.log") or die "can't open logfile\n";
$date=`date`;
print LOG "$$: ============================\nnew start of script : $date\n";
foreach $i(@ARGV){
    if($i=~/-/){$arguments.=$i." "};
    if($i!~/-/){$arguments.=$i."%"};
}
print LOG "$$:ARGUMENTS : $arguments\n";
@args= split (/%/,$arguments);
foreach $i(@args){
    switch ($i) {
        case /-U.*/ {print LOG "Adding user to curl\n";$curla.=$i." ";}
        case /-x.*/ {print LOG "Using proxy for curl\n";$curla.=$i." ";}
        else {print LOG "Unknown argument: $i\n"}
    }
}
print LOG "$$:curla=$curla\n";
open (IN,"<linklist.txt") or die "no input file found\n";
@links=<IN>;
close(IN);
my $link;
foreach $link(@links) {
    chomp($link);
    $response=check_link("$link");
    $part = $response;
    $part =~ s/\r//g;
    $part =~ s/\n//g;
    print "1 Empty run\n";
}

Re: foreach problem
by ikegami (Patriarch) on Aug 27, 2008 at 21:12 UTC
    It's a problem with Switch or your switch. If I comment out the following bit, it works fine:
    switch ($i) {
        case /-U.*/ {print LOG "Adding user to curl\n";$curla.=$i." ";}
        case /-x.*/ {print LOG "Using proxy for curl\n";$curla.=$i." ";}
        else {print LOG "Unknown argument: $i\n"}
    }

    Please refer to the BUGS section of the Switch documentation:

    There are undoubtedly serious bugs lurking somewhere in code this funky

    Update: And here's the immediate cause of the bottom loop exiting after one pass:

    $ perl -MO=Deparse 707299.pl
    ...
    foreach $link (@links) {
        chomp $link;
        $response = check_link("$link");
        $part = $response;
        $part =~ s/\r//g;
        $part =~ s/\n//g;
        print "1 Empty run\n";
    }
    continue {          <------ Added by Switch.
        last;           <------ Causes "for @links" to
    }                   <------ exit after first pass.
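    To see why one injected "continue { last; }" is enough, here is a minimal standalone sketch that writes the same construct by hand (it is not the filter's literal output, and the sub name is made up for illustration). A last inside a continue block behaves as if it ran in the loop body, so the loop ends after its first pass:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count the passes a foreach makes when its
# continue block calls last, mimicking what the Switch filter injected.
sub passes_with_continue_last {
    my @items  = @_;
    my $passes = 0;
    foreach my $item (@items) {
        $passes++;
    }
    continue {
        last;    # runs after every pass, so the loop stops after the first one
    }
    return $passes;
}

print passes_with_continue_last(qw(a b c)), "\n";    # prints 1
```

    Deleting the continue block makes the same call return 3.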
Re: foreach problem
by grinder (Bishop) on Aug 27, 2008 at 21:15 UTC

    First thing to do would be to ditch the Switch.

    This module is deprecated for production code; it only served to explore what a real switch construct might look like. That work laid the groundwork for the switch statement in Perl 6, which in turn was backported to Perl 5 in the 5.10.0 release (as given/when, enabled via the feature pragma). The Switch module is implemented as a source filter, which is the cause of many strange action-at-a-distance errors.
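    A minimal sketch of the same dispatch without Switch, using a plain if/elsif chain (chosen here instead of 5.10's given/when, which later Perl releases mark as experimental). The sub name curl_options_from_args is hypothetical, warn stands in for the original's print LOG, and the patterns are anchored with ^, which is slightly stricter than the original /-U.*/ and /-x.*/:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical replacement for the switch block: build the curl option
# string with ordinary if/elsif, so no source filter rewrites the code.
sub curl_options_from_args {
    my @args  = @_;
    my $curla = '';
    foreach my $arg (@args) {
        if ($arg =~ /^-U/) {
            $curla .= "$arg ";    # user credentials for curl
        }
        elsif ($arg =~ /^-x/) {
            $curla .= "$arg ";    # proxy for curl
        }
        else {
            warn "Unknown argument: $arg\n";
        }
    }
    return $curla;
}

print curl_options_from_args('-U user:pass', '-x proxy:8080'), "\n";
```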

    Your processing of @ARGV would be more orthodox with Getopt::Std or Getopt::Long. Doing so just might make your problem go away.
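    A sketch of the Getopt::Long approach, assuming the script only needs -U (user) and -x (proxy), each taking one value as curl's own -U and -x do. The helper name parse_curl_options is hypothetical. GetOptionsFromArray is used so the parser can be exercised on any list, and no_ignore_case is set because Getopt::Long matches option names case-insensitively by default:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Keep -U distinct from -u (Getopt::Long ignores case by default).
Getopt::Long::Configure('no_ignore_case');

# Hypothetical helper: turn a command line into a list of curl arguments.
# Each option takes exactly one value, so "-U" with no value is an error.
sub parse_curl_options {
    my @argv = @_;
    my ($user, $proxy);
    GetOptionsFromArray(
        \@argv,
        'U=s' => \$user,     # -U user:password
        'x=s' => \$proxy,    # -x proxyhost:port
    ) or die "usage: $0 [-U user:password] [-x proxy]\n";
    my @curl_args;
    push @curl_args, '-U', $user  if defined $user;
    push @curl_args, '-x', $proxy if defined $proxy;
    return @curl_args;
}

my @curl_args = parse_curl_options(@ARGV);
print "curl args: @curl_args\n";
```

    Returning a list rather than a string also lets you hand the arguments to curl without going through the shell (for example with the list form of system), which avoids the quoting issues Perlbotics mentions.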

    • another intruder with the mooring in the heart of the Perl

Re: foreach problem
by JadeNB (Chaplain) on Aug 27, 2008 at 21:10 UTC
    Which foreach loop is not iterating as many times as you want? Is it the last one (foreach $link(@links))? What are the command line options when you run it? What are the contents of linklist.txt? When I run this with no command-line arguments and linklist.txt containing the names of two web pages, I get two runs (although what I see is '1 Empty run' printed twice, which may or may not be what you want—see below). What should I see?

    Note that you almost certainly don't want to roll your own option parser. For example, your parser would be perfectly happy to accept two (or zero) arguments to -U or -x, which is probably not what you want. Something like Getopt::Long or Getopt::Std is probably more appropriate.

    Note that the quotes around $link in the call check_link("$link") are unnecessary—they would 'stringify' $link, but it's just a simple scalar read from a file anyway.

    What is the significance of $response and $part? You perform substitutions on them, but you never seem to use them.

    That last foreach loop is probably more appropriate as a while loop anyway: Unless you need to know in advance all the lines you'll be processing (which doesn't seem to be the case), it's much more memory-efficient to process them one at a time.
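    A sketch of that while-loop version, assuming one link per line. The helper for_each_link and its callback argument are hypothetical; the in-memory filehandle only stands in for linklist.txt so the example runs on its own:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: process links one line at a time instead of
# slurping them into an array. Returns how many links were handed on.
sub for_each_link {
    my ($fh, $callback) = @_;
    my $count = 0;
    while (my $link = <$fh>) {
        chomp $link;
        $link =~ s/\r\z//;           # tolerate DOS (CRLF) line endings
        next unless length $link;    # skip blank lines
        $callback->($link);
        $count++;
    }
    return $count;
}

# In the real script: open my $in, '<', 'linklist.txt' or die "no input file found: $!\n";
my $sample = "http://a.example/\r\n\nhttp://b.example/\n";
open my $in, '<', \$sample or die "cannot open in-memory file: $!\n";
my $n = for_each_link($in, sub { print "checking $_[0]\n" });
close $in;
print "$n links\n";
```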

    UPDATE: I don't have Switch installed, so I used Perl 5.10's given instead. Since ikegami says that Switch is the problem, that could be why I'm not seeing anything.
    UPDATE 2: If you do want to keep your hand-rolled parser, note that you must anchor your regexes at the beginning of the string. /-/ will match a string containing a hyphen anywhere, not just at the beginning.
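    The difference is easy to demonstrate. In this sketch, 'my-proxy.example:8080' is a made-up proxy value for -x, and both sub names are hypothetical; the unanchored test classifies the value as an option simply because it contains a hyphen:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical classifiers: the original unanchored test vs. an anchored one.
sub looks_like_option_unanchored { return $_[0] =~ /-/  ? 1 : 0 }
sub looks_like_option            { return $_[0] =~ /^-/ ? 1 : 0 }

for my $arg ('-x', 'my-proxy.example:8080') {
    my $old = looks_like_option_unanchored($arg) ? 'option' : 'value';
    my $new = looks_like_option($arg)            ? 'option' : 'value';
    print "$arg: /-/ says $old, /^-/ says $new\n";
}
```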

Re: foreach problem
by toolic (Bishop) on Aug 27, 2008 at 21:08 UTC
    Complete stab in the dark: what if you don't use a prototype in your sub:
    sub check_link {
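    For context on what the prototype does: a ($) prototype imposes scalar context on the argument, so an array passed to such a sub arrives as its element count. A small sketch with hypothetical sub names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical subs: identical bodies, only the prototype differs.
sub first_arg_proto($) { return $_[0] }    # ($) forces scalar context on the argument
sub first_arg_plain    { return $_[0] }    # no prototype: the array is flattened into @_

my @links = ('http://a.example/', 'http://b.example/');
print first_arg_proto(@links), "\n";    # prints 2 (the element count)
print first_arg_plain(@links), "\n";    # prints http://a.example/
```

    With a single scalar such as $link, as in the original call, both forms behave the same.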

    You didn't specify which foreach you are having trouble with. Is it this one?

    foreach $link(@links) {

    Also, start printing things out and using Data::Dumper.
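    A sketch of that advice applied to the link list. Setting $Data::Dumper::Useqq makes stray carriage returns and newlines visible as \r and \n escapes; the sample array is made up to mimic lines read from a file with DOS line endings:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Useqq = 1;    # show \r and \n as visible escapes

# Hypothetical contents of @links after @links = <IN>;
my @links = ("http://a.example/\r\n", "http://b.example/\n");
print Dumper(\@links);
```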

Re: foreach problem
by Perlbotics (Archbishop) on Aug 27, 2008 at 21:18 UTC
    Additionally to what has been said already:
    • you might like to quote the link that you give to curl, e.g. my $curlc = "curl $curla '$link'";
    • you might like to redirect curl's error stream to /dev/null (2>/dev/null) or to STDOUT (2>&1), but be aware that curl can print progress information to STDERR
    • A timeout of 10,000 seconds (almost three hours) is quite high.
    • Update: If you want to check accessibility and document-existence only, it would be sufficient to download the header information only (curl switch: --head), not the whole file. Then check for /HTTP\S+\s+(\d+)\s+/ and $1 eq 200 for an existing file.
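    A sketch combining these suggestions, assuming a Unix system with curl on the PATH. The helper names are hypothetical; --head, --silent and --max-time are standard curl options, and the 30-second limit is an arbitrary example. Using curl's own --max-time is a swapped-in alternative to the original alarm/pkill timeout, and the list form of a piped open runs curl without a shell, so $link needs no quoting:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: argument list for a HEAD-only request.
# @extra can carry options such as ('-x', 'proxy:8080').
sub curl_head_args {
    my ($link, @extra) = @_;
    return ('curl', '--head', '--silent', '--max-time', 30, @extra, $link);
}

# Hypothetical helper: numeric status from the response headers using the
# /HTTP\S+\s+(\d+)\s+/ pattern; undef if there is no status line.
sub http_status {
    my ($headers) = @_;
    return $headers =~ /HTTP\S+\s+(\d+)\s+/ ? $1 : undef;
}

# Hypothetical driver: runs curl through a list-form pipe (no shell) and
# returns the status code, or undef if the pipe could not be opened or
# curl printed no status line.
sub check_link_head {
    my ($link, @extra) = @_;
    open(my $curl, '-|', curl_head_args($link, @extra)) or return undef;
    my $headers = do { local $/; <$curl> };
    close $curl;
    return http_status(defined $headers ? $headers : '');
}

# Example (needs network access):
# my $status = check_link_head('http://www.example.com/');
# print defined $status && $status == 200 ? "OK\n" : "NOK\n";
```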
Re: foreach problem
by Anonymous Monk on Aug 28, 2008 at 06:44 UTC
    Thanks for the help. What you see is a much simplified version, which helped me conclude that something was wrong with the code. The hand-made command-line parser was done on purpose, as I wanted some really tricky argument processing and didn't know Getopt::Long. It will be a bit uglier to read with Getopt::Long, but I will be able to do what I need. Thanks for pointing me in the right direction. I see I'm still just a noob.

Node Type: perlquestion [id://707299]
Approved by varian