PerlMonks  

Re^2: IPC::Open, Parallel::ForkManager, or Sockets::IO for parallelizing?

by mldvx4 (Friar)
on Oct 02, 2023 at 23:33 UTC ( [id://11154802] )


in reply to Re: IPC::Open, Parallel::ForkManager, or Sockets::IO for parallelizing?
in thread IPC::Open, Parallel::ForkManager, or Sockets::IO for parallelizing?

Thanks. I've taken a closer look at LWP::Parallel now and have some questions about how it should handle many (most?) HTTPS sites. For now, it seems to return HTTP status "503 Service Unavailable" for sites that exist and are accessible via other agents. Here is one example:

#!/usr/bin/perl

use LWP::Parallel::UserAgent;
use LWP::Debug qw(+);

use strict;
use warnings;

my $headers = new HTTP::Headers(
    'User-Agent' => "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.66 Safari/537.36",
);

my @requests;
foreach my $url ('https://blog.arduino.cc/feed/') {
    push(@requests, HTTP::Request->new('GET', $url, $headers));
}

# new parallel agent
my $pua = LWP::Parallel::UserAgent->new();
$pua->in_order  (0);
$pua->duplicates(1);
$pua->timeout   (9);
$pua->redirect  (0);
$pua->max_hosts (5);
$pua->nonblock  (0);

foreach my $req (@requests) {
    if ( my $res = $pua->register ($req, \&handle_answer, 8192) ) {
        print $res->error_as_HTML;
    } else {
        print qq(ok\n);
    }
}

my $entries = $pua->wait();

foreach my $k (keys %$entries) {
    my $res = $entries->{$k}->response;
    my $url = $res->request->url;
    print $res->code,qq(\t $url\n);
}

exit(0);

sub handle_answer {
    my($content, $response, $protocol, $entry) = @_;

    if (length($content)) {
        $response->add_content($content);
    }

    return(undef);
}

Various browsers show that the feed in question is there, yet it is one of the feeds that LWP::Parallel is choking on.

Re^3: IPC::Open, Parallel::ForkManager, or Sockets::IO for parallelizing?
by hippo (Bishop) on Oct 03, 2023 at 10:57 UTC
    have some questions about how it should handle many (most?) HTTPS sites.

    Yeah, it seems to be pretty much all of them, which is a real shame. I guess it must have been about 6 or 7 years ago that I last used LWP::Parallel for anything serious, and back then this wasn't really an issue. In the meantime the heavy hand of Google has de facto forced most of the web over onto HTTPS, and now this is a major consideration.

    Having tested this briefly against one of my own sites it does actually appear to be downloading the content in that the server receives, accepts and serves the request OK. It's just that the user agent has some sort of internal problem with the response.
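One way to narrow down where such a 503 comes from, sketched under assumptions: plain LWP::UserAgent marks responses it generates itself (rather than receives from the server) with a "Client-Warning: Internal response" header. Whether LWP::Parallel sets the same header is an assumption and is not confirmed anywhere in this thread. The helper name looks_internal is hypothetical, and the two responses are built by hand so the sketch runs without a network.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: true if the response carries the marker that
# plain LWP::UserAgent adds to responses it synthesizes locally.
# ASSUMPTION: LWP::Parallel may or may not set this header.
sub looks_internal {
    my ($res) = @_;
    my $warning = $res->header('Client-Warning') // '';
    return $warning eq 'Internal response' ? 1 : 0;
}

# Offline demonstration with hand-built responses (no network needed).
my $from_server = HTTP::Response->new(503, 'Service Unavailable');
my $from_client = HTTP::Response->new(503, 'Service Unavailable');
$from_client->header('Client-Warning' => 'Internal response');

for my $res ($from_server, $from_client) {
    printf "%s => %s\n", $res->status_line,
        looks_internal($res) ? 'generated by the client' : 'sent by the server';
}
```

In the script from the question, the same check could be applied to each `$res` in the loop over `%$entries`, next to the line that prints `$res->code`.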

    It might be worth raising a ticket although there are plenty open. Still, it would alert other users to the problem.


    🦛

      I'm willing to try making a bug report. Is there an alternate approach to raising a ticket, other than that link?

      The self-censored comment: I ask because the link offered goes not to a web page or web site but to a "web app", and a broken "web app" at that. After 10 minutes of faffing about with the broken "web app", I was able to create an account and log in. However, after an additional 15 minutes I was not able to make any headway in actually getting a complete web form in order to report a bug, let alone actually report a bug. I enjoy Perl a lot and am really grateful for all the knowledge here but have zero tolerance for javascript, especially when it is abused to block what used to be a simple activity. :(

        "Is there an alternate approach to raising a ticket ..."

        Take a look at "How to use rt.cpan.org". It provides a variety of information including bug reporting; the email option may be more to your liking.

        There are links to other information which may help you with the problems that you're experiencing. I've raised quite a few bug reports over the years but always used email; I can't provide any direct help with the "web app" you describe.

        — Ken

        RT certainly used to be fine with JS disabled. Perhaps there has been some change in that regard. Anyway, from the RT homepage:

        To submit a bug report for a given distribution by email, send mail to bug-<distribution-name>@rt.cpan.org, where "<distribution-name>" is something like DBIx-SearchBuilder or Class-DBI or Acme-Current-Forever. Use search to find name of a distribution.

        In this case the distribution name is ParallelUserAgent
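The address pattern quoted above can be sketched in a few lines of Perl. The function name rt_bug_address is hypothetical, and the check for "::" is only an illustrative guard against passing a module name (such as LWP::Parallel) where the distribution name (ParallelUserAgent) is needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the RT bug-report address from a CPAN
# distribution name, following the bug-<distribution-name>@rt.cpan.org
# pattern quoted from the RT homepage.
sub rt_bug_address {
    my ($dist) = @_;
    # Illustrative guard: a "::" suggests a module name, not a distribution.
    die "expected a distribution name, got a module name: $dist\n"
        if $dist =~ /::/;
    return "bug-$dist\@rt.cpan.org";
}

print rt_bug_address('ParallelUserAgent'), "\n";
# prints: bug-ParallelUserAgent@rt.cpan.org
```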

        PS. The direct link to create a ticket for any dist is here.


        🦛