PerlMonks  

Re: Using GET in a loop

by davidj (Priest)
on Sep 17, 2004 at 09:47 UTC ( [id://391731]=note )


in reply to Using GET in a loop

Since it doesn't even compile, the code you posted is obviously not the code you are running, but a stripped-down version for this question. I would suggest that you post the code you are actually using. You could substitute dummy URLs for the ones you are fetching, but the rest of the code should be posted as is.

davidj

Replies are listed 'Best First'.
Re^2: Using GET in a loop
by New Novice (Sexton) on Sep 17, 2004 at 09:59 UTC
    Here is the compilable code. I thought it would be easier to focus on the problem directly. Sorry about any inconvenience caused.

    #! C:/programme/perl
    use LWP::Simple;
    use LWP::UserAgent;
    use HTML::Stripper;
    use warnings;
    use strict;

    our $stripper = HTML::Stripper->new( skip_cdata => 1, strip_ws => 1 );
    our $ID;
    our @ID=(161060, 160920, 160999, 160899);
    our $count=1;

    foreach $ID (@ID) {
        my $content;
        my $content_full;
        my $url="http://europa.eu.int/prelex/detail_dossier_real.cfm?CL=en&DosId="."$ID";
        $content_full=" ";
        $content_full=get($url);
        $content=$stripper->strip_html($content_full);
        our $i_type=index($content, " COM ");
        our $d_type=substr($content, $i_type+1,3);
        our $d_year=substr($content, $i_type+6,4);
        our $d_number=substr($content, $i_type+12,3);
        our $proposal="$d_type "."\($d_year\)"." $d_number";
        print "Proposal\: $proposal \n";
        open DB, ">> C:/programme/perl/test/prelex.dta" or die "Problem: $!";
        flock (DB, 2);
        print DB "$proposal\n";
        close DB;
    }

      By "focusing on the problem", you managed to focus away the part of the code with the bug.

      Now, with your assumptions tempered, it is obvious that the problem lies in the re-use of the $stripper object.

      Easiest solution: make a new $stripper in each iteration.

      foreach $ID (@ID) { $stripper = HTML::Stripper->new( ... ); ... }

      And in case it isn't clear, this has nothing to do with get().
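A minimal sketch of why a fresh object per iteration helps. `Local::FakeStripper` is a hypothetical stand-in written for this illustration, not HTML::Stripper itself; it only assumes, as described in this thread, that the stripper keeps its internal buffer between calls. The sample page strings are made up as well.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a stripper that keeps state between calls.
# It is NOT HTML::Stripper; it only mimics the symptom discussed here.
package Local::FakeStripper;

sub new {
    my ($class) = @_;
    return bless { buffer => '' }, $class;
}

sub strip_html {
    my ($self, $html) = @_;
    (my $text = $html) =~ s/<[^>]*>//g;   # crude tag removal, illustration only
    $self->{buffer} .= $text;             # state leaks into the next call
    return $self->{buffer};
}

package main;

# Made-up sample pages standing in for the fetched documents.
my @pages = ('<p>COM (2000) 111</p>', '<p>COM (2001) 222</p>');

# Reused object: the second result still contains the first page.
my $shared = Local::FakeStripper->new;
my @reused = map { $shared->strip_html($_) } @pages;

# Fresh object per iteration: each result contains only its own page.
my @fresh;
foreach my $page (@pages) {
    my $stripper = Local::FakeStripper->new;
    push @fresh, $stripper->strip_html($page);
}

print "reused: $_\n" for @reused;
print "fresh:  $_\n" for @fresh;
```

With the shared object, index() on the second result would keep finding the " COM " from the first page, which would match the symptom of the same proposal being printed every time.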

       

        That is the solution! Thank you!!!

        Sometimes you do not see the forest because of all the trees...

        The problem is that HTML::Stripper uses HTML::Parser (which accumulates content) in strange ways and the documentation is not very explicit about it.
      Well, now the next question: is it $content_full or $content that is getting appended to instead of replaced? That is, is it the result of LWP::Simple's get function or HTML::Stripper's strip_html function that is not working correctly?

      davidj
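One way to settle that question is to log the length of both strings on every pass through the loop. The sketch below runs without network access: `fetch_page` and `$strip` are hypothetical stand-ins for get() and strip_html(), the page contents are made up, and the accumulating buffer is an assumption made only so the output has something to show.

```perl
use strict;
use warnings;

# Hypothetical stand-in for get(): returns a canned page per ID.
my %fake_site = (
    161060 => '<html><body>first dossier</body></html>',
    160920 => '<html><body>second dossier</body></html>',
);
sub fetch_page { my ($id) = @_; return $fake_site{$id} }

# Hypothetical stand-in for strip_html(): by assumption it appends to a
# buffer that survives between calls.
my $buffer = '';
my $strip = sub {
    my ($html) = @_;
    (my $text = $html) =~ s/<[^>]*>//g;   # crude tag removal, illustration only
    $buffer .= $text;
    return $buffer;
};

# Record and print the size of the raw and the stripped string per request.
my @report;
foreach my $id (161060, 160920) {
    my $content_full = fetch_page($id);
    my $content      = $strip->($content_full);
    push @report, {
        id       => $id,
        raw      => length($content_full),
        stripped => length($content),
    };
    printf "%d raw=%d stripped=%d\n", $id, length($content_full), length($content);
}
```

If the raw length tracks the size of each page while the stripped length keeps climbing, the stripping step is carrying state; if the raw length itself grows from request to request, the fetch is the place to look.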

        I think it is the $content_full variable. I printed $content_full into a file last week, and it got bigger and bigger as new content was appended to it.

        There is a hint at this problem in the LWP::UserAgent documentation. It states that there should be a new object for each request, presumably because an internal variable in GET is appended to rather than replaced with each new request.
