PerlMonks  

Some Weird behavior with BerkeleyDB and WWW::Mechanize

by downer (Monk)
on Dec 05, 2007 at 19:21 UTC ( [id://655184]=perlquestion )

downer has asked for the wisdom of the Perl Monks concerning the following question:

I have one program that reads the contents of a file generated by a colleague, splits that data, and enters it into a BerkeleyDB. I now have another script that should periodically go through the keys of that hash, split the associated value, and perform a get on one of the resulting fields. If I do

    foreach my $x (keys %hash) {
        my $data = $hash{$x};
        print "$x\n";
    }

the keys and associated values are correct. The same holds if I split $data and just print the one element that I want. Things get weird when I add WWW::Mechanize into the mix. Say within the above loop I have:

    my @parts = split(/\t/, $data);
    my $url = $parts[0];
    $mech->get( "$url" );

Then the get operation is never successful. When I take a closer look at $x and $url, things are now funny. The contents of $x should be something like OtnFfkSpw6A. They are now --FfkSpw6A: the first 2 characters have been replaced with -'s. Something similar happens with $url; $x is actually a part of the URL, and the same 2 characters are replaced. What could cause this? What am I doing wrong?

Thanks, monks.
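
One way to narrow this down is to print the key and the URL with any non-printable bytes made visible, and to compare the key against a copy taken before the fetch. The following is only a sketch under assumptions: a plain hash stands in for the tied BerkeleyDB hash, and the helper name `visible` and the sample data are made up for illustration, not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: show every non-printable or non-ASCII character
# as \xNN so that hidden differences in a string become visible.
sub visible {
    my ($s) = @_;
    $s =~ s/([^\x20-\x7e])/sprintf('\\x%02x', ord $1)/ge;
    return $s;
}

# Stand-in for the tied BerkeleyDB hash (sample data is invented).
my %hash = (
    'OtnFfkSpw6A' => "http://example.com/watch?v=OtnFfkSpw6A\tsome title",
);

for my $key (sort keys %hash) {
    my $before = $key;                      # private copy of the key
    my ($url)  = split /\t/, $hash{$key};   # first tab-separated field

    # ... the $mech->get($url) call from the question would go here ...

    print "key: ", visible($key), "\n";
    print "url: ", visible($url), "\n";
    print "key changed during the loop body!\n" if $key ne $before;
}
```

If the key differs from its copy only after the fetch, the fetch step is implicated; if the escaped output already looks odd before it, the stored data is the more likely culprit.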

Replies are listed 'Best First'.
Re: Some Weird behavior with BerkeleyDB and WWW::Mechanize
by moritz (Cardinal) on Dec 05, 2007 at 20:09 UTC
    It seems very unlikely that merely using WWW::Mechanize and calling its get method changes your data.

    I think it's more likely that there is another part of your code that interferes with the code that you showed us.

    If you want us to test that, you'd have to provide a sample script that we can execute and that demonstrates your problem.

      There really isn't much else to my code:

      #!/usr/bin/perl -w
      use strict;
      use BerkeleyDB;
      use WWW::Mechanize;

      my $mech = WWW::Mechanize->new();

      my %pageHash;
      tie %pageHash, "BerkeleyDB::Btree",
          -Filename => 'pageContents',
          -Flags    => DB_CREATE
        or die "Cannot open file pageContents: $! $BerkeleyDB::Error\n";

      my %hash;
      tie %hash, "BerkeleyDB::Btree",
          -Filename => 'videoDB',
          -Flags    => DB_CREATE
        or die "Cannot open file videoDB: $! $BerkeleyDB::Error\n";

      foreach my $x (keys %hash) {
          print "getting $x:";
          my $data  = $hash{$x};
          my @parts = split(/\t/, $data);
          my $url   = $parts[0];
          $mech->get( "$url" );
          if ($mech->success()) {
              #$pageHash{$x} = $mech->content( format => 'text' );
              print " done\n";
          }
          else {
              print " failed\n";
          }
          sleep(15);
      }

        I tested your code (I added two URLs to the hash) and I can't reproduce your problem. Neither key nor value gets modified in any strange way.

        Maybe try to move your pageContents files and start with an empty, clean one and check if the error still occurs.
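
A clean-database check along these lines can be sketched as a small round-trip test: create a fresh file, store a couple of known entries, reopen it, and confirm that keys and values come back unchanged. This sketch deliberately swaps in the core SDBM_File module for BerkeleyDB::Btree so that it runs without extra modules; the file name, keys, and URLs are invented sample data.

```perl
use strict;
use warnings;
use Fcntl;                     # O_RDWR, O_CREAT, O_RDONLY
use File::Temp qw(tempdir);
use SDBM_File;                 # core stand-in for BerkeleyDB::Btree (assumption)

my $dir  = tempdir(CLEANUP => 1);   # fresh, empty directory
my $file = "$dir/videoDB";

# Invented sample records: key => "url<TAB>title"
my %expected = (
    'OtnFfkSpw6A' => "http://example.com/a\tfirst",
    'abcDEF12345' => "http://example.com/b\tsecond",
);

# Step 1: write the records into a brand-new database.
{
    tie my %db, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666
        or die "Cannot create $file: $!";
    $db{$_} = $expected{$_} for keys %expected;
    untie %db;
}

# Step 2: reopen it and compare every record with what was stored.
my $mismatches = 0;
{
    tie my %db, 'SDBM_File', $file, O_RDONLY, 0666
        or die "Cannot reopen $file: $!";
    for my $key (sort keys %expected) {
        my $got = $db{$key};
        if (!defined($got) || $got ne $expected{$key}) {
            $mismatches++;
            print "MISMATCH for $key\n";
        }
    }
    my @stored = keys %db;
    $mismatches++ if scalar(@stored) != scalar(keys %expected);
    untie %db;
}

print $mismatches ? "round trip FAILED\n" : "round trip ok\n";
```

If the same kind of round trip succeeds against a fresh BerkeleyDB file but fails against the existing one, the existing file rather than the fetching code would be the first suspect.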

Re: Some Weird behavior with BerkeleyDB and WWW::Mechanize
by weismat (Friar) on Dec 05, 2007 at 22:14 UTC
    My personal experience with WWW::Mechanize on Linux and Windows is pretty bad, as it tends to leak from my point of view. How many pages are you looking at? I am downloading between 1500 and 2000 web pages, and the leak was somewhere inside WWW::Mechanize, as I did not have the leak after changing the URL class. With the simple UA class there was no leak, and I gained a lot of speed by running several UAs in parallel.
      I switched to LWP::Simple, and that seems to work with no problem. I didn't change anything else, so this behavior remains mysterious.
