PerlMonks  

Re: query from more than one search engine

by fuzzysteve (Beadle)
on Nov 26, 2001 at 20:47 UTC


in reply to query from more than one search engine

I won't write it for you (don't have the time, sorry), but the module you should look at is LWP::UserAgent. It will allow you to emulate a browser querying the search engine. You'll have to build the query string for each request yourself, but in most cases that's just a matter of inserting the search terms into the query string.
Google, for example, uses query strings like
http://www.google.com/search?q=blam
to search for 'blam'. Then you'll need to parse your way through the returned page and pull out the links to outside pages. Harder, but not too bad. With Google, for example, it looks like pulling two lines from each <p> tag will get you the lines with the details you need.
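A minimal sketch of those two steps, assuming the URI and HTML::LinkExtor modules are installed alongside LWP::UserAgent. The sub names (build_url, fetch_page, outside_links) are made up for illustration, and the "different host means outside link" rule is an assumption of this sketch rather than the two-lines-per-<p> approach mentioned above. Real result pages will need their own filtering.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use URI;
use HTML::LinkExtor;

# Build a search URL; query_form takes care of escaping the search terms.
sub build_url {
    my ($base, @params) = @_;
    my $uri = URI->new($base);
    $uri->query_form(@params);
    return $uri->as_string;
}

# Fetch a page and return its HTML, or undef if the request failed.
sub fetch_page {
    my ($url) = @_;
    my $ua  = LWP::UserAgent->new(timeout => 15);
    my $res = $ua->get($url);
    return $res->is_success ? $res->decoded_content : undef;
}

# Return the http(s) links from <a href="..."> tags that point at a
# different host than the page itself (assumed meaning of "outside").
sub outside_links {
    my ($html, $base) = @_;
    my $own_host = lc(URI->new($base)->host // '');
    my @links;
    my $parser = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;
            return unless $tag eq 'a' && defined $attr{href};
            my $uri = URI->new("$attr{href}");
            return unless ($uri->scheme // '') =~ /^https?$/;
            push @links, $uri->as_string
                if lc($uri->host // '') ne $own_host;
        },
        $base,    # relative links are resolved against this URL
    );
    $parser->parse($html);
    $parser->eof;
    return @links;
}
```

Usage would be along the lines of: build the URL with build_url('http://www.google.com/search', q => 'blam'), pass it to fetch_page, and hand the result (if defined) to outside_links together with the same URL.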
The hardest bit, to my mind? Merging them in a sensible way. What counts as sensible?
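One possible answer, offered only as a sketch: score each URL by the sum of 1/rank over the engines that returned it, so links that several engines rank highly float to the top. The scoring rule, the sub name merge_results, and the sample data are all assumptions of this example; nothing in the thread prescribes them.

```perl
use strict;
use warnings;

# %results maps an engine name to an array ref of URLs in ranked order.
# Returns the distinct URLs, best combined score first.
sub merge_results {
    my (%results) = @_;
    my %score;
    for my $engine (keys %results) {
        my (%seen, $rank);
        for my $url (@{ $results{$engine} }) {
            $rank++;
            next if $seen{$url}++;      # count a URL once per engine
            $score{$url} += 1 / $rank;  # rank 1 adds 1, rank 2 adds 0.5, ...
        }
    }
    # Highest score first; ties broken alphabetically so output is stable.
    my @merged = sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score;
    return @merged;
}

# Hypothetical sample data: two engines, one URL in common.
my @merged = merge_results(
    google => [ 'http://a.example/', 'http://b.example/' ],
    hotbot => [ 'http://b.example/', 'http://c.example/' ],
);
print "$_\n" for @merged;
```

Here http://b.example/ comes first because it scores 0.5 + 1, ahead of http://a.example/ (1) and http://c.example/ (0.5).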

Re: Re: query from more than one search engine
by xxxxxxxx (Initiate) on Nov 27, 2001 at 15:43 UTC
    That is what I do: I put in strings like the one up there. But I could only put in one string/address, not three. That is my problem. How do I put in three strings/addresses? Or how do I make the query pass through at least three search engines before giving the results?
      You'll have to grab all three (separate calls), parse out the links, then compile a page from those links.
      Pseudocode:
      page1 = grab link1 into variable
      page2 = grab link2 into variable
      page3 = grab link3 into variable
      define an array of hashes
      parse links from page1, pushing into the array
      parse links from page2, pushing into the array
      parse links from page3, pushing into the array
      weight the array in a sensible fashion
      output page using the array
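The pseudocode above could be fleshed out roughly as follows. This is a sketch under assumptions: collect_hits and render_page are invented names, the fetching and link-parsing steps are passed in as code refs so the sketch stays independent of any particular module, and "weighting" is reduced to sorting by each link's position on its page.

```perl
use strict;
use warnings;

# Collect links from several pages into one array of hashes.
# $fetch and $parse are caller-supplied code refs (hypothetical hooks):
#   $fetch->($url)  returns the page text, or undef on failure
#   $parse->($page) returns the links found, in page order
sub collect_hits {
    my ($fetch, $parse, @urls) = @_;
    my @hits;
    for my $url (@urls) {
        my $page = $fetch->($url);
        next unless defined $page;    # skip engines that failed
        my $rank = 0;
        for my $link ($parse->($page)) {
            push @hits, { link => $link, source => $url, rank => ++$rank };
        }
    }
    # Placeholder weighting: best position first, then alphabetical.
    my @sorted = sort {
        $a->{rank} <=> $b->{rank} || $a->{link} cmp $b->{link}
    } @hits;
    return @sorted;
}

# Turn the array of hashes into a simple HTML page.
sub render_page {
    my (@hits) = @_;
    my $html = "<html><body><ol>\n";
    for my $hit (@hits) {
        my $link = $hit->{link};
        # Minimal HTML escaping so odd characters cannot break the markup.
        $link =~ s/&/&amp;/g;
        $link =~ s/</&lt;/g;
        $link =~ s/>/&gt;/g;
        $link =~ s/"/&quot;/g;
        $html .= qq{<li><a href="$link">$link</a></li>\n};
    }
    return $html . "</ol></body></html>\n";
}
```

In real use, $fetch would wrap an LWP::UserAgent request and $parse would wrap whatever link extraction suits each engine's result page.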
        Need help here. These are my problem areas:

        my $new_url = "http://www.google.com/search?q=$a" ;
        my $new_url = "http://hotbot.lycos.com/?MT=$a" ;
        my $new_url = "http://www.altavista.com/sites/search/web?q=$a&kl=XX&pg=q" ;



        # Here is another one: I can't push the string into these variables.
        open(OUTF,">subject1.html") or dienice("Can't open subject1.html for writing: $!");
        open(OUTF,">subject2.html") or dienice("Can't open subject2.html for writing: $!");
        open(OUTF,">subject3.html") or dienice("Can't open subject3.html for writing: $!");
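A note on the snippets above: each `my $new_url` replaces the previous one, and each `open(OUTF, ...)` reuses the same filehandle, so only the last URL and the last file survive. A sketch of one way around that, assuming URI::Escape is available (it ships with the URI distribution that LWP relies on): keep the three URLs in a hash keyed by engine, and give each output file its own lexical handle. The names engine_urls and save_pages and the per-engine file names are made up for this example, and plain die stands in for dienice, which is not shown in the post. It also uses $query instead of $a, since $a is one of the variables Perl's sort uses.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape);

# One URL template per engine, copied from the post; %s is the query slot.
my %template = (
    google    => 'http://www.google.com/search?q=%s',
    hotbot    => 'http://hotbot.lycos.com/?MT=%s',
    altavista => 'http://www.altavista.com/sites/search/web?q=%s&kl=XX&pg=q',
);

# Build all three URLs for one query, each stored under its own key.
sub engine_urls {
    my ($query) = @_;
    my $escaped = uri_escape($query);
    return map { ($_ => sprintf($template{$_}, $escaped)) } keys %template;
}

# Write each engine's page to its own file through its own handle.
# %page maps engine name => page content; returns the file names written.
sub save_pages {
    my ($dir, %page) = @_;
    my @written;
    for my $engine (sort keys %page) {
        my $file = "$dir/$engine.html";
        open(my $out, '>', $file)
            or die "Can't open $file for writing: $!";
        print {$out} $page{$engine};
        close($out) or die "Can't close $file: $!";
        push @written, $file;
    }
    return @written;
}
```

With `my %url = engine_urls($query);` all three addresses exist at once as $url{google}, $url{hotbot}, and $url{altavista}, and each can be fetched in turn before the results are combined.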
